Help with subsetting data and manipulating it

Hi y'all, still new to R, but I've looked around, so I figure I might as well ask. Anyway, I'm working with a pretty big dataset and I finally figured out how to filter out what I want, but when I try to take the mean of the subsetted data, all I get in return is NA for an answer. The code I'm using to filter the data is below.

test<-subset(data1,prodDate=="1999", select = prod)
I should mention the data1 object is created with

data1<-na.omit(data)

If that means anything. Also, if there's a package that makes this easier, I'm all ears, since it is already painful having to do this. Thanks.

The tidyverse package is recommended. There is a great free book you can study from.

@nirgrahamuk is pointing you in the right direction. This is what a similar piece would look like, using the built-in mtcars dataset to find all the car models with 4 cylinders:

suppressPackageStartupMessages(library(dplyr)) 
mtcars %>% filter(cyl == 4) -> result
result
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Created on 2020-04-06 by the reprex package (v0.3.0)
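
To tie this back to the original question, the same pipeline can go one step further and take a mean directly. This is only a sketch on mtcars; the column name mean_mpg is made up for illustration.

```r
suppressPackageStartupMessages(library(dplyr))

# Filter to 4-cylinder cars, then average mpg inside summarise(),
# so mean() receives a numeric vector rather than a data frame
result <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

result
```

Doing the averaging inside summarise() avoids having to pull the column out of the data frame by hand.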

So I managed to figure it out and use the book, but when I try to take the mean of the filtered and selected data I get NA, as well as the warning message that the argument is not numeric or logical. How do I take care of this? Below is how I wrote my code; it's crude but it works for me.

test<-filter(data, prodDate==1999)
test2<-select(test,prod)
meantest2)

Probably this is a typo and you meant:

test<-filter(data, prodDate==1999)
test2<-select(test,prod)
mean(test2)

If your data has NAs in prod, this won't have addressed them. You could drop them first:

# pull() extracts prod as a plain vector; mean() needs a vector, not a data frame
test<-filter(data, prodDate==1999) %>% select(prod) %>% na.omit() %>% pull(prod)
mean(test)

or

test<-filter(data, prodDate==1999) %>% select(prod)
mean(test$prod,na.rm=TRUE)
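
For what it's worth, one common cause of NA together with the "argument is not numeric or logical" warning is handing a whole data frame to mean(). Both subset(..., select = ) and select() return a data frame, even when only one column is kept. A minimal base R sketch, using a made-up data frame that only borrows the column names from this thread:

```r
# Hypothetical stand-in for the poster's data (values invented for illustration)
data <- data.frame(prodDate = c(1999, 1999, 2000, 1999),
                   prod     = c(10, 20, 30, NA))

# subset() with select = still returns a data frame, not a vector
sub_df <- subset(data, prodDate == 1999, select = prod)
is.data.frame(sub_df)

# mean() on a data frame gives NA plus a warning (suppressed here)
df_mean <- suppressWarnings(mean(sub_df))

# Extracting the column as a vector first gives a usable result
vec_mean <- mean(sub_df$prod, na.rm = TRUE)
vec_mean
```

In dplyr, pull(prod) plays the same role as sub_df$prod here.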

Thanks for the select code, much cleaner. I still get the NA response. I ran the summary function and it's saying the numbers in the list are characters.

It is important to know one's data.
There is a great package you could install called skimr.
Once you load it with library(skimr),
you will be able to skim(data) and see whether your columns are numeric or character, and all sorts of other things.
You could also share a sample of your data here, say the first 5 records:

dput(head(data,n=5))

Copy and paste the structure that gets output here.
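
If installing another package feels like too much, base R can answer the same "what type is each column" question. A rough sketch on an invented data frame (the values and the "n/a" entry are assumptions, not the poster's data):

```r
# Hypothetical data where prod was read in as text
data <- data.frame(prodDate = c("1999", "1999", "2000"),
                   prod     = c("10", "20", "n/a"),
                   stringsAsFactors = FALSE)

# Class of every column at a glance
col_classes <- sapply(data, class)
col_classes

# Compact overview of structure and first values
str(data)

# Shareable text representation of the first rows
dput(head(data, n = 5))
```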

If you have a column that holds numbers but is stored incorrectly as character, the base function as.numeric(x) can be used. Otherwise, the readr package has a useful parse_number() function that is more flexible and can handle messier numeric representations than base.
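
As a hedged illustration of the difference, here is a small sketch on an invented character vector. The readr part only runs if that package is installed, and no output is claimed for it here.

```r
# Made-up values: two clean numbers, one with a thousands separator, one non-number
x <- c("12", "3.5", "1,234", "n/a")

# as.numeric() turns anything it cannot read strictly into NA (with a warning)
base_vals <- suppressWarnings(as.numeric(x))

# Which entries failed to convert? Handy for spotting stray strings
bad <- x[is.na(base_vals)]
bad

mean(base_vals, na.rm = TRUE)

# readr::parse_number() is more forgiving about separators and surrounding text
if (requireNamespace("readr", quietly = TRUE)) {
  parsed <- suppressWarnings(readr::parse_number(x))
  print(parsed)
}
```

Checking which entries became NA before averaging is a cheap way to find out what "strings or something" are hiding in a column.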

Thanks for the heads up, I ended up figuring out the solution using

as.numeric(as.character(unlist(test)))
Then I was able to take the mean of my filtered data.

Seems a little odd to convert to character before converting to numeric, but sometimes it's a case of if it works, it works, I suppose.

Yeah, it seems there were some strings or something in there, but as you said, it works. Thanks again.
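
One possible explanation for why the as.character() step mattered, assuming the column had been read in as a factor (the thread does not confirm this): as.numeric() on a factor returns the internal level codes rather than the printed values. A small sketch with invented values:

```r
# Hypothetical factor column containing one stray string
f <- factor(c("10", "20", "abc", "20"))

# Directly converting a factor yields level codes, not the values
codes <- as.numeric(f)
codes

# Going through character first recovers the values; "abc" becomes NA
vals <- suppressWarnings(as.numeric(as.character(f)))
vals

mean(vals, na.rm = TRUE)
```

If the column was plain character rather than a factor, the as.character() call is harmless but unnecessary.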
