NA values when trying to calculate means

Hi,

I’m trying to calculate the mean of a column after using group_by to group my table by a certain category. However, when I use summarise(mean(column name)), some means come out as NA, even though I’m pretty sure there are no missing values in my dataset. (I’m using the built-in nycflights13 dataset.)

I was wondering where the NAs were coming from?

Thanks:)

Hello,

You probably have some missing values (NAs) in the data, or NAs are getting introduced somewhere, so you will have to check each part of your code. You'll need to share your code for us to see what's going on. See here: FAQ: How to do a minimal reproducible example (reprex) for beginners

If you simply want to calculate the mean of a column while ignoring NAs, you can do:

mean(df$column, na.rm = TRUE)
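Since the question uses group_by and summarise, here is a minimal sketch of the same idea inside a grouped summary. The data frame `flights_toy` and its values are made up for illustration (only the column names are borrowed from nycflights13), and it assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: one delay value in carrier "B" is missing
flights_toy <- data.frame(
  carrier   = c("A", "A", "B", "B"),
  dep_delay = c(10, 20, 5, NA)
)

# Without na.rm, any group that contains an NA gets an NA mean
without_narm <- flights_toy %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay))

# With na.rm = TRUE, NAs are dropped within each group before averaging
with_narm <- flights_toy %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

print(without_narm)
print(with_narm)
```

Here carrier "B" has an NA mean in the first summary and a mean of 5 in the second, while carrier "A" is 15 in both.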

It's recommended to check, just to be sure. An easy way might be the skimr package's skim function, which gives you column stats so you can look out for NAs.

0 will be considered a legitimate value in a set of numeric values, so it will be part of the calculation :slight_smile:
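A quick way to convince yourself, using a made-up vector `x` (not from the thread's data): is.na() does not flag zeros, so na.rm = TRUE only drops the actual NA.

```r
# Hypothetical values: one zero and one genuine missing value
x <- c(0, 4, NA, 8)

# Only the third element is flagged as missing; the 0 is not
missing_flags <- is.na(x)

# The 0 stays in the calculation: (0 + 4 + 8) / 3
mean_keeping_zero <- mean(x, na.rm = TRUE)

print(missing_flags)      # FALSE FALSE  TRUE FALSE
print(mean_keeping_zero)  # 4
```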

Ok thanks a lot! :))

Hi, thanks for your answer, is there a way I could check for missing values without downloading packages?
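For what it's worth, base R can do this without any extra packages. Below is a minimal sketch on a made-up data frame `df` (the column names are placeholders); is.na, anyNA, colSums and complete.cases all ship with R:

```r
# Hypothetical data frame with a couple of missing values
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", NA),
  z = c(0, 0, 1)
)

# TRUE if there is at least one NA anywhere in the data frame
has_na <- anyNA(df)

# Number of NAs in each column
na_counts <- colSums(is.na(df))

# The rows that contain at least one NA
rows_with_na <- df[!complete.cases(df), ]

print(has_na)
print(na_counts)
print(rows_with_na)
```

On the real data you would replace `df` with your own table. summary(df) also reports the number of NAs for numeric columns.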

Hi, I tried using what you suggested and that gave me values, so I’m assuming I did in fact have NA values.
Thanks:)

Was just wondering if 0s are treated as NA values in the function you gave me?

@JackFrench, no, missing values will not be handled as 0's. The calculation will simply take place with fewer values. So let's say one column has 40 eligible values: all 40 will be summed and divided by 40. If another column has only 34 valid values, only those 34 will be summed and divided by 34, and so on. So you will get the correct answer and not deflate your means.
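The arithmetic can be checked by hand on a small made-up vector `x` (illustrative values only): with na.rm = TRUE the sum is divided by the number of non-missing values, not by the full length.

```r
# Hypothetical values: four entries, one of them missing
x <- c(2, 4, NA, 6)

# What mean(x, na.rm = TRUE) computes: (2 + 4 + 6) / 3
mean_narm <- mean(x, na.rm = TRUE)
by_hand   <- sum(x, na.rm = TRUE) / sum(!is.na(x))

# What you would get if the NA were wrongly counted as a 0: 12 / 4
deflated  <- sum(x, na.rm = TRUE) / length(x)

print(mean_narm)  # 4
print(by_hand)    # 4
print(deflated)   # 3
```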

Thanks for your answer:)

I might have misworded my question but what I meant was, say if there’s a 0 in one of my columns, would that be considered as a missing value in the function you gave me? As in I would still want that 0 to be used to calculate the value of the mean

Thanks:)