Problem with a dataset column that gives NA

Hi everyone!
I really need some help. I have a problem with an xlsx dataset that I have to analyze for a university exam. Below I have pasted the link to the dataset and the R script that I wrote for the exam.

The problem is in the 6th column of my dataset ("drinks").

I have a problem in:

  1. calculating the mean of the 345 values in the column "drinks" (the column with the problems)

  2. the summary table: the column "drinks" gives NA for every statistic (min, 1st quartile, median, mean, 3rd quartile, max)

  3. I can't divide my dataset into 2 groups, one that has drinks under 5 and the other one that has drinks more than 5

  4. I can't plot drinks with abline to compare with the graphs of "CMV"

  5. I can't calculate the t-test

I know that there are 9 rows with a 0: I tried changing them to 0.0001, but nothing changed in my R script. I also checked the format of that column in Excel, and it is no different from the other columns (like "CMV" & co.). I really can't understand why.

I know that it's probably an easy and stupid problem/error, but I can't find the solution right now.

Can someone help me?

Thanks everybody, and sorry for the trouble.

Dataset XLSX:!lFZDjATb


PS. Sorry if I didn't select the right section.

Most functions have a na.rm option that will handle NA values ("NA remove"). e.g.

mean(yourdata, na.rm = TRUE)

I'm sorry, but nothing changed, not even for the mean.

I wrote: mean($drinks, na.rm = TRUE)

Then R said: Warning message:
In mean.default($drinks, na.rm = TRUE)
argument is not numeric or logical: NA

I know that it is a warning and not an error, but I need a result that is not NA.

So the warning says that $drinks is not numeric or logical. Maybe it is character or factor? You can't calculate the mean of a character or factor column.
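A quick way to check this is sketched below. The vector drinks_chr is a made-up stand-in for the real column, not the actual data. It shows that mean() returns NA for text even with na.rm = TRUE, because na.rm only drops missing values and does not convert types.

```r
# Hypothetical stand-in for the drinks column, stored as text
drinks_chr <- c("0", "3", "7.5", "12")

class(drinks_chr)        # "character"
is.numeric(drinks_chr)   # FALSE

# mean() on a character vector warns and returns NA;
# na.rm = TRUE does not help because nothing here is missing
result <- suppressWarnings(mean(drinks_chr, na.rm = TRUE))
is.na(result)            # TRUE
```

Running class() on the real column should show right away whether it is "character" or "factor" rather than "numeric".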

Ok, but how is that possible? I checked the whole file, and I checked the format of the column. How can I change the column "drinks" to numeric or logical?

When you read from Excel, sometimes there is character data contaminating a numeric column, and then readxl is forced to read the entire column as character. You can see the column types by calling summary() or glimpse() on the data frame, or by looking in the Environment pane.
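As a rough sketch of how to track down the contaminating cells, using only base R: the data frame df and its values below are invented for illustration, and only the column names drinks and CMV come from this thread.

```r
# Hypothetical data frame mimicking a column read as character from Excel
df <- data.frame(
  CMV    = c(1.2, 3.4, 5.6, 7.8),
  drinks = c("0", "3,5", "n/a", "12"),
  stringsAsFactors = FALSE
)

# Column types at a glance
col_types <- sapply(df, class)
print(col_types)

# Entries that cannot be converted to a number (and were not NA to begin with)
converted <- suppressWarnings(as.numeric(df$drinks))
bad <- df$drinks[is.na(converted) & !is.na(df$drinks)]
print(bad)
```

Whatever shows up in bad (a decimal comma, a stray label, a space) is likely what forced the whole column to be read as text.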

You can change the type using as.numeric() for example.
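A minimal sketch of the conversion and the steps from the original question, under some assumptions: the data frame df and all of its values are made up, and putting rows with exactly 5 drinks in the upper group is a guess, since the question only says "under 5" and "more than 5".

```r
set.seed(1)

# Hypothetical data; only the column names come from the thread
df <- data.frame(
  CMV    = rnorm(20, mean = 90, sd = 5),
  drinks = as.character(rep(c(0, 2, 4, 6, 8), times = 4)),
  stringsAsFactors = FALSE
)

# Convert the text column to numbers; entries that are not numbers become NA
df$drinks <- as.numeric(df$drinks)

# The summary now reports min, quartiles, median, mean, and max
summary(df$drinks)

# Split into two groups (assumption: exactly 5 goes to the upper group)
low  <- df[df$drinks < 5, ]
high <- df[df$drinks >= 5, ]

# Welch two-sample t-test comparing CMV between the groups
tt <- t.test(low$CMV, high$CMV)
print(tt$p.value)
```

If the column turns out to be a factor instead of character, as.numeric(as.character(df$drinks)) is the safer form, because as.numeric() on a factor returns the internal level codes rather than the values.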


Thanks a lot, Woodward! I solved the problem!
