Creating new column with new values calculated from other columns

I'm still learning R, so I'm sorry if this is a bad question or is formatted wrong!

I have a large dataset with demographic data, two columns of which are Weight and Height. Using Weight and Height, I want to create a "BMI" column. With my numbers, the BMI equation would be: Weight / (Height/100)^2.
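As a minimal sketch, assuming Weight is in kilograms and Height in centimetres (the /100 in the formula suggests centimetres), the formula maps directly onto vectorised R arithmetic. The helper name `bmi` and the sample vectors are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: BMI from weight (assumed kg) and height (assumed cm).
# Dividing height by 100 converts centimetres to metres before squaring.
bmi <- function(weight, height_cm) {
  weight / (height_cm / 100)^2
}

# Arithmetic on vectors is element-wise, so whole columns work in one call
weights <- c(15, 16, 14.5)
heights <- c(103, 99, 88)
bmi_values <- bmi(weights, heights)
print(round(bmi_values, 2))
```

Because the operation is vectorised, the same call works on data frame columns, e.g. `bmi(DF$Weight, DF$Height)`, as long as both columns are numeric.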

Here is my sample data. Note that where "NA" appears, my real data says "NULL"; I could not use NULL and still create a reprex, so NA it is:

library(dplyr)  # provides the %>% pipe used further down

DF <- data.frame(
  ID = c("A", "B", "C", "D", "E", "F"),
  Weight = c(15, 56.3, 56.8, 16, 56.2, 14.5),
  Height = c(103, NA, NA, 99, 185.4, 88))

For my real data with "NULL", I tried to make it numeric with the code below, and added a column with just the first step of my calculation:

DF2 <- DF %>% replace(. == "NULL", "0")  # replaces the "NULL" strings with the string "0"
DF2$BMI <- with(DF2, Height / 100)       # first step of the BMI formula

What I get is this error message: "Error in Height/100: non-numeric argument to binary operator"
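That message appears whenever one side of an arithmetic operator is not numeric, for example a character vector. A minimal sketch reproducing it, assuming the column was read in as text (the vector `height_text` is made up for illustration):

```r
# Numbers stored as text: arithmetic on them fails
height_text <- c("103", "NULL", "99")
msg <- tryCatch(height_text / 100, error = function(e) conditionMessage(e))
print(msg)

# The same operation on a numeric vector works
height_num <- c(103, 99) / 100
print(height_num)
```

Replacing "NULL" with "0" does not help on its own, because "0" is still a string and the column stays character.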

I think the columns in my original dataset must not be numeric. Help!

Created on 2020-11-10 by the reprex package (v0.3.0)

I think you are right here. You can use str(DF) to check the type of each of your columns.
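A small sketch of that check, using hypothetical data that mimics the original (a Height column containing the text "NULL"). `stringsAsFactors = FALSE` is set explicitly so the column stays character on R versions older than 4.0 as well.

```r
# Hypothetical data: the "NULL" strings force Height to be character
DF <- data.frame(
  ID = c("A", "B", "C"),
  Weight = c(15, 56.3, 16),
  Height = c("103", "NULL", "99"),
  stringsAsFactors = FALSE
)

# str() lists each column with its type (num, chr, ...)
str(DF)

# A compact alternative: the class of every column as a named vector
col_classes <- sapply(DF, class)
print(col_classes)
```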

If you are sure that the Height column only contains values that can be converted to numbers, try DF2$Height <- as.numeric(DF2$Height)
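A minimal sketch of that conversion followed by the full BMI calculation, on hypothetical data. `as.numeric()` turns any text it cannot parse (here "NULL") into NA and emits a warning; `suppressWarnings()` hides that expected warning. If the column had been read in as a factor instead of character, `as.numeric(as.character(...))` would be needed, since `as.numeric()` on a factor returns the level codes.

```r
# Hypothetical data with "NULL" stored as text in Height
DF2 <- data.frame(
  ID = c("A", "B", "C"),
  Weight = c(15, 56.3, 16),
  Height = c("103", "NULL", "99"),
  stringsAsFactors = FALSE
)

# Unparseable strings such as "NULL" become NA during conversion
DF2$Height <- suppressWarnings(as.numeric(DF2$Height))

# Full BMI formula; rows with a missing height get NA for BMI
DF2$BMI <- DF2$Weight / (DF2$Height / 100)^2
print(DF2)
```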

By the way, are you sure that NA wouldn't be a better way to represent missing values in your data? NA always shows up as missing, while 0 silently takes part in most operations. That could, for example, skew the results if you want to compute the mean of some of your data. NAs, on the other hand, can simply be excluded, so that you still get valid results for the rest of your data.
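A short illustration of the difference, using made-up height values: zeros pull the mean down, while NAs either propagate (flagging the gap) or can be dropped with `na.rm = TRUE`. A height of 0 also makes the BMI formula divide by zero.

```r
heights_zero <- c(103, 0, 0, 99)   # missing values coded as 0
heights_na   <- c(103, NA, NA, 99) # missing values coded as NA

mean_zero   <- mean(heights_zero)              # 50.5: the zeros drag the mean down
mean_na     <- mean(heights_na)                # NA: the missing values propagate
mean_na_rm  <- mean(heights_na, na.rm = TRUE)  # 101: averages only 103 and 99
print(c(mean_zero, mean_na, mean_na_rm))

# With a height of 0, BMI becomes Inf instead of an obvious "missing"
bmi_from_zero <- 56.3 / (0 / 100)^2
print(bmi_from_zero)
```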

