Adding a new colomn based on other colomns in a dataset

I have collect wind data from a particular area. (dataset='site', colomn='Wind').

I want to append a new colomn to the dataframe.

This colomn works on the wind data that I have collected.

I am using the mutate function to append but something is off!

The data appended is wrong. Can anyone spot the errors and help me fix it?

attach(site)
f<-function(Wind)
{
  if(Wind>9.7) {exp((Wind+log(1+Wind^3)))/Wind^2}
  else if (Wind==9.7) {10}
  else{(2+Wind^3)/Wind}
}
f(Wind)

site3<-mutate(site,fxn_wnd=f)
site3
detach(site)

In mutate(), you can't just give the function name, you need a fully valid expression:

site3<-mutate(site,fxn_wnd = f(Wind))

Also, mutate(data, f(column)) looks for column in the columns of data. So you don't need to attach(site), it is done automatically inside mutate().

The if, else control structure is not vectorized and whereas you can perform row-wise operations with dplyr I think a more idiomatic solution would be to use a vectorized function like case_when(), see this example:

library(dplyr)

# Made up sample data, replace this with your own dataset
site  <- data.frame(
    Wind = runif(10, 8.0, 10)
) 

# Relevant code
site %>% 
    mutate(fxn_wnd = case_when(
        Wind >9.7 ~ exp((Wind+log(1+Wind^3)))/Wind^2,
        Wind == 9.7 ~ 10,
        TRUE ~ (2+Wind^3)/Wind
        ))
#>        Wind      fxn_wnd
#> 1  9.775140 172138.05909
#> 2  9.621312     92.77752
#> 3  8.967470     80.63855
#> 4  8.701933     75.95347
#> 5  8.876800     79.02289
#> 6  8.889959     79.25634
#> 7  9.183090     84.54694
#> 8  9.307780     86.84965
#> 9  9.124359     83.47313
#> 10 9.896932 196848.12762

Created on 2022-04-29 by the reprex package (v2.0.1)

The value of fxn_wnd did not display 10, when Wind=9.7. This is also an issue in my code. Can you try it on your console and check if this condition is true.

    Ozone Solar.R Wind Temp Month Day      fxn_wnd
1      41     190  7.4   67     5   1 5.503027e+01
2      36     118  8.0   72     5   2 6.425000e+01
3      12     149 12.6   74     5   3 3.738506e+06
4      18     313 11.5   62     5   4 1.135978e+06
5      NA      NA 14.3   56     5   5 2.322179e+07
6      28      NA 14.9   66     5   6 4.408647e+07
7      23     299  8.6   65     5   7 7.419256e+01
8      19      99 13.8   59     5   8 1.359278e+07
9       8      19 20.1   61     5   9 1.077876e+10
10     NA     194  8.6   69     5  10 7.419256e+01
11      7      NA  6.9   74     5  11 4.789986e+01
12     16     256  9.7   69     5  12 1.000000e+01

Here's the output I get; which is not correct

attach(site)
library(dplyr)
site4  <- as.data.frame(site[,3])
colnames(site4)[1] <- "Wind"
site4
# Relevant code
site4 %>% 
  mutate(fxn_wnd = case_when(
    Wind >9.7 ~ exp((Wind+log(1+Wind^3)))/Wind^2,
    Wind == 9.7 ~ 10,
    TRUE ~ (2+Wind^3)/Wind
  ))
detach(site)   
Wind      fxn_wnd
1    7.4 5.503027e+01
2    8.0 6.425000e+01
3   12.6 3.738506e+06
4   11.5 1.135978e+06
5   14.3 2.322179e+07
6   14.9 4.408647e+07
7    8.6 7.419256e+01
8   13.8 1.359278e+07
9   20.1 1.077876e+10
10   8.6 7.419256e+01
11   6.9 4.789986e+01
12   9.7 1.000000e+01
13   9.2 8.485739e+01

This is a general computation issue not specific to R, you are trying to test equality between floating point numbers, to take tolerance into account use all.equal() instead of the == operator.

Thanks Andres; but I am not understanding how to use it. Is there an issue with my coding?

site4 %>% 
  mutate(fxn_wnd = case_when(
    Wind >9.7 ~ exp((Wind+log(1+Wind^3)))/Wind^2,
    all.equal(fxn_wnd,Wind,tolerance=10),
    TRUE ~ (2+Wind^3)/Wind))

Here is an example:

> sqrt(56)
[1] 7.483315
> sqrt(56) == 7.483315
[1] FALSE
> all.equal(sqrt(56), 7.483315, tolerance = 1e-6)
[1] TRUE

Hope it helps

I tried out the commands, it works partially for me. As you can see the table is populated by NA where I wanted 10. I cant answer why?!


site4  <- as.data.frame(site[,3])
colnames(site4)[1] <- "Wind"
site4
e<-all.equal(Wind,9.7, tolerance = 1e-6)
 site4 %>% 
  mutate(fxn_wnd = case_when(
    Wind >9.7 ~ exp((Wind+log(1+Wind^3)))/Wind^2,
    e==TRUE ~ 10,
    Wind <9.7 ~ (2+Wind^3)/Wind
  ))

    Wind      fxn_wnd
1    7.4 5.503027e+01
2    8.0 6.425000e+01
3   12.6 3.738506e+06
4   11.5 1.135978e+06
5   14.3 2.322179e+07
6   14.9 4.408647e+07
7    8.6 7.419256e+01
8   13.8 1.359278e+07
9   20.1 1.077876e+10
10   8.6 7.419256e+01
11   6.9 4.789986e+01
12   9.7           NA
13   9.2 8.485739e+01

Ok, all.equal() turned out not to be a straightforward solution because is also not vectorized, luckily, dplyr has a vectorized version, near().

library(dplyr)

# Sample data on a copy/paste friendly format
site <- data.frame(
       Ozone = c(41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16),
     Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256),
        Wind = c(7.4, 8, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9.7),
        Temp = c(67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69),
       Month = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
         Day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# Relevant code
site %>% 
    mutate(fxn_wnd = case_when(
        Wind > 9.7 ~ exp((Wind+log(1+Wind^3)))/Wind^2,
        near(Wind, 9.7) ~ 10,
        TRUE ~ (2+Wind^3)/Wind
    ))
#>    Ozone Solar.R Wind Temp Month Day      fxn_wnd
#> 1     41     190  7.4   67     5   1 5.503027e+01
#> 2     36     118  8.0   72     5   2 6.425000e+01
#> 3     12     149 12.6   74     5   3 3.738506e+06
#> 4     18     313 11.5   62     5   4 1.135978e+06
#> 5     NA      NA 14.3   56     5   5 2.322179e+07
#> 6     28      NA 14.9   66     5   6 4.408647e+07
#> 7     23     299  8.6   65     5   7 7.419256e+01
#> 8     19      99 13.8   59     5   8 1.359278e+07
#> 9      8      19 20.1   61     5   9 1.077876e+10
#> 10    NA     194  8.6   69     5  10 7.419256e+01
#> 11     7      NA  6.9   74     5  11 4.789986e+01
#> 12    16     256  9.7   69     5  12 1.000000e+01

Created on 2022-04-29 by the reprex package (v2.0.1)

Even though I would strongly recommend you to work a little bit on your basic R coding skills, you can't just patch random code together and expect it to work, at the very least you need to read the documentation for the functions you intend to use.

Thanks Andres, I started learning R this month. I am a R newbie. But, did you notice that the values are still pretty off for all 'Wind' conditions. This is not an assignment with a due date. Just a learning experience.and it is frustrating not being able to get the right output, or 'not' being able to correctly find the flaw in this code.

Take a look at row 12,

#>    Ozone Solar.R Wind Temp Month Day      fxn_wnd
#> 12    16     256  9.7   69     5  12 1.000000e+01

Why am I still not getting 10 in the 'fxn_wnd' colomn? I got this answer too...I am going to spend time working on documentations as you said.

I wrote the code another way; this gave me a new set of values.

f<-function(Wind)
{
  if(Wind>9.7) {exp((Wind+log(1+Wind^3)))/Wind^2}
  else if (Wind==9.7) {10}
  else{(2+Wind^3)/Wind}
}
f(Wind)


site3<-mutate(site,f)
site3

    Ozone Solar.R Wind Temp Month Day    f(Wind)
1      41     190  7.4   67     5   1  55.030270
2      36     118  8.0   72     5   2  64.250000
3      12     149 12.6   74     5   3 158.918730
4      18     313 11.5   62     5   4 132.423913
5      NA      NA 14.3   56     5   5 204.629860
6      28      NA 14.9   66     5   6 222.144228
7      23     299  8.6   65     5   7  74.192558
8      19      99 13.8   59     5   8 190.584928
9       8      19 20.1   61     5   9 404.109502
10     NA     194  8.6   69     5  10  74.192558
11      7      NA  6.9   74     5  11  47.899855
12     16     256  9.7   69     5  12  94.296186
13     11     290  9.2   66     5  13  84.857391

That is a 10, just with a lot more precision and scientific notation. If you don't care about that much precision, then round the result to the number of digits you feel comfortable with.

You can't have 10 as an integer in a column full of floating point numeric values, in dataframes, columns can have only a single data class.