dplyr error - works fine with Excel import but not with database import

Hi,
I have a project where I developed my code against a test file saved in Excel. Now I run the same code on exactly the same data, but read directly from the database, and the code no longer works.

The code I run is:

result <- merged.comments %>% 
  mutate(Blank = ifelse(test = (is.na(x = merged.comments$all_comment)),yes = 1,
                        no = ifelse((test = (str_length(string = merged.comments$all_comment) < 5) | (str_detect(string = all_comment,pattern = blank_statements))|(str_detect(all_comment, "(.)\\1{3,}"))),yes = 1,no = 0)),
         As_above = if_else(str_detect(all_comment, previous_statements), 1, 0),
         Blank_AComm_1 = ifelse(test = (is.na(x = merged.comments$AComm_1)),yes = 1,
                        no = ifelse((test = (str_length(string = merged.comments$AComm_1) < 5) | (str_detect(string = AComm_1,pattern = blank_statements))|(str_detect(AComm_1, "(.)\\1{3,}"))),yes = 1,no = 0)),
         As_above_AComm_1 = if_else(str_detect(AComm_1, previous_statements), 1, 0)) %>% 
  mutate_if(is.numeric, ~if_else(is.na(.), 0, .))

and the error is:

Error: `false` must be a double vector, not an integer vector

This is very strange, because when I export the same data to Excel and import it again (from Excel), the code runs without any error.
The database is too big for me to give you access to the data or an example.

Does R behave differently when the same data comes from different sources?

Can you please check whether this code works for you or not?

result <- merged.comments %>% 
  mutate(Blank = ifelse(test = (is.na(x = merged.comments$all_comment)),yes = 1,
                        no = ifelse((test = (str_length(string = merged.comments$all_comment) < 5) | (str_detect(string = all_comment,pattern = blank_statements))|(str_detect(all_comment, "(.)\\1{3,}"))),yes = 1,no = 0)),
         As_above = if_else(str_detect(all_comment, previous_statements), 1, 0),
         Blank_AComm_1 = ifelse(test = (is.na(x = merged.comments$AComm_1)),yes = 1,
                        no = ifelse((test = (str_length(string = merged.comments$AComm_1) < 5) | (str_detect(string = AComm_1,pattern = blank_statements))|(str_detect(AComm_1, "(.)\\1{3,}"))),yes = 1,no = 0)),
         As_above_AComm_1 = if_else(str_detect(AComm_1, previous_statements), 1, 0)) %>% 
  mutate_if(is.numeric, ~ifelse(is.na(.), 0, .))

The only change is in the last line, where I replaced if_else with ifelse. That line seems to be the one causing the problem, because if_else has strict type requirements: the replacement value must have the same type as the column.
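To make the type issue concrete, here is a minimal sketch on two made-up vectors (x_int and x_dbl are hypothetical stand-ins, not columns from the original data). It assumes dplyr is installed. How if_else treats the mixed case depends on the dplyr version, so the sketch captures that outcome instead of assuming it.

```r
library(dplyr)

# Hypothetical columns: the same values stored with two different types.
x_int <- c(1L, NA, 3L)  # integer, as a database driver might return it
x_dbl <- c(1, NA, 3)    # double, as an Excel import usually returns it

# Base ifelse() coerces silently, so a double 0 is accepted for both.
base_int <- ifelse(is.na(x_int), 0, x_int)
base_dbl <- ifelse(is.na(x_dbl), 0, x_dbl)
typeof(base_int)  # the integer input comes back as double

# if_else() with a replacement of the matching type works in any version.
strict_int <- if_else(is.na(x_int), 0L, x_int)
strict_dbl <- if_else(is.na(x_dbl), 0, x_dbl)

# Mixing a double 0 with an integer vector is the case from the question.
# Older, stricter dplyr versions raise an error here; newer ones may cast
# to a common type instead, so the outcome is captured rather than assumed.
mixed <- tryCatch(
  if_else(is.na(x_int), 0, x_int),
  error = function(e) conditionMessage(e)
)
print(mixed)
```

If print(mixed) shows an error message like the one in the question, the installed dplyr enforces matching types, and an integer column in the data would explain the failure.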

If this does not work, can you please provide a REPRoducible EXample of your problem? It provides more specifics of your problem, and it helps others to understand what problem you are facing.

If you don't know how to do it, take a look at this thread:

Or if you want to keep if_else as opposed to ifelse then maybe have two separate mutate_ifs - one for integers, one for doubles:

mutate_if(is.double, ~if_else(is.na(.), 0, .)) %>%
mutate_if(is.integer, ~if_else(is.na(.), 0L, .))

I'm guessing that there are columns which are stored as integers in the database, and so R reads them as integers, whereas the Excel file had everything stored as decimals due to number formatting or similar.
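One way to check this guess is to look at the storage type of each column after each import. The sketch below uses a small made-up data frame (from_db, with hypothetical columns id and score) in place of the real database result, and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the data as read from the database.
from_db <- data.frame(
  id          = c(1L, 2L, NA),      # stored as integer
  score       = c(0.5, NA, 2),      # stored as double
  all_comment = c("ok", NA, "aaaa"),
  stringsAsFactors = FALSE
)

# Show the storage type of every column; integer columns are the suspects.
col_types <- sapply(from_db, typeof)
print(col_types)

# Replace NA with a zero of the matching type in each numeric column.
result <- from_db %>%
  mutate_if(is.double, ~if_else(is.na(.), 0, .)) %>%
  mutate_if(is.integer, ~if_else(is.na(.), 0L, .))

print(result)
```

Running the same sapply(..., typeof) call on the Excel import and on the database import should show whether any column differs between "integer" and "double".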
