Problem with case_when

Hello,
I am trying to create two new variables: one indicating the interdependence of Central Asian countries with China (depend_CN) and another with Russia (depend_RU).
For this, I sum Import and Export and divide by gdp; all three are columns in the same dataset.

I wrote this code, but it doesn't give me what I want. I should have NAs in depend_CN when the partner is Russia, but it calculates the interdependence for every case. The same goes for depend_RU.

data_dependence$depend_RU <- dplyr::case_when(
  data_dependence$Partner=="China" ~(data_dependence$depend_CN <- (data_dependence$Import+data_dependence$Export)/data_dependence$gdp),
  data_dependence$Partner=="Russian Federation" ~(data_dependence$depend_RU <- (data_dependence$Import+data_dependence$Export)/data_dependence$gdp)
)

The reprex is:

data_dependence<-data.frame(
  stringsAsFactors = FALSE,
           Country = c("Uzbekistan","Uzbekistan",
                       "Uzbekistan","Uzbekistan","Uzbekistan","Uzbekistan",
                       "Uzbekistan","Uzbekistan","Uzbekistan","Uzbekistan"),
              Year = c(2015,2016,2016,2017,2017,
                       2018,2018,2019,2019,2020),
               gdp = c(81847410182,81779012351,
                       81779012351,59159945321,59159945321,50392607758,
                       50392607758,57921286440,57921286440,NA),
           Partner = c("Russian Federation","China",
                       "Russian Federation","China","Russian Federation",
                       "China","Russian Federation","China","Russian Federation",
                       NA),
            Import = c(2221187873,2007463677,
                       2092454055,2749423215,2858398083,3942096467,3317879453,
                       5044570600,3907969155,NA),
            Export = c(575837496,1607057922,
                       777087867,1471448860,1046191944,2324394704,1063375312,
                       2180633878,1178739024,NA)
)

I appreciate your help on that!

Solved. I believe I misinterpreted the function.

data_dependence$depend_CN <- dplyr::case_when(
  data_dependence$Partner=="China" ~((data_dependence$Import+data_dependence$Export)/data_dependence$gdp)
)
data_dependence$depend_RU <- dplyr::case_when(
  data_dependence$Partner=="Russian Federation" ~((data_dependence$Import+data_dependence$Export)/data_dependence$gdp)
)
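
As a quick check of why this version behaves as wanted, here is a minimal sketch on a made-up three-row data frame (`small` and its values are hypothetical, not taken from the reprex). Rows that match no condition, including rows where Partner is NA, get NA from case_when() by default, so no explicit fallback is needed.

```r
library(dplyr)

# Hypothetical three-row example, not the original data
small <- data.frame(
  Partner = c("China", "Russian Federation", NA),
  Import  = c(20, 30, NA),
  Export  = c(10, 20, NA),
  gdp     = c(100, 100, NA)
)

# Only the China row matches; every other row falls through to NA
small$depend_CN <- case_when(
  small$Partner == "China" ~ (small$Import + small$Export) / small$gdp
)

print(small$depend_CN)
```

Only the first element holds a computed value; the Russia row and the NA row are both missing.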

I agree. Using the assignment operator <- inside a case_when() call is not a good idea.

What I would recommend for your consideration is using tidyr::pivot_wider() to spread the Import and Export columns into separate columns for China and Russia. When tackling similar issues myself I have found it helpful to move from the "long" to the "wide" data frame format.

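library(dplyr)

data_dependence %>%
  # shorten the partner names so they make tidy column suffixes
  mutate(Partner = case_when(Partner == "Russian Federation" ~ "RU",
                             Partner == "China" ~ "CN")) %>%
  # one row per Country / Year / gdp, with Import_* and Export_* per partner
  tidyr::pivot_wider(names_from = Partner,
                     values_from = c("Import", "Export")) %>%
  # trade with each partner as a share of gdp
  mutate(depend_ru = (Import_RU + Export_RU) / gdp,
         depend_cn = (Import_CN + Export_CN) / gdp)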
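
One thing to watch for with this approach: the reprex contains a row whose Partner is NA, and pivoting it produces extra Import_NA and Export_NA columns. A possible variant, sketched here on a made-up data frame (`trade` and its values are hypothetical), drops such rows before pivoting. Whether dropping them is acceptable depends on your analysis, since the year with no partner disappears from the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down data in the same shape as the reprex
trade <- data.frame(
  Country = "Uzbekistan",
  Year    = c(2019, 2019, 2020),
  gdp     = c(100, 100, NA),
  Partner = c("China", "Russian Federation", NA),
  Import  = c(20, 30, NA),
  Export  = c(10, 20, NA)
)

wide <- trade %>%
  filter(!is.na(Partner)) %>%   # drop rows with no partner before pivoting
  mutate(Partner = case_when(Partner == "Russian Federation" ~ "RU",
                             Partner == "China" ~ "CN")) %>%
  pivot_wider(names_from = Partner,
              values_from = c(Import, Export)) %>%
  mutate(depend_ru = (Import_RU + Export_RU) / gdp,
         depend_cn = (Import_CN + Export_CN) / gdp)

print(names(wide))
```

The result has one row per Country and Year, with no *_NA columns.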

Thank you Jindra. It's working perfectly well

Hi.
I see you have an answer already but I hope my response is of help to you and any others, as well.

I have learned over time that dplyr's case_when() is best used together with dplyr's mutate(). I also suggest not using the assignment operator <- the way you have here: as far as I can tell, case_when() evaluates the right-hand side of each formula for the whole column, so an assignment placed there writes a value for every row as a side effect, whatever the condition says.

Also, if you want to put NA values into a test data frame, or any other data frame, remember to respect the data type of each column. In your posted example the Import and Export columns stay numeric, because a plain NA inside c() takes the type of the values around it, so your calculations still work. Type mismatches are a common source of errors, though. A bare NA is of type logical, and R also provides typed versions: NA_real_ for numeric columns and NA_character_ for text columns. These mark the value as missing while keeping the column's type, which matters in type-strict functions; for example, case_when() in dplyr versions before 1.1.0 raises an error if one branch returns a number and another a bare NA. Cheers and happy R coding.
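
To make this concrete, here is a minimal sketch of case_when() inside mutate() with an explicit typed fallback. The data frame `toy` and its values are made up for illustration. The `TRUE ~ NA_real_` branch is optional, since unmatched rows become NA anyway, but it states the intended type explicitly.

```r
library(dplyr)

# Hypothetical example data, not the original reprex
toy <- data.frame(
  Partner = c("China", "Russian Federation", NA),
  Import  = c(20, 30, NA),
  Export  = c(10, 20, NA),
  gdp     = c(100, 100, NA)
)

result <- toy %>%
  mutate(
    depend_CN = case_when(
      Partner == "China" ~ (Import + Export) / gdp,
      TRUE ~ NA_real_               # typed missing value for a numeric column
    ),
    depend_RU = case_when(
      Partner == "Russian Federation" ~ (Import + Export) / gdp,
      TRUE ~ NA_real_
    )
  )

# A bare NA is logical; the typed versions carry their own type
na_types <- c(typeof(NA), typeof(NA_real_), typeof(NA_character_))
print(na_types)
print(result)
```

Inside mutate() the columns can be referred to by name, so there is no need to repeat the data frame name or to assign within the formulas.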
