dplyr NSE - group_by returns error from variables with spaces

shib · October 22, 2018, 3:32am

Hello, I'm seeing a strange dplyr error when using group_by() with backticked variable names that contain spaces. [X-post on SO: https://stackoverflow.com/questions/52921846/dplyr-nse-dplyrgroup-by-returns-error-from-variables-with-spaces/52921953?noredirect=1#comment92753287_52921953]

I have the below data frame:

> df_spread
# A tibble: 10 x 5
   `Property ID` year  commodity unit  
           <dbl> <chr> <chr>     <chr>   
 1         63154 1997  U3O8      lbs         
 2         35020 1997  U3O8      lbs         
 3         68077 1997  U3O8      lbs         
 4         68074 1997  U3O8      lbs         
 5         68075 1997  U3O8      lbs         
 6         68076 1997  U3O8      lbs         
 7         66349 1997  U3O8      lbs         
 8         54791 1997  U3O8      lbs         
 9         63065 1997  U3O8      lbs         
10         63063 1997  U3O8      lbs

I want to group this by ID, but I get an error because my variable name contains spaces. Names like this are usually handled in dplyr functions by wrapping them in backticks, so why does the error message show an extra pair of backticks around the column name?

> df_spread %>% group_by(`Property ID`, year, commodity, unit)
Error in grouped_df_impl(data, unname(vars), drop) : 
  Column ``Property ID`` is unknown

Why does the above not work, while the call below, which assigns the same backticked variable to a new name without spaces, does?

> df_spread %>% group_by(Prop = `Property ID`, year, commodity, unit) 
# A tibble: 10 x 6
# Groups:   Prop, year, commodity, unit 
   `Property ID` year  commodity unit   Prop
           <dbl> <chr> <chr>     <chr>  <dbl>
 1         63154 1997  U3O8      lbs     63154
 2         35020 1997  U3O8      lbs     35020
 3         68077 1997  U3O8      lbs     68077
 4         68074 1997  U3O8      lbs     68074
 5         68075 1997  U3O8      lbs     68075
 6         68076 1997  U3O8      lbs     68076
 7         66349 1997  U3O8      lbs     66349
 8         54791 1997  U3O8      lbs     54791
 9         63065 1997  U3O8      lbs     63065
10         63063 1997  U3O8      lbs     63063

Additional info:

str(df_spread)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   10 obs. of  5 variables:
 $ Property ID: num  63154 35020 68077 68074 68075 ...
 $ year       : chr  "1997" "1997" "1997" "1997" ...
 $ commodity  : chr  "U3O8" "U3O8" "U3O8" "U3O8" ...
 $ unit       : chr  "lbs" "lbs" "lbs" "lbs" ...
 $ nonzero    : num  0 0 0 0 0 0 0 0 0 0 ...

I'm not entirely sure what's going on, but my R version is 3.5.1 and my dplyr version is 0.7.6.

EDIT: Never mind, solved it by upgrading to dplyr_0.7.99.9000. I don't know why this occurred and would be interested in hearing relevant context, but it's not urgent, so feel free to delete this thread!
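
For anyone who wants to check their own setup, here is a minimal sketch of the call in question. The df_demo tibble is a hypothetical stand-in built from the first three rows shown above, not the real df_spread, and I have not confirmed that data this small triggers the error on 0.7.6. The rename() step at the end is just one possible workaround (an assumption on my part, not something from the dplyr docs) for keeping backticked names out of group_by().

```r
library(dplyr)

# Hypothetical stand-in for df_spread: first three rows from the printout above.
# Length-one values (year, commodity, unit) are recycled to three rows.
df_demo <- tibble(
  `Property ID` = c(63154, 35020, 68077),
  year = "1997",
  commodity = "U3O8",
  unit = "lbs"
)

# Installed dplyr version, to compare against 0.7.6 and 0.7.99.9000
packageVersion("dplyr")

# The call that errored for me on 0.7.6; on a version without the problem
# it should return a tibble grouped by all four columns
grouped <- df_demo %>% group_by(`Property ID`, year, commodity, unit)
group_vars(grouped)

# Possible workaround: give the column a syntactic name before grouping
renamed <- df_demo %>%
  rename(property_id = `Property ID`) %>%
  group_by(property_id, year, commodity, unit)
group_vars(renamed)
```

If the first group_by() call fails but the renamed version succeeds, that would point at the handling of non-syntactic column names in the installed dplyr rather than at the data.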

mara · October 22, 2018, 11:04am

Just pulling this out into a separate response so you can mark the post as solved (feel free to reply to yourself with the solution and mark it as such).