wrangling the levels of ordered and unordered factors with mutate_at?

Hello #rstats #tidyverse humans!

I am wondering if someone can help me with a wrangling problem. I create two
sets of factor variables below (ordered and regular factors). I want to change
the levels to lowercase, but only after the variables have been converted to factors.

Any help would be great!! Thank you!

library(tidyverse)
library(magrittr)
SampleData <- tibble::tribble(
          ~location, ~wind_gust_dir, ~wind_dir_9am, ~wind_dir_3pm, ~rain_today,
            "Perth",          "SSW",           "S",         "WSW",        "No",
      "MountGinini",            "W",           "W",          "NW",        "No",
       "Launceston",           "NW",           "E",          "NW",        "No",
           "Albany",             NA,         "ESE",          "NE",        "No",
         "Ballarat",            "N",         "NNE",           "N",       "Yes",
     "MountGambier",          "SSW",         "SSW",         "SSW",       "Yes",
         "Canberra",            "E",          "SE",          "SW",        "No",
         "Richmond",           "SE",           "S",           "E",        "No",
       "Launceston",          "NNW",         "ESE",         "NNW",        "No",
         "Richmond",          "NNE",            NA,         "NNE",        "No")

Then create some levels for factor variables.

factor_levels <- c("N", "NNE", "NE", "ENE",
                  "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW",
                  "W", "WNW", "NW", "NNW")

Use dplyr to convert these to ordered factors.

# convert to ordered factors
SampleData %<>% 
    dplyr::mutate_at(.vars = vars(dplyr::matches("wind")),
                     .funs = funs(factor(., levels = factor_levels, 
                                             ordered = TRUE)))

Then I convert the other variables to regular factors.

# convert to factors
SampleData %<>% 
  mutate_at(.vars = vars("location", "rain_today"),
            .funs = funs(factor))

Then I end up with a tibble that looks like this:

SampleData
#> # A tibble: 10 x 5
#>    location     wind_gust_dir wind_dir_9am wind_dir_3pm rain_today
#>    <fct>        <ord>         <ord>        <ord>        <fct>     
#>  1 Perth        SSW           S            WSW          No        
#>  2 MountGinini  W             W            NW           No        
#>  3 Launceston   NW            E            NW           No        
#>  4 Albany       <NA>          ESE          NE           No        
#>  5 Ballarat     N             NNE          N            Yes       
#>  6 MountGambier SSW           SSW          SSW          Yes       
#>  7 Canberra     E             SE           SW           No        
#>  8 Richmond     SE            S            E            No        
#>  9 Launceston   NNW           ESE          NNW          No        
#> 10 Richmond     NNE           <NA>         NNE          No

And I am wondering how I can convert the factor levels to lowercase using dplyr::mutate_at() or purrr::map_df(). Or is it something else altogether?

I think I need something like:
mutate(newvar = tolower(levels(oldvar))) but I need it to iterate over all
the columns...

Created on 2018-11-11 by the reprex package (v0.2.1)

To work with factors, you can use {forcats}:
https://forcats.tidyverse.org/

There is a function that does what you want: fct_relabel. It applies a function to the level labels, e.g. forcats::fct_relabel(my_factor, tolower).
Combined with mutate_if, it can be applied to all factor columns at once.
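
Here is a minimal sketch on a single factor, in case it helps to see what fct_relabel does on its own. The `wind` vector is made up for illustration (it is not part of SampleData), and the base R lines at the end are only an assumed equivalent of the `tolower(levels(oldvar))` idea from the question.

```r
library(forcats)

# Hypothetical ordered factor, analogous to the wind-direction columns
wind <- factor(c("SSW", "N", NA, "E"),
               levels = c("N", "E", "SSW"),
               ordered = TRUE)

# Apply tolower() to the level labels only; the codes are untouched
wind_lower <- fct_relabel(wind, tolower)

levels(wind_lower)      # same level order, lowercase labels
is.ordered(wind_lower)  # the factor is still ordered

# Base R alternative: assign the transformed levels back to a copy
wind_base <- wind
levels(wind_base) <- tolower(levels(wind_base))
```

Either way the relabelling happens after the factor is created, so the level order defined at creation time is kept.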

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract
SampleData<-tibble::tribble(
  ~location, ~wind_gust_dir, ~wind_dir_9am, ~wind_dir_3pm, ~rain_today, 
  "Perth", "SSW", "S", "WSW", "No", 
  "MountGinini", "W", "W", "NW", "No", 
  "Launceston", "NW", "E", "NW", "No", 
  "Albany", NA, "ESE", "NE", "No", 
  "Ballarat", "N", "NNE", "N", "Yes", 
  "MountGambier", "SSW", "SSW", "SSW", "Yes", 
  "Canberra", "E", "SE", "SW", "No", 
  "Richmond", "SE", "S", "E", "No", 
  "Launceston", "NNW", "ESE", "NNW", "No", 
  "Richmond", "NNE", NA, "NNE", "No")
factor_levels <- c("N", "NNE", "NE", "ENE",
                   "E", "ESE", "SE", "SSE",
                   "S", "SSW", "SW", "WSW",
                   "W", "WNW", "NW", "NNW")
SampleData %<>% 
  dplyr::mutate_at(.vars = vars(dplyr::matches("wind")),
                   .funs = funs(factor(., levels = factor_levels, 
                                       ordered = TRUE)))
SampleData %<>% 
  mutate_at(.vars = vars("location", "rain_today"),
            .funs = funs(factor))

SampleData %>%
  mutate_if(is.factor, funs(forcats::fct_relabel(., tolower)))
#> # A tibble: 10 x 5
#>    location     wind_gust_dir wind_dir_9am wind_dir_3pm rain_today
#>    <fct>        <ord>         <ord>        <ord>        <fct>     
#>  1 perth        ssw           s            wsw          no        
#>  2 mountginini  w             w            nw           no        
#>  3 launceston   nw            e            nw           no        
#>  4 albany       <NA>          ese          ne           no        
#>  5 ballarat     n             nne          n            yes       
#>  6 mountgambier ssw           ssw          ssw          yes       
#>  7 canberra     e             se           sw           no        
#>  8 richmond     se            s            e            no        
#>  9 launceston   nnw           ese          nnw          no        
#> 10 richmond     nne           <NA>         nne          no

Created on 2018-11-12 by the reprex package (v0.2.1)
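
For anyone reading this with a newer dplyr: funs() was deprecated in dplyr 0.8.0, and the mutate_at()/mutate_if() variants were superseded by across() in dplyr 1.0.0. Below is a rough sketch of the same steps, assuming dplyr 1.0.0 or later. `sample_data` is a hypothetical three-row stand-in for SampleData so the block runs on its own.

```r
library(dplyr)
library(forcats)

# Hypothetical small stand-in for SampleData (three rows, three columns)
sample_data <- tibble::tribble(
  ~location,  ~wind_gust_dir, ~rain_today,
  "Perth",    "SSW",          "No",
  "Albany",   NA,             "No",
  "Ballarat", "N",            "Yes"
)

factor_levels <- c("N", "NNE", "NE", "ENE",
                   "E", "ESE", "SE", "SSE",
                   "S", "SSW", "SW", "WSW",
                   "W", "WNW", "NW", "NNW")

sample_data <- sample_data %>%
  mutate(
    # ordered factors for the wind columns
    across(matches("wind"),
           ~ factor(.x, levels = factor_levels, ordered = TRUE)),
    # regular factors for the remaining columns
    across(c(location, rain_today), factor)
  ) %>%
  # lowercase the levels of every factor column (ordered ones included)
  mutate(across(where(is.factor), ~ fct_relabel(.x, tolower)))
```

The lambda form (`~ fct_relabel(.x, tolower)`) plays the role that funs() had in the code above.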


Thank you @cderv! I saw fct_relevel in R for Data Science, but this one escaped me.

The tidyverse has too many great tricks--hard to keep track of them all!

  • Martin
