Convert a number range into tidy data


#1

I am working with an education data set that describes course offerings. One of the variables describes the range of grades of students who are eligible to take a specific course. For example, grades “9-12”. I want to create a tidy data set that includes a row for each possible grade: 9, 10, 11, 12. My first step was to separate the field into the HighestGrade and LowestGrade, but I’m not sure how to take the next step and create rows for grades that aren’t explicitly listed the data.

Here’s an example of what the data look like:

example <- data.frame(
  "Course.Name" = c("Intro to Underwater Basket Weaving",
                    "Intermediate Underwater Basket Weaving",
                    "Advanced Underwater Basket Weaving",
                    "Practicum: Underwater Basket Weaving"),
  "Grade.Levels" = c("7-10", "8-11", "9-12", "9-12")
)  %>%
separate(Grade.Levels, into = c("LowestGrade", "HighestGrade"), sep = "-")


#2

I see two fairly straightforward options at a glance. Both make a list column with all the grades, but I think the second option with map2 is more readable. Note that I added convert = TRUE to the separate function to avoid having to convert the grades to numbers later.

suppressPackageStartupMessages(library(tidyverse))

example <- data.frame(
  "Course.Name" = c("Intro to Underwater Basket Weaving",
                    "Intermediate Underwater Basket Weaving",
                    "Advanced Underwater Basket Weaving",
                    "Practicum: Underwater Basket Weaving"),
  "Grade.Levels" = c("7-10", "8-11", "9-12", "9-12")
)  %>%
  separate(Grade.Levels, into = c("LowestGrade", "HighestGrade"),
           sep = "-", convert = TRUE)

example %>%
  # Collect in a single column
  gather(Temp, Grade, LowestGrade, HighestGrade) %>%
  select(-Temp) %>%
  group_by(Course.Name) %>%
  # Create a list column that includes all grades between existing
  summarize(Grade = list(full_seq(Grade, 1))) %>%
  # unnest the list
  unnest()
#> # A tibble: 16 x 2
#>                               Course.Name Grade
#>                                    <fctr> <dbl>
#>  1     Advanced Underwater Basket Weaving     9
#>  2     Advanced Underwater Basket Weaving    10
#>  3     Advanced Underwater Basket Weaving    11
#>  4     Advanced Underwater Basket Weaving    12
#>  5 Intermediate Underwater Basket Weaving     8
#>  6 Intermediate Underwater Basket Weaving     9
#>  7 Intermediate Underwater Basket Weaving    10
#>  8 Intermediate Underwater Basket Weaving    11
#>  9     Intro to Underwater Basket Weaving     7
#> 10     Intro to Underwater Basket Weaving     8
#> 11     Intro to Underwater Basket Weaving     9
#> 12     Intro to Underwater Basket Weaving    10
#> 13   Practicum: Underwater Basket Weaving     9
#> 14   Practicum: Underwater Basket Weaving    10
#> 15   Practicum: Underwater Basket Weaving    11
#> 16   Practicum: Underwater Basket Weaving    12

example %>% 
  # Create a sequence between the lowest and highest grades
  mutate(Grade = map2(LowestGrade, HighestGrade,
                      ~ .x:.y)) %>%
  select(-LowestGrade, -HighestGrade) %>%
  # unnest the sequence
  unnest()
#>                               Course.Name Grade
#> 1      Intro to Underwater Basket Weaving     7
#> 2      Intro to Underwater Basket Weaving     8
#> 3      Intro to Underwater Basket Weaving     9
#> 4      Intro to Underwater Basket Weaving    10
#> 5  Intermediate Underwater Basket Weaving     8
#> 6  Intermediate Underwater Basket Weaving     9
#> 7  Intermediate Underwater Basket Weaving    10
#> 8  Intermediate Underwater Basket Weaving    11
#> 9      Advanced Underwater Basket Weaving     9
#> 10     Advanced Underwater Basket Weaving    10
#> 11     Advanced Underwater Basket Weaving    11
#> 12     Advanced Underwater Basket Weaving    12
#> 13   Practicum: Underwater Basket Weaving     9
#> 14   Practicum: Underwater Basket Weaving    10
#> 15   Practicum: Underwater Basket Weaving    11
#> 16   Practicum: Underwater Basket Weaving    12

#3

Thank you, I had come across both full_seq() and map2() in my quest for an answer but hadn’t been able to use them correctly. I’m going to employ the map2() solution to my actual data as I agree that the readability is much better and I need to force myself to learn more about purrr.