Gathering wide columns into multiple long columns using pivot_longer

I am trying to use pivot_longer to gather wide multiple columns with into multiple long columns rather than column by column. The dataset is wide and I want to transform it into a dataset that has some long features but not all columns have long features - refer to Garret's video - Tidy Data and tidyr -- Pt 2 Intro to Data Wrangling with R and the Tidyverse where he talks about the pollution dataset as not being completely rectangular.

Here is the input data from a toy dataset:

df1 <- tribble(
  ~"np_id", ~"np_city_size", ~"cc_hf_1", ~"cc_hf_2", ~"cc_hf_3", ~"cc_hf_4", ~"cc_hf_5", ~"cc_hf_6", ~"cc_ac_1", ~"cc_ac_2", ~"cc_ac_3", ~"cc_ac_4", ~"cc_ac_5", ~"cc_ac_6",
  "81", "village", NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
  "82", "village", 1L, NA, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA,
  "83", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
  "85", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
  "87", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA,
  "89", "village", 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA,
  "90", "village", 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA,
  "91", "village", 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA,
  "92", "village", NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, 1L
)

For example, the columns hf_1, hf_2, hf_3, hf_4, hf_5, hf_6 need to be pivoted into 2 columns (hf_com - this column with values 1,2,3,4,5,6 from wide hf columns) and (hf_com_freq - this column with value 1 or NA).

The same needs to occur for the columns ac_1, ac_2, ac_3, ac_4, ac_5, ac_6. These columns need to be pivoted into 2 columns (ac_com - this column with values 1,2,3,4,5,6 from wide ac columns) and (ac_com_freq - this column with value 1 or NA).

I have tried looking at the example from who dataset in:

https://tidyr.tidyverse.org/articles/pivot.html

but I can't get the values into the multiple longer columns that I need.

This is the code I have at the moment:

df_longer <- df1 %>% pivot_longer(
  cols = -(starts_with("np_")),
  names_to = c("hf_com", "ac_com"), 
  names_pattern = "cc_?(.*)_(.*)",
  values_to = c("hf_com_freq", "ac_com_freq")
)

However, I know that I need to extract the last character in the column heading (eg. 1 from hf_1 2 from hf_2 ) and pass it as the .value to each column but I having problems using regular expressions and pivot_longer parameters such as names_patterns to solve this. I feel that I am very close to the solution but can't see the forest for the trees!!!!

I have seen examples with gather and spread that convert the data to long and then partially back to wide (e.g https://stackoverflow.com/questions/25925556/gather-multiple-sets-of-columns?noredirect=1&lq=1).
My questions are: how to do this with pivot_longer and pivot_longer_spec? Or do I have to a pivot_long first and then pivot_wide to mimic gather and spread? Or should I create a manual spec?
Here are the actual results with my code:

df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"), 
    np_city_size = c("village", "village", "village", "village", 
    "village", "village"), hf_com = c("hf", "hf", "hf", "hf", 
    "hf", "hf"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA, 
    NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Here are the expected results:

df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"), 
    np_city_size = c("village", "village", "village", "village", 
    "village", "village"), hf_com = c("1", "2", "3", "4", 
    "5", "6"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA, 
    NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

It's not clear to me how the expected output relates to the input. Is it necessary that the hf value match the ac value? The output's rows all show matching values, but the input often has values in hf columns that are different from the ac columns.

Does this sort of solution with separate and a combination of pivot_longer and pivot_wider help?

df1 %>%
  pivot_longer(cols = -(starts_with("np_"))) %>%
  separate(name, into = c("cc", "type", "num"), sep = "_") %>% 
  select(-cc) %>%
  pivot_wider(names_from = type, values_from = value)
# A tibble: 54 x 5
   np_id np_city_size num      hf    ac
   <chr> <chr>        <chr> <int> <int>
 1 81    village      1        NA    NA
 2 81    village      2        NA    NA
 3 81    village      3         1    NA
 4 81    village      4        NA    NA
 5 81    village      5        NA    NA
 6 81    village      6        NA    NA
 7 82    village      1         1    NA
 8 82    village      2        NA    NA
 9 82    village      3        NA    NA
10 82    village      4        NA     1
# ... with 44 more rows
1 Like

Hello,

Thank you for the reply. No, the hf values do not have to match the ac values. However, the values for hf_1, hf_2, hf_3, etc need to be pivoted into the hf_com column and not as a separate number column as in your results.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.