How to reverse pivot_longer ?

Andrzej · December 23, 2022, 12:46pm

Hi,

this is df and pivot_longer code:

df <- structure(list(Sex = c(1L, 2L, 1L, 2L), `Age Group` = structure(c(1L, 
1L, 2L, 2L), .Label = c("30-39", "40-49"), class = "factor"), 
    Q31 = c(1L, 3L, 4L, 2L), Q32 = c(7L, 5L, 6L, 2L), Q33 = 1:4, 
    Q34 = c(5L, 6L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-4L))

df %>% 
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>%
  summarise(`Strongly Agree` = sum(Value == 7)/n(),
            `Slightly Agree` = sum(Value == 6)/n(),
            Agree = sum(Value == 5)/n(),
            Neutral = sum(Value == 4)/n(),
            Disagree = sum(Value == 3)/n(),
            `Slightly Disagree` = sum(Value == 2)/n(),
            `Strongly Disagree` = sum(Value == 1)/n())

How to go back to df with pivot_wider or something else if possible ?

Flm · December 23, 2022, 1:03pm

Is this what you want?

df <- structure(list(Sex = c(1L, 2L, 1L, 2L), `Age Group` = structure(c(1L, 
                                                                        1L, 2L, 2L), .Label = c("30-39", "40-49"), class = "factor"), 
                     Q31 = c(1L, 3L, 4L, 2L), Q32 = c(7L, 5L, 6L, 2L), Q33 = 1:4, 
                     Q34 = c(5L, 6L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
                                                                                   -4L))

df1 <- df %>% 
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>%
  summarise(`Strongly Agree` = sum(Value == 7)/n(),
            `Slightly Agree` = sum(Value == 6)/n(),
            Agree = sum(Value == 5)/n(),
            Neutral = sum(Value == 4)/n(),
            Disagree = sum(Value == 3)/n(),
            `Slightly Disagree` = sum(Value == 2)/n(),
            `Strongly Disagree` = sum(Value == 1)/n()) 

df1 %>% 
  pivot_longer(cols = 4:10) %>% 
  pivot_wider(names_from = "Question", values_from = "value")

  Sex `Age Group` name                Q31   Q32   Q33   Q34
   <int> <fct>       <chr>             <dbl> <dbl> <dbl> <dbl>
 1     1 30-39       Strongly Agree        0     1     0     0
 2     1 30-39       Slightly Agree        0     0     0     0
 3     1 30-39       Agree                 0     0     0     1
 4     1 30-39       Neutral               0     0     0     0
 5     1 30-39       Disagree              0     0     0     0
 6     1 30-39       Slightly Disagree     0     0     0     0
 7     1 30-39       Strongly Disagree     1     0     1     0
 8     1 40-49       Strongly Agree        0     0     0     0
 9     1 40-49       Slightly Agree        0     1     0     0
10     1 40-49       Agree                 0     0     0     0
# … with 18 more rows

Andrzej · December 23, 2022, 1:11pm

Almost, almost, thank you for your help, but I want to just go back to df, which looks like that below:

obraz

Andrzej · December 24, 2022, 9:12am

What actually here is done by group_by function ?
Is it grouping by Question, Sex, Age Group ? In summarise there is ".group = " argument as well. How are they related to one another ?

FJCC · December 24, 2022, 2:27pm

Yes, the group_by() causes the calculation of summarize() to be done on the unique combinations of Question, Sex, and Age Group. The .groups argument of summarize() controls how the output of summarize is grouped. The Help of summarize explains the various options. It does not affect the calculated values of summarize, only how the tibble is grouped.

Here is a start on reversing the original pivot_longer. You have to reverse the coding from numbers to text. I ignored the division by n(). Since there is only one member in each group, it does not make a difference. If there were multiple members, you would have to store the value of n(), since it cannot be determined from the summarized data.

library(tidyverse)
df <- structure(list(Sex = c(1L, 2L, 1L, 2L), `Age Group` = structure(c(1L, 
                                                                        1L, 2L, 2L), .Label = c("30-39", "40-49"), class = "factor"), 
                     Q31 = c(1L, 3L, 4L, 2L), Q32 = c(7L, 5L, 6L, 2L), Q33 = 1:4, 
                     Q34 = c(5L, 6L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
                                                                                   -4L))

df
#>   Sex Age Group Q31 Q32 Q33 Q34
#> 1   1     30-39   1   7   1   5
#> 2   2     30-39   3   5   2   6
#> 3   1     40-49   4   6   3   2
#> 4   2     40-49   2   2   4   2
df1 <- df %>% 
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>%
  summarise(`Strongly Agree` = sum(Value == 7)/n(),
            `Slightly Agree` = sum(Value == 6)/n(),
            Agree = sum(Value == 5)/n(),
            Neutral = sum(Value == 4)/n(),
            Disagree = sum(Value == 3)/n(),
            `Slightly Disagree` = sum(Value == 2)/n(),
            `Strongly Disagree` = sum(Value == 1)/n()) 
#> `summarise()` has grouped output by 'Question', 'Sex'. You can override using
#> the `.groups` argument.

df1 |> pivot_longer(cols = -c("Question","Sex","Age Group"),
                    names_to = "Category", values_to = "Value") |> 
  filter(Value > 0) |> 
  mutate(Score = case_when(
Category == "Strongly Agree" ~ 7,
Category == "Slightly Agree" ~ 6,
Category == "Agree" ~ 5,
Category == "Neutral" ~ 4,
Category == "Disagree" ~ 3,
Category == "Slightly Disagree" ~ 2,
Category == "Strongly Disagree" ~ 1
  )) |> 
  select(-Category, -Value) |> 
  pivot_wider(names_from = "Question", values_from = "Score") |> 
  arrange(`Age Group`, Sex)
#> # A tibble: 4 × 6
#> # Groups:   Sex [2]
#>     Sex `Age Group`   Q31   Q32   Q33   Q34
#>   <int> <fct>       <dbl> <dbl> <dbl> <dbl>
#> 1     1 30-39           1     7     1     5
#> 2     2 30-39           3     5     2     6
#> 3     1 40-49           4     6     3     2
#> 4     2 40-49           2     2     4     2

^{Created on 2022-12-24 with reprex v2.0.2}

Andrzej · December 24, 2022, 3:55pm

Thank you @FJCC very much indeed,

My trivial question to understand the code would be: how do I know what does n() mean here ?
Is it a way to see it, I mean a real value ?
As help says: ?n()

n() gives the current group size.
or
number of rows (number of observations) in the summarized data

But how to get to know what is it here, for example ?

Agree = sum(Value == 5)/n()

so actually sum of Value that equals to 5 is divided by what ?

Thank you in advance for your explanation.

FJCC · December 24, 2022, 4:13pm

I added a row with a second set of entries for Sex = 2 and Age Group = 40-49. Some of the values in df1 are now 0.5 since n() is 2. The last column in df1 shows n(). Does that make sense?

library(tidyverse)
df <- structure(list(Sex = c(1L, 2L, 1L, 2L,2L), 
                     `Age Group` = structure(c(1L, 1L, 2L, 2L, 2L), 
                                             .Label = c("30-39", "40-49"), class = "factor"), 
                     Q31 = c(1L, 3L, 4L, 2L,3L), 
                     Q32 = c(7L, 5L, 6L, 2L,5L), 
                     Q33 = 1:5, 
                     Q34 = c(5L, 6L, 2L, 2L,6L)), 
                class = "data.frame", row.names = c(NA, -5L))

df
df1 <- df %>% 
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>%
  summarise(`Strongly Agree` = sum(Value == 7)/n(),
            `Slightly Agree` = sum(Value == 6)/n(),
            Agree = sum(Value == 5)/n(),
            Neutral = sum(Value == 4)/n(),
            Disagree = sum(Value == 3)/n(),
            `Slightly Disagree` = sum(Value == 2)/n(),
            `Strongly Disagree` = sum(Value == 1)/n(),
            N = n())

Andrzej · December 24, 2022, 4:29pm

Yes, it does beautifuly, thank you. This is very didactic.

In the meantime I have tried myself and come up with this:

df %>%
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>% count()

I hope that's the same and correct. Obviously the pipe road must stop at count() because
it replaces "Value" with "n" column and errors down the road.
Once again thank you for your time and effort put in helping me.

According to ".group =" argument in summarise, there was no need to specify it over there ? I would like to learn when should I use it and when is it necessary to define it.

system · December 31, 2022, 4:29pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.