Transposing a selected columns in a table

Hi All,
here is my dataframe:

My_OLR_dataframe <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20), Group = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2), Question_1 = c(4, 
2, 3, 4, 1, 1, 2, 4, 4, 3, 2, 4, 1, 2, 3, 5, 3, 4, 2, 1), Question_2 = c(2, 
4, 4, 1, 3, 2, 2, 4, 1, 3, 2, 5, 5, 3, 4, 1, 1, 2, 3, 2), Question_3 = c(2, 
3, 1, 5, 5, 1, 1, 2, 5, 2, 1, 1, 4, 2, 5, 1, 3, 3, 4, 4), Question_4 = c(2, 
1, 3, 3, 5, 4, 5, 5, 1, 3, 4, 1, 3, 5, 2, 4, 3, 1, 2, 1), Question_5 = c(4, 
3, 4, 1, 4, 4, 3, 2, 4, 1, 1, 1, 4, 4, 3, 3, 3, 4, 1, 3), Question_6 = c(5, 
2, 1, 4, 1, 2, 2, 2, 1, 2, 1, 4, 2, 4, 3, 4, 5, 3, 5, 1), Question_7 = c(1, 
5, 4, 2, 1, 4, 2, 4, 3, 2, 2, 3, 3, 4, 4, 3, 4, 4, 4, 1), Question_8 = c(4, 
1, 1, 1, 1, 4, 2, 1, 5, 2, 3, 4, 3, 1, 2, 2, 5, 4, 1, 2), Question_9 = c(2, 
3, 1, 4, 4, 3, 5, 4, 1, 3, 2, 1, 5, 3, 3, 4, 1, 1, 1, 5), Question_10 = c(1, 
2, 4, 4, 4, 3, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3, 3, 5, 3, 3), Question_11 = c(3, 
1, 3, 3, 5, 4, 2, 4, 1, 4, 5, 4, 1, 2, 2, 3, 2, 1, 2, 3), Question_12 = c(1, 
1, 1, 5, 5, 3, 3, 3, 2, 3, 4, 4, 4, 1, 3, 1, 1, 2, 2, 2), Question_13 = c(2, 
5, 5, 1, 1, 3, 1, 4, 5, 1, 2, 4, 1, 3, 2, 5, 2, 5, 5, 5), Question_14 = c(4, 
1, 4, 1, 3, 1, 5, 4, 3, 1, 3, 4, 1, 4, 5, 3, 1, 4, 2, 1), Question_15 = c(2, 
4, 1, 3, 1, 2, 2, 3, 3, 1, 5, 2, 3, 5, 5, 2, 2, 1, 2, 3)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

How can I change a layout of my dataframe to look like this:

I tried with this:

library(reshape2)
melt(My_OLR_dataframe, id.vars = c("ID", "Group"))

but this is not giving me what I want. So basically I want a values from my first dataframe to become a column names and obviously to have all frequency (sum of occurences of indyvidual Likert responses) in a cells in my new dataframe. I hope this make sense. I feel that I should use mutate() and case_when() function somehow.
Thank you for your help.

I believe the following reprex should produce the results you're expecting. First, I load the requisite libraries. Then I define the My_OLR_dataframe.

# load libraries
library(tidyverse)

# define data
My_OLR_dataframe <- structure(
  list(
    ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
           13, 14, 15, 16, 17, 18, 19, 20),
    Group = c(1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
    Question_1 = c(4,
                   2, 3, 4, 1, 1, 2, 4, 4, 3, 2, 4, 1, 2, 3, 5, 3, 4, 2, 1),
    Question_2 = c(2,
                   4, 4, 1, 3, 2, 2, 4, 1, 3, 2, 5, 5, 3, 4, 1, 1, 2, 3, 2),
    Question_3 = c(2,
                   3, 1, 5, 5, 1, 1, 2, 5, 2, 1, 1, 4, 2, 5, 1, 3, 3, 4, 4),
    Question_4 = c(2,
                   1, 3, 3, 5, 4, 5, 5, 1, 3, 4, 1, 3, 5, 2, 4, 3, 1, 2, 1),
    Question_5 = c(4,
                   3, 4, 1, 4, 4, 3, 2, 4, 1, 1, 1, 4, 4, 3, 3, 3, 4, 1, 3),
    Question_6 = c(5,
                   2, 1, 4, 1, 2, 2, 2, 1, 2, 1, 4, 2, 4, 3, 4, 5, 3, 5, 1),
    Question_7 = c(1,
                   5, 4, 2, 1, 4, 2, 4, 3, 2, 2, 3, 3, 4, 4, 3, 4, 4, 4, 1),
    Question_8 = c(4,
                   1, 1, 1, 1, 4, 2, 1, 5, 2, 3, 4, 3, 1, 2, 2, 5, 4, 1, 2),
    Question_9 = c(2,
                   3, 1, 4, 4, 3, 5, 4, 1, 3, 2, 1, 5, 3, 3, 4, 1, 1, 1, 5),
    Question_10 = c(1,
                    2, 4, 4, 4, 3, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3, 3, 5, 3, 3),
    Question_11 = c(3,
                    1, 3, 3, 5, 4, 2, 4, 1, 4, 5, 4, 1, 2, 2, 3, 2, 1, 2, 3),
    Question_12 = c(1,
                    1, 1, 5, 5, 3, 3, 3, 2, 3, 4, 4, 4, 1, 3, 1, 1, 2, 2, 2),
    Question_13 = c(2,
                    5, 5, 1, 1, 3, 1, 4, 5, 1, 2, 4, 1, 3, 2, 5, 2, 5, 5, 5),
    Question_14 = c(4,
                    1, 4, 1, 3, 1, 5, 4, 3, 1, 3, 4, 1, 4, 5, 3, 1, 4, 2, 1),
    Question_15 = c(2,
                    4, 1, 3, 1, 2, 2, 3, 3, 1, 5, 2, 3, 5, 5, 2, 2, 1, 2, 3)
  ),
  row.names = c(NA,-20L),
  class = c("tbl_df", "tbl", "data.frame")
)

The main nuts and bolts of the subsequent routine are as follows :

  1. Take My_OLR_dataframe. Then
  2. use pivot_longer to transform the data into long format by specifying that all of the columns starting with the prefix Question_ should be changed from wide to long. Then
  3. sort the data by the value column automatically created during the pivot_longer step. This column contains the actual question scores. Then
  4. Use a mutate call to change to contents of the value column. Specifically, we use str_replace_all to change the number values to your preferred character representation (i.e. the column names you ultimately want). Lastly, we convert the resultant column to a factor using forcats::as_factor. (The reason I use this function specifically is because it allows one to create factor levels based on the current sort-order of the data. By contrast, base::as.factor will always create levels based on the alphabetical sort-order of the input variable.) Then
  5. Perform a count by the relevant groupings. Then
  6. Reshape the data to wide format again using pivot_wider, specifying that any NA values in the newly created columns should be replaced by zeros.

(if you're interested in better understanding the process, it may be helpful to see what the data looks like after each distinct step)

My_OLR_dataframe %>%
  pivot_longer(cols = starts_with('Question_')) %>%
  arrange(value) %>%
  mutate(value = value %>%
           str_replace_all(
             c(
               '1' = '1=very bad',
               '2' = '2=bad',
               '3' = '3=neutral',
               '4' = '4=good',
               '5' = '5=very good'
             )
           ) %>%
           forcats::as_factor()) %>% 
  count(name, Group, value) %>% 
  pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))
#> # A tibble: 30 x 7
#>    name        Group `1=very bad` `2=bad` `3=neutral` `4=good` `5=very good`
#>    <chr>       <dbl>        <int>   <int>       <int>    <int>         <int>
#>  1 Question_1      1            2       3           2        4             0
#>  2 Question_1      2            2       2           2        2             1
#>  3 Question_10     1            3       2           3        3             0
#>  4 Question_10     2            0       3           5        0             1
#>  5 Question_11     1            2       1           3        3             2
#>  6 Question_11     2            2       4           2        1             0
#>  7 Question_12     1            3       1           4        1             2
#>  8 Question_12     2            3       3           1        2             0
#>  9 Question_13     1            4       2           1        1             3
#> 10 Question_13     2            1       2           1        1             4
#> # … with 20 more rows
1 Like

Hi, it looks like your dataframes got a little jumbled in the pasting. Here they are separated

My_OLR_dataframe <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20), Group = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2), Question_1 = c(4, 
2, 3, 4, 1, 1, 2, 4, 4, 3, 2, 4, 1, 2, 3, 5, 3, 4, 2, 1), Question_2 = c(2, 
4, 4, 1, 3, 2, 2, 4, 1, 3, 2, 5, 5, 3, 4, 1, 1, 2, 3, 2), Question_3 = c(2, 
3, 1, 5, 5, 1, 1, 2, 5, 2, 1, 1, 4, 2, 5, 1, 3, 3, 4, 4), Question_4 = c(2, 
1, 3, 3, 5, 4, 5, 5, 1, 3, 4, 1, 3, 5, 2, 4, 3, 1, 2, 1), Question_5 = c(4, 
3, 4, 1, 4, 4, 3, 2, 4, 1, 1, 1, 4, 4, 3, 3, 3, 4, 1, 3), Question_6 = c(5, 
2, 1, 4, 1, 2, 2, 2, 1, 2, 1, 4, 2, 4, 3, 4, 5, 3, 5, 1), Question_7 = c(1, 
5, 4, 2, 1, 4, 2, 4, 3, 2, 2, 3, 3, 4, 4, 3, 4, 4, 4, 1), Question_8 = c(4, 
1, 1, 1, 1, 4, 2, 1, 5, 2, 3, 4, 3, 1, 2, 2, 5, 4, 1, 2), Question_9 = c(2, 
3, 1, 4, 4, 3, 5, 4, 1, 3, 2, 1, 5, 3, 3, 4, 1, 1, 1, 5), Question_10 = c(1, 
2, 4, 4, 4, 3, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3, 3, 5, 3, 3), Question_11 = c(3, 
1, 3, 3, 5, 4, 2, 4, 1, 4, 5, 4, 1, 2, 2, 3, 2, 1, 2, 3), Question_12 = c(1, 
1, 1, 5, 5, 3, 3, 3, 2, 3, 4, 4, 4, 1, 3, 1, 1, 2, 2, 2), Question_13 = c(2, 
5, 5, 1, 1, 3, 1, 4, 5, 1, 2, 4, 1, 3, 2, 5, 2, 5, 5, 5), Question_14 = c(4, 
1, 4, 1, 3, 1, 5, 4, 3, 1, 3, 4, 1, 4, 5, 3, 1, 4, 2, 1), Question_15 = c(2, 
4, 1, 3, 1, 2, 2, 3, 3, 1, 5, 2, 3, 5, 5, 2, 2, 1, 2, 3)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

It looks like aaa was identical to My_OLR_dataframe so I've omitted it.

str(My_OLR_dataframe)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  17 variables:
#>  $ ID         : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Group      : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ Question_1 : num  4 2 3 4 1 1 2 4 4 3 ...
#>  $ Question_2 : num  2 4 4 1 3 2 2 4 1 3 ...
#>  $ Question_3 : num  2 3 1 5 5 1 1 2 5 2 ...
#>  $ Question_4 : num  2 1 3 3 5 4 5 5 1 3 ...
#>  $ Question_5 : num  4 3 4 1 4 4 3 2 4 1 ...
#>  $ Question_6 : num  5 2 1 4 1 2 2 2 1 2 ...
#>  $ Question_7 : num  1 5 4 2 1 4 2 4 3 2 ...
#>  $ Question_8 : num  4 1 1 1 1 4 2 1 5 2 ...
#>  $ Question_9 : num  2 3 1 4 4 3 5 4 1 3 ...
#>  $ Question_10: num  1 2 4 4 4 3 1 2 3 1 ...
#>  $ Question_11: num  3 1 3 3 5 4 2 4 1 4 ...
#>  $ Question_12: num  1 1 1 5 5 3 3 3 2 3 ...
#>  $ Question_13: num  2 5 5 1 1 3 1 4 5 1 ...
#>  $ Question_14: num  4 1 4 1 3 1 5 4 3 1 ...
#>  $ Question_15: num  2 4 1 3 1 2 2 3 3 1 ...

Now the next step requires some careful thought about how you want to use the new object. The layout is a poor choice for further analysis because it it untidy (the concept that a data frame should consist of a single observation per row of one or more variables, which is what My_OLR_dataframe` is now.

On the other hand, if you want to produce a report like the screenshot, what I would suggest is to divide it into four sections, one for each group.

To get there

# OMITTED My_OLR_dataframe <- structure ...
suppressPackageStartupMessages(library(dplyr)) 
grp1 <- My_OLR_dataframe %>% filter(Group == 1) %>% select(-ID,-Group)%>% t() %>% as.data.frame()
grp1
#>             V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
#> Question_1   4  2  3  4  1  1  2  4  4   3   2
#> Question_2   2  4  4  1  3  2  2  4  1   3   2
#> Question_3   2  3  1  5  5  1  1  2  5   2   1
#> Question_4   2  1  3  3  5  4  5  5  1   3   4
#> Question_5   4  3  4  1  4  4  3  2  4   1   1
#> Question_6   5  2  1  4  1  2  2  2  1   2   1
#> Question_7   1  5  4  2  1  4  2  4  3   2   2
#> Question_8   4  1  1  1  1  4  2  1  5   2   3
#> Question_9   2  3  1  4  4  3  5  4  1   3   2
#> Question_10  1  2  4  4  4  3  1  2  3   1   3
#> Question_11  3  1  3  3  5  4  2  4  1   4   5
#> Question_12  1  1  1  5  5  3  3  3  2   3   4
#> Question_13  2  5  5  1  1  3  1  4  5   1   2
#> Question_14  4  1  4  1  3  1  5  4  3   1   3
#> Question_15  2  4  1  3  1  2  2  3  3   1   5

Created on 2020-03-18 by the reprex package (v0.3.0)

Of course, you can replace with column names with

colnames(grp1) <- c("Some","Data","Shown","Here"...)

It's not clear what the summary represents, and grp1 has 11 columns, which are presumably the responses. The screenshot has only the 5. Are these the counts of each category (1...5) of response? If so, work should be done back in My_OLR_dataframe with group_by and count to do the summary first.

1 Like

Thank you @hendrikvanb and @technocrat for your reply and quick solutions.
So it turned out that it wouldn't be case_when ?
And in @technocrat answer ID column became V1 to V11 variables (not responses).
So to change columns names I did this:

colnames(grp1) <- c("ID_1","ID_2","ID_3","ID_4", "ID_5","ID_6","ID_7","ID_8","ID_9", "ID_10", "ID_11")

and not it looks like that:

I don't know it is it a faster way (automatic sort of) to change a column names in grp1 in order not to write it down in a character vector as I did above ?

You may have heard of R's lazy evaluation. Taking inspiration, then

tag <- "ID_"
(paste(tag,seq(1:11), sep = "")) # assign to header
#>  [1] "ID_1"  "ID_2"  "ID_3"  "ID_4"  "ID_5"  "ID_6"  "ID_7"  "ID_8"  "ID_9" 
#> [10] "ID_10" "ID_11"
# colnames(df) <- header

Created on 2020-03-18 by the reprex package (v0.3.0)

Thank you @technocrat,

I am learning what lazy evaluation is and NSE as well. It gives me headaches as topics are very difficult.
For the sake of completeness, would it be possible to transpose My_OLR_dataframe using reshape2 package as @hendrikvanb did it with tidyr ? To me reshape2 is slightly easier to use.
Once again thank you to both of you.

Hi @hendrikvanb,

Thank you for the hint, I will use this:

GitHub - daranzolin/ViewPipeSteps: Create tabs of View() output for each chained pipe](GitHub - daranzolin/ViewPipeSteps: Create tabs of View() output for each chained pipe)

1 Like

tidyr::pivot_xxx() is probably where to start

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.