pivot_longer to multiple rows

floris508 · December 29, 2020, 3:12pm

My real dataset is larger, but I tried to shorten it here. What I have:

Income_100m_total_df <- data.frame(Name = c("Amstelpark", 
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark",
                                        "Amstelpark"),
            Income_type_households = c("Mean_low_income_households",
                             "Mean_low_income_households", 
                             "Mean_low_income_households",
                             "Mean_middle_income_households",
                             "Mean_middle_income_households",
                             "Mean_middle_income_households",
                             "Mean_high_income_households",
                             "Mean_high_income_households",
                             "Mean_high_incom_households"),
             Income_households = c(42.333, 42.333, 42.333, 32, 32, 32, 
                                   25.667, 25.667, 25.667),
             Income_type_persons = c("Mean_low_income_persons",
                                     "Mean_middle_income_persons", 
                                     "Mean_high_income_persons",
                                     "Mean_low_income_persons",
                                     "Mean_middle_income_persons",
                                     "Mean_high_income_persons",
                                     "Mean_low_income_persons",
                                     "Mean_middle_income_persons",
                                     "Mean_high_income_persons"),
            Income_households = c(33, 32, 35, 33, 32, 35, 33, 32, 35))

What I want

Income_100m_total_df_2 <- data.frame(Name = c("Amstelpark", 
                                        "Amstelpark",
                                        "Amstelpark"),
            Income_type_households = c("Mean_low_income_households",
                                       "Mean_middle_income_households",
                                       "Mean_high_incom_households"),
            Income_households = c(42.333, 32, 25.667),
            Income_type_persons = c("Mean_low_income_persons",
                                    "Mean_middle_income_persons", 
                                    "Mean_high_income_persons"),
             Income_households = c(33, 32, 35))

I tried to use pivot_longer to achieve this

pivot_longer(c(Mean_low_income_household,
             Mean_middle_income_household,
             Mean_high_income_household),
           names_to = "Income_type_households", 
           values_to = "Income_households") %>% 
pivot_longer(c(Mean_low_income_persons, 
             Mean_middle_income_persons, 
             Mean_high_income_persons), 
           names_to = "Income_type_persons", values_to = "Income_persons")

However, it only seems to apply to persons and not household, how do I achieve this?

sgenter · December 29, 2020, 6:23pm

Hi. I think you want pivot_wider. Does this get you in the neighborhood?

library(tidyverse)

Income_100m_total_df %>% 
  pivot_wider(names_from = "Income_type_households",
              values_from = "Income_households")

floris508 · December 29, 2020, 7:43pm

Hi sgenter unfortunately this didn't quite got me what I wanted

sgenter · December 29, 2020, 8:11pm

Here's another workaround that produces a table that looks like what you want. I'm not sure if it will be practical with your actual dataset though. I found a small typo in your reprex, so I've corrected it and repasted it below along with my code.

library(tidyverse)

Income_100m_total_df <- data.frame(Name = c("Amstelpark", 
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark",
                                            "Amstelpark"),
                                   Income_type_households = c("Mean_low_income_households",
                                                              "Mean_low_income_households", 
                                                              "Mean_low_income_households",
                                                              "Mean_middle_income_households",
                                                              "Mean_middle_income_households",
                                                              "Mean_middle_income_households",
                                                              "Mean_high_income_households",
                                                              "Mean_high_income_households",
                                                              "Mean_high_income_households"),
                                   Income_households = c(42.333, 42.333, 42.333, 32, 32, 32, 
                                                         25.667, 25.667, 25.667),
                                   Income_type_persons = c("Mean_low_income_persons",
                                                           "Mean_middle_income_persons", 
                                                           "Mean_high_income_persons",
                                                           "Mean_low_income_persons",
                                                           "Mean_middle_income_persons",
                                                           "Mean_high_income_persons",
                                                           "Mean_low_income_persons",
                                                           "Mean_middle_income_persons",
                                                           "Mean_high_income_persons"),
                                   Income_households = c(33, 32, 35, 33, 32, 35, 33, 32, 35))



inc_house <- Income_100m_total_df %>% 
  group_by(Income_type_households) %>% 
  summarize(mean = mean(Income_households))

inc_person <- Income_100m_total_df %>% 
  group_by(Income_type_persons) %>% 
  summarize(mean = mean(Income_households.1))

inc_both <- bind_cols(inc_house, inc_person)

floris508 · December 31, 2020, 11:08am

Hi sgenter I used your code and it got what I wanted! Thanks very much!
However, in the final data frame (inc_both) I do not have the name of the park, do you have a way to work around this?
In the real dataset I have 20 parks for which I have data so it is important that the Name is still included in the final data frame. Also, is there a way to include all your lines of code in one piece (by using the pipe?) since I already have around 300 lines of code I am trying to keep things as minimal as possible.

Thanks very much for your help and looking forward to your reply!

sgenter · December 31, 2020, 5:53pm

Hi, you should be able to add "Name" to the group_by function as in the code below and it will keep the park names. Of course, doing so will calculate household and person level income for all 20 parks, so the resulting dataframe will be 60 rows instead of 3. As an aside, if you want to keep only the first Name column, you can use select(!Name...4) to remove the second Name column and clean up the dataframe's appearance.

I understand the problem of having a long script, but hopefully adding 7 more lines will be okay. I think the pipe operator is used while working within a particular tibble or dataframe, not across them. In my solution, I've created two tibbles to calculate the means of households on the one-hand and persons on the other, then combined them into a third. I used the pipe to help create the separate dataframes, but I'm not sure of how to use it across them. There may be a way, but I don't know how. Maybe someone else can help make it shorter?

inc_house <- Income_100m_total_df %>% 
  group_by(Name, Income_type_households) %>% 
  summarize(mean = mean(Income_households))

inc_person <- Income_100m_total_df %>% 
  group_by(Name, Income_type_persons) %>% 
  summarize(mean = mean(Income_households.1))

inc_both <- bind_cols(inc_house, inc_person)

inc_both

system · January 7, 2021, 5:53pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.