Dataset Normalization/Pivot Longer

jeremyz · February 22, 2022, 11:14pm

Hello! I am trying to modify the shape of my dataset so that the "Topic" columns get carried over into their own rows.

I want df1 to look like df2:

df1 <- data.frame (ID  = c(100, 101, 102, 103),
                   Topic1 = c("Cats", "Dogs", "Horses", "Pigs"),
                   Topic2 = c("Dogs", "Cats", "",  "Cats"),
                   Topic3 = c("Pigs", "Goats", "", "Cats"),
                   Year = c(2011, 2012, 2011, 2014)
                   )

df2 <- data.frame(ID = c(100, 100, 100,
                         101, 101, 101,
                         102, 102, 102,
                         103, 103, 103),
                  Topic = c("Cats", "Dogs", "Pigs",
                            "Dogs", "Cats", "Goats",
                            "Horses", "", "",
                            "Pigs", "Cats", "Cats"),
                  Year = c(2011, 2011, 2011,
                           2012, 2012, 2012,
                           2011, 2011, 2011,
                           2014, 2014, 2014)
                           )

I am not sure if there is a normalization function in R? This article gives an example of what I'm trying to do. Maybe the solution is a pivot_longer?

Any help appreciated!

FJCC · February 23, 2022, 12:48am

Here is how I would use pivot_longer. You can drop the name column if you want to.

library(tidyr)
df1 <- data.frame (ID  = c(100, 101, 102, 103),
                   Topic1 = c("Cats", "Dogs", "Horses", "Pigs"),
                   Topic2 = c("Dogs", "Cats", "",  "Cats"),
                   Topic3 = c("Pigs", "Goats", "", "Cats"),
                   Year = c(2011, 2012, 2011, 2014)
)

df2 <- data.frame(ID = c(100, 100, 100,
                         101, 101, 101,
                         102, 102, 102,
                         103, 103, 103),
                  Topic = c("Cats", "Dogs", "Pigs",
                            "Dogs", "Cats", "Goats",
                            "Horses", "", "",
                            "Pigs", "Cats", "Cats"),
                  Year = c(2011, 2011, 2011,
                           2012, 2012, 2012,
                           2011, 2011, 2011,
                           2014, 2014, 2014)
)
df1
#>    ID Topic1 Topic2 Topic3 Year
#> 1 100   Cats   Dogs   Pigs 2011
#> 2 101   Dogs   Cats  Goats 2012
#> 3 102 Horses               2011
#> 4 103   Pigs   Cats   Cats 2014
df2
#>     ID  Topic Year
#> 1  100   Cats 2011
#> 2  100   Dogs 2011
#> 3  100   Pigs 2011
#> 4  101   Dogs 2012
#> 5  101   Cats 2012
#> 6  101  Goats 2012
#> 7  102 Horses 2011
#> 8  102        2011
#> 9  102        2011
#> 10 103   Pigs 2014
#> 11 103   Cats 2014
#> 12 103   Cats 2014
df2_calc <- df1 |> pivot_longer(Topic1:Topic3,values_to = "Topic")
df2_calc
#> # A tibble: 12 x 4
#>       ID  Year name   Topic   
#>    <dbl> <dbl> <chr>  <chr>   
#>  1   100  2011 Topic1 "Cats"  
#>  2   100  2011 Topic2 "Dogs"  
#>  3   100  2011 Topic3 "Pigs"  
#>  4   101  2012 Topic1 "Dogs"  
#>  5   101  2012 Topic2 "Cats"  
#>  6   101  2012 Topic3 "Goats" 
#>  7   102  2011 Topic1 "Horses"
#>  8   102  2011 Topic2 ""      
#>  9   102  2011 Topic3 ""      
#> 10   103  2014 Topic1 "Pigs"  
#> 11   103  2014 Topic2 "Cats"  
#> 12   103  2014 Topic3 "Cats"

^{Created on 2022-02-22 by the reprex package (v2.0.1)}

jeremyz · February 23, 2022, 2:13pm

Thanks so much @FJCC . For future solution hunters, I was also able to get the desired result using splitstackshape's cSplit function--making the "direction" argument set to "long"

system · March 2, 2022, 2:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.