Changing DF structure

Hady · May 31, 2020, 11:55am

Hello dears,
I know that this is an old issue that already solved a lot with your codes , i searched a lot to find how to do it but couldn't find the appropriate code
i need to reshape a data frame transfering some columns into rows and adding a flag to then as follows in the following simple code

Product = c("abc","def","ghi","xyz","SHL","Hdy")
Orders_New_Customer = c(2,3,5,4,7,6)
Value_New_Customer = c(65,45,78,65,48,54)
Orders_Old_Customer = c(4,5,8,78,45,47)
Value_Old_Customer = c(85,98,47,92,48,66)
DF = data.frame(Product,Orders_New_Customer,Value_New_Customer,Orders_Old_Customer,Value_Old_Customer)
View(DF)
##Needed to be as follows
Product = c("abc","def","ghi","xyz","SHL","Hdy")
Orders = c(2,3,5,4,7,6)
Values = c(65,45,78,65,48,54)
Customer_Type = c("New","New","New","New","New","New")
DF_New = data.frame(Product,Orders,Values,Customer_Type)

Product = c("abc","def","ghi","xyz","SHL","Hdy")
Orders = c(4,5,8,78,45,47)
Values = c(85,98,47,92,48,66)
Customer_Type = c("Old","Old","Old","Old","Old","Old")
DF_Old = data.frame(Product,Orders,Values,Customer_Type)
Total_DF = rbind(DF_New,DF_Old)
View(Total_DF

How can i do that especially that the desired dataframe size is very big and there are many dimensions like customer type i will add them.
Thanks in advance

FJCC · May 31, 2020, 2:29pm

I can only do this with a two step process. Maybe someone else can do better.

library(tidyr)
library(dplyr)

Product = c("abc","def","ghi","xyz","SHL","Hdy")
Orders_New_Customer = c(2,3,5,4,7,6)
Value_New_Customer = c(65,45,78,65,48,54)
Orders_Old_Customer = c(4,5,8,78,45,47)
Value_Old_Customer = c(85,98,47,92,48,66)
DF = data.frame(Product,Orders_New_Customer,Value_New_Customer,Orders_Old_Customer,Value_Old_Customer)

STEP1 <- pivot_longer(DF, cols = Orders_New_Customer:Value_Old_Customer, 
             names_to = c("ValueType", "Customer_Type"), 
             names_pattern = "(^[^_]+)_(.+)", values_to = "Values")
STEP1
#> # A tibble: 24 x 4
#>    Product ValueType Customer_Type Values
#>    <fct>   <chr>     <chr>          <dbl>
#>  1 abc     Orders    New_Customer       2
#>  2 abc     Value     New_Customer      65
#>  3 abc     Orders    Old_Customer       4
#>  4 abc     Value     Old_Customer      85
#>  5 def     Orders    New_Customer       3
#>  6 def     Value     New_Customer      45
#>  7 def     Orders    Old_Customer       5
#>  8 def     Value     Old_Customer      98
#>  9 ghi     Orders    New_Customer       5
#> 10 ghi     Value     New_Customer      78
#> # ... with 14 more rows
STEP2 <- pivot_wider(STEP1, names_from = "ValueType", values_from = "Values") %>% 
  arrange(Customer_Type, Product) %>% 
  mutate(Customer_Type = stringr::str_extract(Customer_Type, "^[^_]+"))
STEP2
#> # A tibble: 12 x 4
#>    Product Customer_Type Orders Value
#>    <fct>   <chr>          <dbl> <dbl>
#>  1 abc     New                2    65
#>  2 def     New                3    45
#>  3 ghi     New                5    78
#>  4 Hdy     New                6    54
#>  5 SHL     New                7    48
#>  6 xyz     New                4    65
#>  7 abc     Old                4    85
#>  8 def     Old                5    98
#>  9 ghi     Old                8    47
#> 10 Hdy     Old               47    66
#> 11 SHL     Old               45    48
#> 12 xyz     Old               78    92

^{Created on 2020-05-31 by the reprex package (v0.3.0)}

andresrcs · May 31, 2020, 3:29pm

A little shorter solution

library(tidyverse)

DF <- data.frame(
    stringsAsFactors = FALSE,
    Product = c("abc", "def", "ghi", "xyz", "SHL", "Hdy"),
    Orders_New_Customer = c(2, 3, 5, 4, 7, 6),
    Value_New_Customer = c(65, 45, 78, 65, 48, 54),
    Orders_Old_Customer = c(4, 5, 8, 78, 45, 47),
    Value_Old_Customer = c(85, 98, 47, 92, 48, 66)
)

DF %>%
    pivot_longer(-Product,
                 names_to = c(".value", "Customer_Type"),
                 names_pattern = "(^[OV].+)_(.+)_"
    ) %>% 
    arrange(Customer_Type)
#> # A tibble: 12 x 4
#>    Product Customer_Type Orders Value
#>    <chr>   <chr>          <dbl> <dbl>
#>  1 abc     New                2    65
#>  2 def     New                3    45
#>  3 ghi     New                5    78
#>  4 xyz     New                4    65
#>  5 SHL     New                7    48
#>  6 Hdy     New                6    54
#>  7 abc     Old                4    85
#>  8 def     Old                5    98
#>  9 ghi     Old                8    47
#> 10 xyz     Old               78    92
#> 11 SHL     Old               45    48
#> 12 Hdy     Old               47    66

^{Created on 2020-05-31 by the reprex package (v0.3.0)}

Hady · June 1, 2020, 11:26am

thanks for your support
it is very useful

Hady · June 1, 2020, 11:27am

very awesome , thanks for the support @andresrcs

Hady · June 2, 2020, 8:41pm

@andresrcs
i'm trying to understand the symbols of names_pattern argument

andresrcs · June 3, 2020, 1:00am

Those symbols are a "Regular Expression" that describes a text pattern, in this case, only the patterns contained within parentheses are going to be kept.

^(.+)_(.+)_
^ Start of the string
.+ Any character one or more times
_ A literal underscore

Regular Expressions are not R specific, they are common to many programming languages, so you can simply google for that term and get a lot of information about it. Although, stringr uses a specific implementation (different than base R).

system · June 10, 2020, 1:00am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.