Restructure data layout (partial record rows into column)

JStanley1989 · September 6, 2020, 2:48am

I have a weird view I'm trying to create. The reason of doing this is an old system we use needs the input to look this way, but the data comes in a different way. Here it is:

Data comes in that looks like this. Very normal. 1 Record (row) shows all the attributes for same record. See As-Is in picture

The view I need to make it in splits part of the record into a column. See To-Be in picture.

Basically the part that splits needs to be transposed into 1 column. Example, the top record has 1, 2, 3, 4, 5 going across. It would need to be one column and go all the way down until complete. Then the next record would repeat. I couldn't find (or didn't know the true question to ask) an answer online that went this certain direction. It's somewhat unique. Any help would be awesome!

EDIT: I'm guessing something in the tidyverse library could do this could do this, but I'm open to anything!

FJCC · September 6, 2020, 3:35am

Here is a rather inelegant solution. I used NA for all of the values that you show as empty in your image.

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

DF <- data.frame(Column1 = c("A", "E"),
                 Column2 = c("B", "F"),
                 Column3 = c("C", "G"),
                 Column4 = c("D", "H"),
                 Column5 = c(1, 6),
                 Column6 = c(2, 7),
                 Column7 = c(3, 8),
                 Column8 = c(4, 9),
                 Column9 = c(5, 10))
DF
#>   Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
#> 1       A       B       C       D       1       2       3       4       5
#> 2       E       F       G       H       6       7       8       9      10
DF$Dummy <- NA
DF <- DF %>% select(Column1:Column4, Dummy, Column5:Column9) %>% 
  pivot_longer(Dummy:Column9) %>% 
  select(-name)
DF
#> # A tibble: 12 x 5
#>    Column1 Column2 Column3 Column4 value
#>    <chr>   <chr>   <chr>   <chr>   <dbl>
#>  1 A       B       C       D          NA
#>  2 A       B       C       D           1
#>  3 A       B       C       D           2
#>  4 A       B       C       D           3
#>  5 A       B       C       D           4
#>  6 A       B       C       D           5
#>  7 E       F       G       H          NA
#>  8 E       F       G       H           6
#>  9 E       F       G       H           7
#> 10 E       F       G       H           8
#> 11 E       F       G       H           9
#> 12 E       F       G       H          10
DF <- DF %>% mutate(Column1 = ifelse(is.na(value), Column1, NA),
              Column2 = ifelse(is.na(value), Column2, NA),
              Column3 = ifelse(is.na(value), Column3, NA),
              Column4 = ifelse(is.na(value), Column4, NA))
DF
#> # A tibble: 12 x 5
#>    Column1 Column2 Column3 Column4 value
#>    <chr>   <chr>   <chr>   <chr>   <dbl>
#>  1 A       B       C       D          NA
#>  2 <NA>    <NA>    <NA>    <NA>        1
#>  3 <NA>    <NA>    <NA>    <NA>        2
#>  4 <NA>    <NA>    <NA>    <NA>        3
#>  5 <NA>    <NA>    <NA>    <NA>        4
#>  6 <NA>    <NA>    <NA>    <NA>        5
#>  7 E       F       G       H          NA
#>  8 <NA>    <NA>    <NA>    <NA>        6
#>  9 <NA>    <NA>    <NA>    <NA>        7
#> 10 <NA>    <NA>    <NA>    <NA>        8
#> 11 <NA>    <NA>    <NA>    <NA>        9
#> 12 <NA>    <NA>    <NA>    <NA>       10

^{Created on 2020-09-05 by the reprex package (v0.3.0)}

JStanley1989 · September 6, 2020, 5:20am

@FJCC

This works really well to my initial post. I did forget a few things on the initial post. In the example below, I would like to have the column headers come down with the numbers and if possible, I was going to make a Line Item No column that counts each line if it's >0. I Here's a more closer representation of what the data output would look like.

The code you provided me brought the quantities down. The column headers are the Item Descriptions. So, I would need the column headers to match the quantities. If the qty ordered was 0 for a certain column, it wouldn't show on the "value" list and I would just count all the line items > 0 and it would automatically have a counter.

Hopefully that makes sense... Thanks for the initial code. It's definitely getting closer to my solution.

FJCC · September 6, 2020, 1:36pm

I think this is closer to what you want. It is a little easier than your original request because the pivot_longer() function uses the column headers as labels and I added code to remove that column in the previous code.
As I commented in the code, replacing data in rows with NA so that only the first row has the labels is a bad idea and should only be done if another tool requires it or for display purposes after you are done processing the data.

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

DF <- data.frame(Column1 = c("A", "E"),
                 Column2 = c("B", "F"),
                 Column3 = c("C", "G"),
                 Column4 = c("D", "H"),
                 Column5 = c(1, 6),
                 Column6 = c(0, 7),
                 Column7 = c(3, 8),
                 Column8 = c(4, 9),
                 Column9 = c(5, 10))
DF
#>   Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
#> 1       A       B       C       D       1       0       3       4       5
#> 2       E       F       G       H       6       7       8       9      10
#Pivot columns 5 - 9 into two columns, one for the Item & one for Quantity
DF <- DF %>% pivot_longer(Column5:Column9, names_to = "Item", values_to = "Quantity")
DF
#> # A tibble: 10 x 6
#>    Column1 Column2 Column3 Column4 Item    Quantity
#>    <chr>   <chr>   <chr>   <chr>   <chr>      <dbl>
#>  1 A       B       C       D       Column5        1
#>  2 A       B       C       D       Column6        0
#>  3 A       B       C       D       Column7        3
#>  4 A       B       C       D       Column8        4
#>  5 A       B       C       D       Column9        5
#>  6 E       F       G       H       Column5        6
#>  7 E       F       G       H       Column6        7
#>  8 E       F       G       H       Column7        8
#>  9 E       F       G       H       Column8        9
#> 10 E       F       G       H       Column9       10

#Filter out rows where the Quantity is zero then add row numbers.
DF <- DF %>% filter(Quantity > 0) %>%
  group_by(Column1, Column2, Column3, Column4) %>% 
  mutate(Line_Number = row_number())
DF
#> # A tibble: 9 x 7
#> # Groups:   Column1, Column2, Column3, Column4 [2]
#>   Column1 Column2 Column3 Column4 Item    Quantity Line_Number
#>   <chr>   <chr>   <chr>   <chr>   <chr>      <dbl>       <int>
#> 1 A       B       C       D       Column5        1           1
#> 2 A       B       C       D       Column7        3           2
#> 3 A       B       C       D       Column8        4           3
#> 4 A       B       C       D       Column9        5           4
#> 5 E       F       G       H       Column5        6           1
#> 6 E       F       G       H       Column6        7           2
#> 7 E       F       G       H       Column7        8           3
#> 8 E       F       G       H       Column8        9           4
#> 9 E       F       G       H       Column9       10           5

#Replace labels with NA when Line_Number is > 1.
#This is a bad practice and should only be done if another tool requires it
DF <- DF %>% mutate(Column1 = ifelse(Line_Number == 1, Column1, NA),
              Column2 = ifelse(Line_Number == 1, Column2, NA),
              Column3 = ifelse(Line_Number == 1, Column3, NA),
              Column4 = ifelse(Line_Number == 1, Column4, NA))
DF
#> # A tibble: 9 x 7
#> # Groups:   Column1, Column2, Column3, Column4 [3]
#>   Column1 Column2 Column3 Column4 Item    Quantity Line_Number
#>   <chr>   <chr>   <chr>   <chr>   <chr>      <dbl>       <int>
#> 1 A       B       C       D       Column5        1           1
#> 2 <NA>    <NA>    <NA>    <NA>    Column7        3           2
#> 3 <NA>    <NA>    <NA>    <NA>    Column8        4           3
#> 4 <NA>    <NA>    <NA>    <NA>    Column9        5           4
#> 5 E       F       G       H       Column5        6           1
#> 6 <NA>    <NA>    <NA>    <NA>    Column6        7           2
#> 7 <NA>    <NA>    <NA>    <NA>    Column7        8           3
#> 8 <NA>    <NA>    <NA>    <NA>    Column8        9           4
#> 9 <NA>    <NA>    <NA>    <NA>    Column9       10           5

^{Created on 2020-09-06 by the reprex package (v0.3.0)}

JStanley1989 · September 7, 2020, 5:53am

@FJCC This is very close, but I'm getting an error when I try to remove the duplicate rows. It says: "Column 'ID' can't be modified because it's a grouping variable

Here is current Code:

install.packages("tidyverse")
install.packages("readxl")
library("tidyverse")
library("readxl")
library(dplyr, warn.conflicts = FALSE)

df <- read_excel("COVID19Supply.xlsx")

df <- select(df, -c("Created By", "Modified By", "Submission", "Tracking Number", "MaterialShipped?",
"Status", "Created","Request Week", "Service Area", "District", "Facility", "Secondary POC", "Secondary POC Phone No.", "Facility Name", "Additional Comments", "Item Type", "Path"))

df$Paper Mask<- as.double(as.character(df$Paper Mask))
df$ID <- as.character(df$ID)
df$Primary POC <- as.character(df$Primary POC)
df$Primary POC Phone No. <- as.character(df$Primary POC Phone No.)
df$ShipAddress <- as.character(df$ShipAddress)
df$Address Line 2 <- as.character(df$Address Line 2)
df$Additional Information <- as.character(df$Additional Information)
df$ShipCity <- as.character(df$ShipCity)
df$ShipState <- as.character(df$ShipState)
df$ShipZip <- as.character(df$ShipZip)

df <- df %>% pivot_longer(Small Gloves:Telescopic Pointer, names_to = "Item", values_to = "Quantity")

df <- df %>% filter(Quantity > 0) %>%
group_by(ID, Primary POC, Primary POC Phone No., ShipAddress, Address Line 2, Additional Information, ShipCity, ShipState, ShipZip, Country) %>%
mutate(Line_Number = row_number())

df <- df %>% mutate(ID = ifelse(Line_Number == 1, ID, NA),
Primary POC = ifelse(Line_Number == 1, Primary POC, NA),
Primary POC Phone No. = ifelse(Line_Number == 1, Primary POC Phone No., NA),
ShipAddress = ifelse(Line_Number == 1, ShipAddress, NA),
Address Line 2 = ifelse(Line_Number == 1, Address Line 2, NA),
Additional Information = ifelse(Line_Number == 1, Additional Information, NA),
ShipCity = ifelse(Line_Number == 1, ShipCity, NA),
ShipState = ifelse(Line_Number == 1, ShipState, NA),
ShipZip = ifelse(Line_Number == 1, ShipZip, NA),
Country = ifelse(Line_Number == 1, Country, NA))

JStanley1989 · September 7, 2020, 5:58am

Also to note, the ID, Primary POC, Phone No. etc. relate to the column 1, 2, 3, and 4 example we were using. I just made the example short, but the real data has many more columns of data. But, I still used yours as a template. It's just giving me that error. The current output does everything but make the rows after row 1 NA. It is very close to the solution!!!

nirgrahamuk · September 7, 2020, 10:03am

this implies you should use ungroup() before altering the variable

JStanley1989 · September 7, 2020, 8:58pm

Perfect, that worked! Thank you both for the help!

system · September 14, 2020, 8:58pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.