Show data based on column values match

Hi folks,

I am a newbie to R and stuck in brining the data in analyzable format. I have the data in the format as below:

Item Item # Date Type Description
A 100 22/01/2022 511 Desc1
B 200 15/01/2022 511 Desc2
A 100 20/03/2022 711 Desc1
B 200 16/02/2022 711 Desc2
A 300 10/02/2022 511 Desc3

Required output format:

Item Item # Out Date In Date Description
A 100 22/01/2022 20/03/2022 Desc1
B 200 15/01/2022 16/02/2022 Desc2

These are the outbound and inbound transactions. Type 511 are outbound and Type 711 are inbound. I need to show the data as against "Item" and "Item #" match. Hope I am able to explain the requirement.

Thanks in advance...

It is a typical application scenario for 'pivot()' functions from tidyr.

here I replaced the Type code with some specified words at first, then used the pivot_longer() function to expand the dataframe according to the Type key:

library(tidyverse)

df1 <- tibble::tribble(
  ~Item,~`Item#`,	~Date,	~Type,	~Description,
  'A',	100,	'22/01/2022',	511,	'Desc1',
  'B',	200,	'15/01/2022',	511,	'Desc2',
  'A',	100,	'20/03/2022',	711,	'Desc1',
  'B',	200,	'16/02/2022',	711,	'Desc2',
  'A',	300,	'10/02/2022',	511,	'Desc3',
)

df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = 'Date'
)
# A tibble: 3 x 5
  Item  `Item#` Description outbound   inbound   
  <chr>   <dbl> <chr>       <chr>      <chr>     
1 A         100 Desc1       22/01/2022 20/03/2022
2 B         200 Desc2       15/01/2022 16/02/2022
3 A         300 Desc3       10/02/2022 NA     
1 Like

Let me give a crisper solution using data.table & tidyverse without using specialised functions like pivot_longer.
Load the data:

library(data.table)
library(tidyverse)
dt <- data.table::data.table(
         Item = c("A", "B", "A", "B", "A"),
         ItemN = c(100L, 200L, 100L, 200L, 300L),
         Date = c("22/01/2022","15/01/2022","20/03/2022",
                  "16/02/2022","10/02/2022"),
         Type = c(511L, 511L, 711L, 711L, 511L),
  Description = c("Desc1", "Desc2", "Desc1", "Desc2", "Desc3")
)

And now the simple piped code giving the output:

dt[dt,on = .(Item,ItemN)][Type == 511 & Type !=i.Type] %>% 
setnames(c("Date","i.Date"), c("outDate","inDate")) %>% 
dplyr::select(-ends_with("Type"),-starts_with("Desc"))

The solution provided by @yifanliu is also correct. I just posted another one for fun :slight_smile:

Cheers

1 Like

Many Thanks for providing the solution. Both your solutions are working fine for the provided dataset. Now I have got more columns in the dataset with millions of rows and I need to bring-in quantity as well in the output.
My bad, I am not been able to use your code to get the required output :frowning_face: Can you please help?

Material Type Plant Quantity Item ItemN Date Region Description User
MatA 511 ME11 10 A 100 22/01/2022 EMEA Desc1 WL
MatB 511 ME11 5 B 200 15/01/2022 EMEA Desc2 WI
MatA 711 ME21 5 A 100 20/03/2022 EMEA Desc1 WL
MatB 711 ME21 5 B 200 16/02/2022 EMEA Desc2 Wl
MatC 511 ME11 3 A 300 10/02/2022 EMEA Desc3 WL
MatA 711 ME21 3 A 100 20/03/2022 EMEA Desc1 WL
MatA 711 ME21 1 A 100 20/03/2022 EMEA Desc1 WL

Required Output:

Item ItemN Description Out_Date In_Date Out_Qty In_Qty
A 100 Desc1 22/01/2022 20/03/2022 10 5
A 100 Desc1 22/01/2022 20/03/2022 10 5
A 100 Desc1 22/01/2022 20/03/2022 10 3
A 100 Desc1 22/01/2022 20/03/2022 10 1
B 200 Desc2 15/01/2022 16/02/2022 5 5

Thanks in advance...

extending yifanliu's solution:

library(tidyverse)

tibble::tribble(
  ~Item, ~`Item#`,        ~Date, ~Type, ~Description, ~quantity,
  "A",      100, "22/01/2022",   511,      "Desc1",        1L,
  "B",      200, "15/01/2022",   511,      "Desc2",        2L,
  "A",      100, "20/03/2022",   711,      "Desc1",        3L,
  "B",      200, "16/02/2022",   711,      "Desc2",        4L,
  "A",      300, "10/02/2022",   511,      "Desc3",        5L
)
df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = c('Date','quantity')
)
1 Like

Thanks @nirgrahamuk for your response. I am getting an error as:

Error in app$vspace(new_style$margin-top %||% 0) :
attempt to apply non-function

library(tidyverse)

df1 <- as_tibble(read.csv("sample.csv"))

df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = c('Date','quantity')
)

The "sample.csv" file contains the data in csv format as the image attached. Not getting on what I am missing.

Your sample csv doesnt contain quality as my example did. You have Quantity.
I cant comment on the error you shared as it doesnt seem to relate to any of the code under discussion, or at least, you havent provided me a way to trigger that error.

nirgrahamuk's solution is enough for use in my opinion, unless you only want a clean output as you required, some additional selection of cols and unnest of the output is needed.

df1 <- read_csv('sample.csv')

df1 %>% 
  select(-c(Material,Plant,Region,User)) %>% 
  mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA')
) %>% pivot_wider(
  values_fn = list,
  names_from = 'Type',
  values_from = c('Date','Quantity')
) %>% unnest(matches("inbound|outbound"))

# A tibble: 5 x 7
  Item  ItemN Description Date_outbound Date_inbound Quantity_outbound Quantity_inbound
  <chr> <dbl> <chr>       <chr>         <chr>                    <dbl>            <dbl>
1 A       100 Desc1       22/01/2022    20/03/2022                  10                5
2 A       100 Desc1       22/01/2022    20/03/2022                  10                3
3 A       100 Desc1       22/01/2022    20/03/2022                  10                1
4 B       200 Desc2       15/01/2022    16/02/2022                   5                5
5 A       300 Desc3       10/02/2022    NA                           3               NA

In addition, because the sample data is getting more complex, I strongly recommend you figure out key data fields that are required to construct the result. For example, I see a column named Plant, constructing the inbound/outbound record for each plant may be more meaningful.

And I don't think that the error relates to code from our provided solution, too.

1 Like

Thanks @yifanliu for the solution. This is what i need.
I got the error resolved. I was using the old package which I have updated now...

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.