Show data as node to node based on key match

I am dealing with complex dataset with millions of rows. This dataset is a combination of shipments from plant to plant. Movement Types 555 & 666 are outbound shipments and 100 are inbound shipments.

Here is the sample of the dataset:

Tx_ref Movement_Type Quantity Delivery_Item Delivery_Code Date Description Plant Region
1 555 100 A 123 04/01/2022 DescA123 Plant1 EMEA
2 555 200 A 123 10/01/2022 DescA123 Plant1 EMEA
3 100 300 A 123 11/02/2022 DescA123 Plant2 LATAM
4 555 10 B 256 04/03/2022 DescB256 Plant1 EMEA
5 555 10 B 256 04/03/2022 DescB256 Plant1 EMEA
6 666 300 A 123 22/02/2022 DescA123 Plant2 LATAM
7 100 20 B 256 22/03/2022 DescB256 Plant3 APAC
8 100 300 A 123 12/04/2022 DescA123 Plant3 APAC
9 666 5 C 123 12/04/2022 DescC123 Plant1 EMEA
10 555 5 D 123 12/04/2022 DescD123 Plant1 EMEA
11 666 5 C 123 12/04/2022 DescC123 Plant1 EMEA
12 100 5 C 123 20/04/2022 DescC123 Plant2 LATAM

For outbound transactions, the date/plant/region refers the shipping date/plant/region and for inbound transactions, the date/plant/region refers the receiving date/plant/region.
The key to matching the shipment to the receipt is by the combination of Delivery_Item and the Delivery_Code.

I need to show the data as node to node i.e. starting from shipping plant to end receiving plant. For example: 100 shipped from plant1 on 04/01, 200 shipped from plant1 on 10/01, 300 received at plant2 on 11/02, again this 300 shipped from plant2 on 22/02 and received at plant3 on 12/04. Over here, plant2 is the intermediary plant which I don't want to show. plant1 is the starting plant and plant3 is the end plant.

This is the output I required from this sample data. Just wondering how could I do:

Delivery_Item Delivery_Code Description Start_Plant Start_Region End_Plant End_Region Start_Date End_Date Total Qty Shipped Total Qty Received
A 123 DescA123 Plant1 EMEA Plant3 APAC 04/01/2022 12/04/2022 300 300
B 256 DescB256 Plant1 EMEA Plant3 APAC 04/03/2022 22/03/2022 20 20
C 123 DescC123 Plant1 EMEA Plant2 LATAM 12/04/2022 20/04/2022 10 5

Thanks in advance....

what have you tried so far ? where do you get stuck ?
some initial things you could think about doing.
You have 3 movement codes, but you interpret them as two types (important for structuring your data)
so write some code to make a 2 level movement category.
The relationships between entries in your table are defined by matching Delivery_Item and Delivery_Codes, therefore in the tidyverse/dplyr world, we might start by grouping on these , and then summarising.

To start with, I have used the code from one of the previous topic;

Now I have to find lookup the first date/plant/region matching Delivery_Item and Delivery_Code and aggregate the shipping/receiving quantities. Got stuck over here. Hope I am going in the right direction.

library(tidyverse)

df1 <- read_csv('sample.csv')

df1 %>% 
  select(-c(Tx_ref)) %>% 
  mutate(Movement_Type = case_when(
    (Movement_Type == 555 | Movement_Type == 666)  ~ 'Outbound',
    Movement_Type == 100 ~ 'Inbound',
    TRUE ~ 'NA')
  ) %>% pivot_wider(
    values_fn = list,
    names_from = 'Movement_Type',
    values_from = c('Date','Quantity')
  ) %>% unnest(matches("Outbound|Inbound"))

I think you should add Plant and region to the values_from as you seemed to want to know which plants are inbound and which are outbound to pick them for the start and end etc.

Need to think about in theory what logic could be used to determine what plant is intermediary vs a start or end plant ?
also its not clear to me whether there could be / would be multiple starts and ends for a delivery, or guaranteed just one for those roles ?

Yes, there could be multiple starts or ends for a delivery but need to consider only the first start and last end matching Delivery_Item and Delivery_Code.
If there is any intermediary plant, need to be ignored.

A hint for determining what info to pick from which records is to interpret the interim for terminal nodes by what seems to me the logic of if it sends and receives then its in the middle / interim.

df2 %>% arrange(Delivery_Item,
                     Delivery_Code) %>% rowwise() %>% 
  mutate(interim_or_term = 
           case_when(any(is.na(c(Date_Outbound,Date_Inbound))) ~ 'term',
                     TRUE ~ 'interim')) %>% relocate(interim_or_term)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.