Filling missing rows in a data table / frame

Dear all,

I have the following question : is it possible to fill some missing rows in a database ?

Please find below an exemple of what I attempt to do. I have a database with some sellers and buyers and monthly transactions

library(data.table)
set.seed(1)
Data <- data.frame(
  Month = c(1,5,3,4,5,3,4,5,6),
  Code_ID_Buy = c("100D","100D","102D","102D","102D","100D","100D","100D","100D"),
  Code_ID_Sell = c("98C","98C","99C","99C","99C","25A","25A","25A","25A"),
  Contract_Size = c(100,20,120,300,120,30,25,150,250)
)

View(Data)

The issue is that some months are missing, hence I would need to insert one row each time there is a missing row (an after the last occurence) with the ID of the buyer the seller and a contract value of 0, so that the data looks like :

Data <- data.frame(
  Month = c(1,2,3,4,5,6,3,4,5,6,3,4,5,6,7),
  Code_ID_Buy = c("100D","100D","100D","100D","100D","100D","102D","102D","102D","102D","100D","100D","100D","100D","100D"),
  Code_ID_Sell = c("98C","98C","98C","98C","98C","98C","99C","99C","99C","99C","25A","25A","25A","25A","25A"),
  Contract_Size = c(100,0,0,0,20,0,120,300,120,0,30,25,150,250,0)
)
View(Data)

Thank you in advance,

Kind regards,

You can do it like this

library(dplyr)
library(tidyr)

Data <- data.frame(
    Month = c(1,5,3,4,5,3,4,5,6),
    Code_ID_Buy = c("100D","100D","102D","102D","102D","100D","100D","100D","100D"),
    Code_ID_Sell = c("98C","98C","99C","99C","99C","25A","25A","25A","25A"),
    Contract_Size = c(100,20,120,300,120,30,25,150,250)
)

Data %>%
    group_by(Code_ID_Buy, Code_ID_Sell) %>% 
    complete(Month = min(Month):max(Month)+1, fill = list(Contract_Size = 0)) %>% 
    arrange(Code_ID_Buy, Code_ID_Sell, Month)
#> # A tibble: 15 x 4
#> # Groups:   Code_ID_Buy, Code_ID_Sell [3]
#>    Code_ID_Buy Code_ID_Sell Month Contract_Size
#>    <chr>       <chr>        <dbl>         <dbl>
#>  1 100D        25A              3            30
#>  2 100D        25A              4            25
#>  3 100D        25A              5           150
#>  4 100D        25A              6           250
#>  5 100D        25A              7             0
#>  6 100D        98C              1           100
#>  7 100D        98C              2             0
#>  8 100D        98C              3             0
#>  9 100D        98C              4             0
#> 10 100D        98C              5            20
#> 11 100D        98C              6             0
#> 12 102D        99C              3           120
#> 13 102D        99C              4           300
#> 14 102D        99C              5           120
#> 15 102D        99C              6             0

Created on 2021-01-29 by the reprex package (v1.0.0)

2 Likes

This is brilliant thank you !

Just a small question, what if I have a variable let us say "quality" and I want the quality for the rows inserted to be equal to the last "quality" registered so that :

Data <- data.frame(
Month = c(1,2,3,4,5,6,3,4,5,6,3,4,5,6,7),
Code_ID_Buy = c("100D","100D","100D","100D","100D","100D","102D","102D","102D","102D","100D","100D","100D","100D","100D"),
Code_ID_Sell = c("98C","98C","98C","98C","98C","98C","99C","99C","99C","99C","25A","25A","25A","25A","25A"),
Contract_Size = c(100,0,0,0,20,0,120,300,120,0,30,25,150,250,0),
Quality = c(1,1,1,1,2,2,6,3,3,3,4,5,6,4,4)
)
View(Data)

Thank you in advance;

You can add fill(Quality) like so:

Data %>%
  group_by(Code_ID_Buy, Code_ID_Sell) %>%
  complete(Month = min(Month):max(Month)+1, fill = list(Contract_Size = 0)) %>% 
  fill(Quality) %>%
  arrange(Code_ID_Buy, Code_ID_Sell, Month) %>% 
  select(Month, everything())

If needed, you can also specify the direction (by default it replaces NA downwards).

Thank you, out of curiosity why do we need : arrange(Code_ID_Buy, Code_ID_Sell, Month) %>%
select(Month, everything()). It seems to work without it

Indeed, that is correct. It's just presenting the results in a sorted way, as specified with arrange(). The ordering of the column variables, with select() is done to reflect the same ordering as the input data.

Perfect, very clear and instructive, thank you.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.