Add new blank rows into dataset by group (in R)

I use R. I have dataframe like this:

dat <- data.frame(
  group = c(1,1,1,1,1,1,2,2,2,2,2),
  horizon  = c(1,3,5,6,7,10,1,3,5,9,10),
  value = c(1.0,0.9,0.8,0.6,0.3,0.0,0.5,0.6,0.8,0.9,0.8)
)

And i would like to add row for every horizon that is missing (2,4,8 and 9 for the first group and 2,4,6,7,8 for the second group). Values (value) for the missing horizons would be blank.

I would like to get something like this:

datx <- data.frame(
  group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
  horizon  = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  value = c(1.0,"na",0.9,"na",0.8,0.6,0.3,"na","na",0.0,0.5,"na",0.6,"na",0.8,"na","na","na",0.9,0.8)
)

This is just an example. I am actually working with a much larger dataset.

Without the groups, the problem would be much easier to solve, i would use something like this:

newdat <- merge(data.frame(horizon=seq(1,10,1)),dat,all=TRUE)
newdat <- newdat[order(newdat$horizon),]

Thanks for help!

I made a few assumption with this method.

  • I assume you want a full table with all combinations of "group" and "horizon" values, creating the dat_full table.
  • And I converted the "na" you had in your datx table with R's NA missing values.

But given that, the dat_full table will create table with all combinations of listed group and horizon values, and input an NA for any which have a missing "value".

library(dplyr)
dat <- data.frame(
  group = c(1,1,1,1,1,1,2,2,2,2,2),
  horizon  = c(1,3,5,6,7,10,1,3,5,9,10),
  value = c(1.0,0.9,0.8,0.6,0.3,0.0,0.5,0.6,0.8,0.9,0.8)
)
dat_full <- merge(
  tibble(group = 1:2), 
  tibble(horizon = 1:10)
)

dat_full <- left_join(
  dat_full,
  dat,
  by = c("group", "horizon")
) %>% 
  arrange(group) %>% 
  tibble

datx <- data.frame(
  group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
  horizon  = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  value = c(1.0,NA,0.9,NA,0.8,0.6,0.3,NA,NA,0.0,0.5,NA,0.6,NA,0.8,NA,NA,NA,0.9,0.8)
) %>% 
  arrange(group) %>% tibble

identical(dat_full, datx)
#> [1] TRUE

Created on 2022-11-21 by the reprex package (v2.0.1)

2 Likes

Thanks, that works great! And what if it has additional variables (columns) in the original dataset (dat) that are now empty rows? Is there any way to make these variables keep their original values and just expand to new horizons?
There are 80 other variables, but they don't change on the horizon, they stay the same for each group. But different groups can have different values of these variables (columns), but also the same value. Sometimes it is a numeric value, sometimes a string.