Add new blank rows into dataset by group (in R)

janjanku · November 21, 2022, 10:36pm

I use R. I have dataframe like this:

dat <- data.frame(
  group = c(1,1,1,1,1,1,2,2,2,2,2),
  horizon  = c(1,3,5,6,7,10,1,3,5,9,10),
  value = c(1.0,0.9,0.8,0.6,0.3,0.0,0.5,0.6,0.8,0.9,0.8)
)

And i would like to add row for every horizon that is missing (2,4,8 and 9 for the first group and 2,4,6,7,8 for the second group). Values (value) for the missing horizons would be blank.

I would like to get something like this:

datx <- data.frame(
  group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
  horizon  = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  value = c(1.0,"na",0.9,"na",0.8,0.6,0.3,"na","na",0.0,0.5,"na",0.6,"na",0.8,"na","na","na",0.9,0.8)
)

This is just an example. I am actually working with a much larger dataset.

Without the groups, the problem would be much easier to solve, i would use something like this:

newdat <- merge(data.frame(horizon=seq(1,10,1)),dat,all=TRUE)
newdat <- newdat[order(newdat$horizon),]

Thanks for help!

EconomiCurtis · November 21, 2022, 10:48pm

I made a few assumption with this method.

I assume you want a full table with all combinations of "group" and "horizon" values, creating the dat_full table.
And I converted the "na" you had in your datx table with R's NA missing values.

But given that, the dat_full table will create table with all combinations of listed group and horizon values, and input an NA for any which have a missing "value".

library(dplyr)
dat <- data.frame(
  group = c(1,1,1,1,1,1,2,2,2,2,2),
  horizon  = c(1,3,5,6,7,10,1,3,5,9,10),
  value = c(1.0,0.9,0.8,0.6,0.3,0.0,0.5,0.6,0.8,0.9,0.8)
)
dat_full <- merge(
  tibble(group = 1:2), 
  tibble(horizon = 1:10)
)

dat_full <- left_join(
  dat_full,
  dat,
  by = c("group", "horizon")
) %>% 
  arrange(group) %>% 
  tibble

datx <- data.frame(
  group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
  horizon  = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  value = c(1.0,NA,0.9,NA,0.8,0.6,0.3,NA,NA,0.0,0.5,NA,0.6,NA,0.8,NA,NA,NA,0.9,0.8)
) %>% 
  arrange(group) %>% tibble

identical(dat_full, datx)
#> [1] TRUE

^{Created on 2022-11-21 by the reprex package (v2.0.1)}

janjanku · November 22, 2022, 7:23am

Thanks, that works great! And what if it has additional variables (columns) in the original dataset (dat) that are now empty rows? Is there any way to make these variables keep their original values and just expand to new horizons?
There are 80 other variables, but they don't change on the horizon, they stay the same for each group. But different groups can have different values of these variables (columns), but also the same value. Sometimes it is a numeric value, sometimes a string.

system · December 13, 2022, 7:24am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.