Add blank rows based on a vector of index values

BillPetti · May 30, 2021, 7:36pm

I have two data frames that result from scraping. In the one data frame (A) there may be rows where a column has a value of 'Ppd'. For these rows I need to insert blank values in another table (B) at the specific index value--so, if rows 1, 10, 15 in data frame A have a value of 'Ppd' I need to insert blank rows at rows 1, 10, 15 in data frame B.

Ideally, I could pass a vector to tibble::add_row() for the .before argument, but that is not accepted.

Any suggestions for how to do this when you have multiple rows to add and you won't know how many ahead of time?

technocrat · May 30, 2021, 10:14pm

Try working from this

mat <- matrix(1:10, ncol = 2)
mat
#>      [,1] [,2]
#> [1,]    1    6
#> [2,]    2    7
#> [3,]    3    8
#> [4,]    4    9
#> [5,]    5   10
rbind(mat[1:4, ], NA, mat[5, ])
#>      [,1] [,2]
#> [1,]    1    6
#> [2,]    2    7
#> [3,]    3    8
#> [4,]    4    9
#> [5,]   NA   NA
#> [6,]    5   10

Get the indices of df1, with the string, by

inserts <- which(df1["V"] == 'Pdp') + c(0,2:6)

The addition of c(0,2:6) takes into account that the row target of the second object increments every time a row is added. The 6 depends on the length returned by which.

DavoWW · May 31, 2021, 2:23am

Hi @BillPetti,
Its a bit clunky but using row numbers as part of a full_join might work for you:

library(tidyverse)

dfA <- data.frame(id = rep("df_a", times=20),
                  Ppd = c(1:4,NA,6:9,NA,11:14,NA,16:20))

dfB <- data.frame(id = rep("df_b", times=20),
                  my_var = rnorm(20, 10))

dfA %>% rownames_to_column(var="old_row_num") -> dfA
dfB %>% rownames_to_column(var="old_row_num") -> dfB

dfA %>% 
  filter(is.na(Ppd)) %>% 
  mutate(old_row_num = as.character(as.numeric(old_row_num)-0.5)) -> dfA_stub

full_join(dfB, dfA_stub, by="old_row_num") %>% 
  arrange(as.numeric(old_row_num)) %>% 
  select(-"Ppd")
#>    old_row_num id.x    my_var id.y
#> 1            1 df_b  8.758071 <NA>
#> 2            2 df_b 10.364683 <NA>
#> 3            3 df_b 10.587924 <NA>
#> 4            4 df_b 11.385030 <NA>
#> 5          4.5 <NA>        NA df_a
#> 6            5 df_b 11.335886 <NA>
#> 7            6 df_b 11.178415 <NA>
#> 8            7 df_b  9.095322 <NA>
#> 9            8 df_b  9.872665 <NA>
#> 10           9 df_b 10.091686 <NA>
#> 11         9.5 <NA>        NA df_a
#> 12          10 df_b 10.916490 <NA>
#> 13          11 df_b 10.782337 <NA>
#> 14          12 df_b 10.132188 <NA>
#> 15          13 df_b  8.692119 <NA>
#> 16          14 df_b  9.739697 <NA>
#> 17        14.5 <NA>        NA df_a
#> 18          15 df_b 10.460583 <NA>
#> 19          16 df_b 10.109600 <NA>
#> 20          17 df_b  9.819515 <NA>
#> 21          18 df_b  9.233955 <NA>
#> 22          19 df_b 11.644270 <NA>
#> 23          20 df_b  8.893601 <NA>

^{Created on 2021-05-31 by the reprex package (v2.0.0)}

system · June 21, 2021, 2:24am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.