Iteratively insert row into data frame

Hello all!

I am trying to iteratively add new observations to my dataframe df1. Instead of using rbind() to insert the observations at the end of the dataframe, I'd like to insert my observations into df1 using add_row(). I'm relatively new to R, so currently I iterate by running a for loop with the range as

row in 1:nrow(df1))

and then (given some conditions), creating a new row and using rbind to append it to df1. If I wanted to instead insert the row inside df1 (i.e. if it happens at row n, inserting at n + 1), could I still use the same range for my loop? If I have k original rows, and I insert an observation at, say, row 5, now I have k + 1 rows, and the former row 5 would instead be row 6, so I would think that I'd need to adjust my loop to a larger range. Is this true?

Also worth noting, I've been reading that it's better to vectorize operations in R because it works faster than a for loop. Is there a way to vectorize this operation? Or any recommendations on tutorials/websites where I can read up and start better understanding vectorizing?

Thanks so much.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Sorry about that. Here is an example of what I'm trying to do, using the first 5 rows of the DAX column from the EuStockMarkets dataset.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.3

df <- data.frame(DAX = c(1628.75, 1613.63, 1606.51, 1621.04, 1618.16))

for(row in 1:nrow(df)) {
  if({df[(row + 1), "DAX"] > df[row, "DAX"]}) {
    df <- df %>%
      add_row(DAX = (df[(row + 1), "DAX"] - df[row, "DAX"]),
              .before = df[row, ])
  }
}

Created on 2021-02-14 by the reprex package (v1.0.0)

So, my goal in this reprex is to iterate over the dataframe, checking to see if DAX increased, and then inserting a row into the dataframe (between the two rows) representing the increase. If you run this output, you can pretty clearly see this doesn't work, so I don't think it would work for my dataframe. Are there any recommendations on how to make something like this work or to vectorize it?

What do you think of this approach ?

 mutate(df,
        change_in_dax = lead(DAX)-DAX) %>% 
   pivot_longer(cols = everything()) %>% 
   na.omit()

Okay, I think I understand what you're recommending here. I think I may have oversimplified my reprex relative to my dataset. Here's a better example, I think. If I make the reprex something like this, using that same DAX column and adding a column to represent the (fake) day of the week when the observation was taken:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.3

df <- data.frame(DAX = c(1628.75, 1613.63, 1606.51, 1621.04, 1618.16, 1636.78, 1632.17), 
                 Day.Of.Week = c("1", "2", "4", "5", "1", "2", "3"))

Created on 2021-02-14 by the reprex package (v1.0.0)

My goal would be to maintain the structure of that dataframe I created while inserting the row. So in this case, if I want to find that change_in_dax for observations taken on consecutive days (i.e. if lead(Day.Of.Week) - Day.Of.Week = 1), I would use a for loop to do this. My pseudocode for this would look something like:

for ("every row in the dataframe"){
   if ("the observations are separated by 1 day") {
      "insert row in between the two rows with some descriptor about how the observations changed"
  }
}

And then after running that code, I would ideally end up with a df that looks something like

df <- data.frame(DAX = c(1628.75, -15.12, 1613.63, 1606.51, 14.53, 1621.04, 1618.16, 1636.78, 1632.17), 
                 Day.Of.Week = c(1, 0, 2, 4, 0, 5, 1, 3, 5))

though the Day.Of.Week could definitely be something besides 0, that's just to preserve the class of the vector as numeric, the row insertion is what I'm more concerned about. Does that still make sense? Sorry, and thank you very much for your help.

Hi aellis ,

It's a bit of challenge as one need to have a unique identifier for each observation (row) in order to insert the new values correctly.

One possible approach is like this:

library(tidyverse)

## create dataframe
df_in <- tibble(DAX = c(1628.75, 1613.63, 1606.51, 1621.04, 1618.16, 1636.78, 1632.17),
                day_of_week = c(1, 2, 4, 5, 1, 2, 3)
                )
df_in
#> # A tibble: 7 x 2
#>     DAX day_of_week
#>   <dbl>       <dbl>
#> 1 1629.           1
#> 2 1614.           2
#> 3 1607.           4
#> 4 1621.           5
#> 5 1618.           1
#> 6 1637.           2
#> 7 1632.           3

## create temporary dataframe `df_a` which:
## 1) calculate DAX difference when previous DAX is from yesterday
## 2) add temporary column `id`, which is needed to sort the endresult correctly
df_a <- df_in %>% 
  mutate(diff = case_when(day_of_week - lag(day_of_week) == 1 ~ DAX - lag(DAX)),
         id = row_number(),
         ) %>% 
  select(id, everything()) %>%
  identity()

## create new temporary dataframe, based on `df_a`
## holding rows which need to be added
df_b <- df_a %>% 
  filter(!is.na(diff)) %>% 
  mutate(DAX = diff,
         day_of_week = 0,
         ) %>% 
  select(id, DAX, day_of_week)

## merge both dataframes, explicitely use `copy = TRUE` 
## so that rows are added and not merged!
df_out <- df_b %>% 
  full_join(df_a %>% 
              select(id, DAX, day_of_week),
            copy = TRUE
            ) %>% 
  arrange(id) %>% 
  ## remove the temporary `id` column
  select(-id)
#> Joining, by = c("id", "DAX", "day_of_week")

## show endresult
df_out
#> # A tibble: 11 x 2
#>        DAX day_of_week
#>      <dbl>       <dbl>
#>  1 1629.             1
#>  2  -15.1            0
#>  3 1614.             2
#>  4 1607.             4
#>  5   14.5            0
#>  6 1621.             5
#>  7 1618.             1
#>  8   18.6            0
#>  9 1637.             2
#> 10   -4.61           0
#> 11 1632.             3

Created on 2021-02-15 by the reprex package (v1.0.0)

Note: the values of DAX appears to be incorrect, but use df_out %>% View() and you'll see all is OK.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.