Adding a value in a new column for a specific number of rows based on the value of single row?

Hello!

For my data, a block constitutes of 60 rows. Following a block, there's a row that either has a value of 1 or 2. However, this row does not appear after EVERY block, there are a few blocks that are just back to back. There is another column called "test_name", the value of this column for the blocks are "block" and the value of this column for the single rows following blocks are "question".

The value of the row following a block is either 1 or 2. For the preceding block, I would like to create a new column that adds the same value of the following row.
e.g. row following block 1 has the value "1", so input value "1" for all of block 1 in the new column, while the row following block 2 has a value of "2" so input value "2" for all of block 1, there's no row following block 3 so skip, the row following block 4 has a value of "2" so values in the new column of block 4 should be "2".
Row of 60 - single row w/ 1 value (1 or 2) - row of 60 - row of 60 - single row w/ value of 1 or 2, etc.
This is at random, there's no discernible pattern from what I could see. I hope this makes sense!

I was wondering if its possible to reference whenever this specific row appears and the value it has (maybe through "IF" (?)) and then add a value to the preceding 60 rows with the value it has in a new column?

If you need extra information or clarification, please ask. Thanks for any help in advance!

Can you provide a reproducible example? It will make easier to work out what you are trying to do:

As williaml says, it is more difficult without an example of your data. I tried to produce something similar to your description. Hopefully it will get you close enough.

In the final result, I have left the "question" rows in place, but they're easy enough to filter out if you don't want them

# Data
val <- rep(1:4*10, each = 6)
val <- c(val[1:6], 1, val[7:18], 2, val[19:24])  
test_name <- ifelse(val < 10, 'question', 'block')
d <- data.frame(test_name, val)

# create new column for values
d$new_col <- NA

# Block size
block_size <- 6

# Id question rows
qs <- which(d$test_name == 'question')

# get values 
v <- rep(d[qs, 'val'], each = block_size)

# create matix of indices to update - assumes new column is last in data frame
m <- cbind(unlist(lapply(qs, function(x, y) seq(x-y, x-1), block_size)), ncol(d))

# Update indices
d$new_col[m] <- v

d
#>    test_name val new_col
#> 1      block  10       1
#> 2      block  10       1
#> 3      block  10       2
#> 4      block  10       1
#> 5      block  10       1
#> 6      block  10       1
#> 7   question   1      NA
#> 8      block  20      NA
#> 9      block  20      NA
#> 10     block  20      NA
#> 11     block  20      NA
#> 12     block  20      NA
#> 13     block  20      NA
#> 14     block  30       2
#> 15     block  30       2
#> 16     block  30       2
#> 17     block  30       2
#> 18     block  30       2
#> 19     block  30       2
#> 20  question   2      NA
#> 21     block  40      NA
#> 22     block  40      NA
#> 23     block  40      NA
#> 24     block  40      NA
#> 25     block  40      NA
#> 26     block  40      NA

Created on 2020-10-21 by the reprex package (v0.3.0)

@williaml @jmcvw

Sorry, I've added a small example of what I would like the data to look like!

## subject = subject number, key = button they pressed, test_part = block or question row, value = new column with value (1, 2 or NA)
subject = factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4))
key = c(4, 4, 4, 4, 4, 1, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 4, 5, 4, 4, 5, 2, 4, 4, 4, 4, 4, 1, 4, 4, 4, 4, 4, 4, 5, 5, 5, 4, 1, 4, 4, 5, 5, 4, 2)
test_name = c("block", "block", "block", "block", "block", "question", "block", "block", "block", "block", "block", "question", "block", "block", "block", "block", "block", "block", "block", "block", "block", "block", "question", "block", "block", "block", "block", "block", "question", "block", "block", "block", "block", "block", "block", "block", "block", "block", "block", "question", "block", "block", "block", "block", "block", "question")
value = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
df = data.frame(subj, key, test_name, value)

df

In the example, I've excluded the other elements of my dataframe, keeping just the relevant columns needed. The "value" column would be the new column.

I tried using @jmcvw method on my dataset and I think I did something wrong somewhere... The "m" dataframe has a column that contains numbers ranging from 90-1500 (similar to how many rows there are total in my dataframe) and another column that just has a value of "16". When I combined the "m" and my dataframe together, the new column has a mix of values of "c(5, 5, 5, 5, 4, 5, 4, 5, 5, 5, 4, 5, 5, 5)" and "NA" thrown around randomly. It works on the "d" dataset that was provided though.

Thanks!

Hmm, it looks like the data you just provided already meets your outcome? I'm not sure why it doesn't
I wrapped my code up into a function for you to play with. Hopefully you'll be able to modify it to meet your needs.

As far as I can tell it works now on both data sets. (There was a bug I think, but should be fixed?)

The only difference I can see is that the "question" rows have NA, but I assumed you would be filtering those out. If they should also be updated just remove the -1 from here seq(x-y, x - 1)

add_question_val <- function(d, block_size,
                             value_from ='key', value_to = 'value') {
  # Add new column if it doesn't exist
  if (!'value_to' %in% names(d)) d[[value_to]] <- NA
  # Id question rows
  qs <- which(d$test_name == 'question')
  # get values 
  v <- rep(d[qs, value_from], each = block_size)
  # create ~~matrix~~ vector of indices to update
  m <- unlist(lapply(qs, function(x, y) seq(x-y, x - 1), block_size))
  # Update indices
  d[[value_to]][m] <- v
  d
}

add_question_val(d, block_size = 6,
                 value_from ='val', value_to = 'new_col')
add_question_val(df, block_size = 5, value_to = 'value2')

Thank you for providing the function! I tinkered with it a bit and I have it working now on my dataset now. The only minor issue is the value in the new column comes out as either "5", "4", or NA, rather than the actual keypress values in my real data (e.g. 1, 2 or NA), but I just modified the values using an ifelse statement. :grinning:

Thank you so much for your help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.