Create loop to create new a column with groupings based on specific values from another column?

For my experiment, we have a column called "stimulus" that has numerous values that range from 1-50. Subjects were shown 4 groups of stimuli:

Block 1 = 1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47
Block 2 = 2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48
Block 3 = 3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49
Block 4 = 4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50

General patterning for each group:
from the initial number + 4 twice then + 2 for the next 15 numbers then + 4 twice

I was wondering if there was a loop function or other function of some sort that could create a new column grouping the stimulus into these 4 groups? I was initially creating a ifelse statement and individually putting in values, but it was getting too long, messy and confusing. So I was hoping there was an easier and cleaner solution to achieve this?

Any help is appreciated, thanks!

Hi @jubejube,
R is vectorised so no loops required here.
If understand your description correctly, try this:

additions <- c(4,4,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,4)
start_vals <- list(1,2,3,4)

(block1 <- start_vals[[1]]+cumsum(additions))  # Try one
#>  [1]  5  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 43 47

# Do all 4
dat <- data.frame(sapply(start_vals, function(x) {x + cumsum(additions)}))
names(dat) <- c("Block1","Block2","Block3","Block4")

# Insert the starting values as the first row
dat <- rbind(start_vals, dat)
dat
#>    Block1 Block2 Block3 Block4
#> 1       1      2      3      4
#> 2       5      6      7      8
#> 3       9     10     11     12
#> 4      11     12     13     14
#> 5      13     14     15     16
#> 6      15     16     17     18
#> 7      17     18     19     20
#> 8      19     20     21     22
#> 9      21     22     23     24
#> 10     23     24     25     26
#> 11     25     26     27     28
#> 12     27     28     29     30
#> 13     29     30     31     32
#> 14     31     32     33     34
#> 15     33     34     35     36
#> 16     35     36     37     38
#> 17     37     38     39     40
#> 18     39     40     41     42
#> 19     43     44     45     46
#> 20     47     48     49     50

# Need a new grouping Column reflecting these values
library(tidyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
pivot_longer(dat, cols=1:4, names_to="Block", values_to="Stimulus") %>% 
  arrange(Block) -> long.df

head(long.df)
#> # A tibble: 6 x 2
#>   Block  Stimulus
#>   <chr>     <dbl>
#> 1 Block1        1
#> 2 Block1        5
#> 3 Block1        9
#> 4 Block1       11
#> 5 Block1       13
#> 6 Block1       15

Created on 2020-05-28 by the reprex package (v0.3.0)

HTH

Hi! Sorry I'm running into a little trouble, is the format supposed to be like this?

left_join(data$stimulus, data$block_group, # put your data with Subject and Stimulus columns here
           blklook2)

I run into an error: no applicable method for 'left_join' applied to an object of class c("double, "numeric")

or

left_join(data, # put your data with Subject and Stimulus columns here
           blklook2)

This creates a tibble, correct? It doesn't seem to combine and update with my main dataframe.

Thank you again!

The second one.
You need to use consistent naming on the variable to join by . I.e. stimulant or Stimulant. Originally you shared Stimulant but seems it was really stimulant

Sorry I'm still running into some issues. :frowning:

# blockpair code
blklook1 <- data.frame(Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47)
                       ,Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48)
                       ,Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49)
                       ,Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50))

library(tidyverse)

blklook2 <- pivot_longer(blklook1,
                         cols = 1:4,
                         names_to = "blockpair",
                         values_to = "stimulus")

library(dplyr)

left_join(data, # put your data with Subject and Stimulus columns here
           blklook2)

In the main dataframe, I already had a column called "stimulus", but I didn't have one called "blockpair". It doesn't seem to update and create the new column "blockpair" in the main dataframe, but I see the correct output in the console though?

Sorry if I'm overlooking anything... this seems simple, but I can't seem to get my head around it!

-- edit... I forgot to mention that block 1 & 3 and block 2 & 4 share some of the same values, so it ends up duplicating a row with both column values 1 & 3
e.g.
a row with stimulus 15 is duplicated with both column value "block 1" & "block3", when it should only be "block1"
and then..
another row with stimulus 15 is duplicated with both column value "block 1" & "block3", when it should only be "block3"

blklook1 <- data.frame(
  Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47),
  Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48),
  Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49),
  Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50)
)


library(tidyverse)
# randomly create 20 people that belong to one of the 4 blocks
# this 'simulates' the true master data
set.seed(42) # for reproducible randomness

(example_base <- data.frame(
  personid = 1:20,
  blocknum = sample.int(4, size = 20, replace = TRUE)
))
# comtinue to simulate what the real data might look like
(blklook2 <- pivot_longer(blklook1,
  cols = 1:4,
  names_to = "blockname",
  values_to = "stimulus"
) %>% mutate(blocknum = str_replace(blockname, "Block", "") %>% as.integer()))


(example_df <- left_join(
  example_base,
  blklook2
) %>% select(personid, stimulus))

# now we have an example_df to discover the block members from

(min_stim_per_person_from_data <- example_df %>% 
    group_by(personid) %>% 
    summarise(block_membership_detected = min(stimulus)))

# rejoin the detected groups back to the example data

(example_df_with_groups <- left_join(example_df,
                                     min_stim_per_person_from_data,
                                     ))

Hi, sorry for the delayed reply.

Thank you very much for providing a solution, however when we tried implementing it, it didn't seem to work the way we wanted, unfortunately.

I believe this was a similar solution to what was provided earlier. Instead, for each subject, we look at the minimum stimulus number for each of their blocks (blocks 1-8, minimum stimulus value should be 1, 2, 3 or 4) to determine their blockprocedure membership (min stimulus value = 1-4, blockprocedure corresponds to minimum stimulus value).

We tried implementing this an alternative way:

data$blockprocedure = NA #create empty column for blockprocedure
for(row in 1:nrow(data)){
  blockval <- data[row, "block"] #1, 2, 3, 4, 5, 6, 7, 8
  subjval <- data[row, "subject"] #up to 29
  minval <- min(subset(data, subjval == "subject" && blockval == "block")$stimulus)#1, 2, 3, 4
  data[row, "blockprocedure"] <- minval
}

## for each subject, look at the minimum stimulus number of each block (block 1-8) (min stimulus value should be either 1, 2, 3 or 4) and that min stimulus value will determine the blockprocedure (corresponds to min. stim value)
## therefore blocks 1-8 will be paired to either blockprocedure 1, 2, 3 or 4

However we receive an error:

Error: Assigned data `minval` must be compatible with existing data.
ℹ Error occurred for column `blockprocedure`.
x Lossy cast from `value` <double> to `x` <logical>.
Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In min(subset(data, subjval == "subject" && blockval == "block")$stimulus) :
  no non-missing arguments to min; returning Inf

I was wondering if anyone had any idea what went wrong?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

Hi!

Thanks for the quick response! This output is exactly what we're looking for. I'm just a little confused how to implement this with my existing dataframe.

Small portion of what my dataframe looks:

Subject Stimulus
1 1
1 5
1 9
1 2
1 6
1 3
1 7
1 4
1 8
1 11
1 13
1 12
1 10
1 9
2 2
2 6
2 1
2 3
2 5
2 7
2 4
2 8
2 9
2 13
2 11
2 12
2 10

etc.

Thank you!

Ah, I think I should have clarified things better or maybe I'm just a little confused, my apologies! Every subject was shown the same list of stimuli, just all in random order. We just want to create a new column that includes the aforementioned "blocks" that the we've categorized the stimuli in. We want to create something like this:

Subject Stimulus Block
1 1 1
1 5 1
1 9 1
1 2 2
1 6 2
1 3 3
1 7 3 
1 4 4
1 8 4 
1 11 1
1 13 1 
1 12 2 
1 10 2
1 9 1
2 2 2
2 6 2
2 1 1
2 3 3
2 5 1
2 7 3
2 4 4
2 8 4
2 9 1
2 13 1
2 11  1
2 12 2
2 10 2

I hope that makes sense. Thank you for helping!

So every subject can have their block membership determined by findinging the minimum Stimulus value they have.
Use dplyr's group_by on subject and then summarise on min function

blklook1 <- data.frame(Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47)
,Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48)
,Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49)
,Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50))

library(tidyverse)


blklook2 <- pivot_longer(blklook1,
                         cols = 1:4,
                         names_to = "block_group",
                         values_to = "Stimulus"
                        )

left_join( your_main_data, # put your data with Subject and Stimulus columns here
           blklook2)
2 Likes