Create loop to create new a column with groupings based on specific values from another column?

jubejube · May 28, 2020, 6:19am

For my experiment, we have a column called "stimulus" that has numerous values that range from 1-50. Subjects were shown 4 groups of stimuli:

Block 1 = 1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47
Block 2 = 2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48
Block 3 = 3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49
Block 4 = 4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50

General patterning for each group:
from the initial number + 4 twice then + 2 for the next 15 numbers then + 4 twice

I was wondering if there was a loop function or other function of some sort that could create a new column grouping the stimulus into these 4 groups? I was initially creating a ifelse statement and individually putting in values, but it was getting too long, messy and confusing. So I was hoping there was an easier and cleaner solution to achieve this?

Any help is appreciated, thanks!

DavoWW · May 28, 2020, 7:24am

Hi @jubejube,
R is vectorised so no loops required here.
If understand your description correctly, try this:

additions <- c(4,4,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,4)
start_vals <- list(1,2,3,4)

(block1 <- start_vals[[1]]+cumsum(additions))  # Try one
#>  [1]  5  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 43 47

# Do all 4
dat <- data.frame(sapply(start_vals, function(x) {x + cumsum(additions)}))
names(dat) <- c("Block1","Block2","Block3","Block4")

# Insert the starting values as the first row
dat <- rbind(start_vals, dat)
dat
#>    Block1 Block2 Block3 Block4
#> 1       1      2      3      4
#> 2       5      6      7      8
#> 3       9     10     11     12
#> 4      11     12     13     14
#> 5      13     14     15     16
#> 6      15     16     17     18
#> 7      17     18     19     20
#> 8      19     20     21     22
#> 9      21     22     23     24
#> 10     23     24     25     26
#> 11     25     26     27     28
#> 12     27     28     29     30
#> 13     29     30     31     32
#> 14     31     32     33     34
#> 15     33     34     35     36
#> 16     35     36     37     38
#> 17     37     38     39     40
#> 18     39     40     41     42
#> 19     43     44     45     46
#> 20     47     48     49     50

# Need a new grouping Column reflecting these values
library(tidyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
pivot_longer(dat, cols=1:4, names_to="Block", values_to="Stimulus") %>% 
  arrange(Block) -> long.df

head(long.df)
#> # A tibble: 6 x 2
#>   Block  Stimulus
#>   <chr>     <dbl>
#> 1 Block1        1
#> 2 Block1        5
#> 3 Block1        9
#> 4 Block1       11
#> 5 Block1       13
#> 6 Block1       15

^{Created on 2020-05-28 by the reprex package (v0.3.0)}

HTH

jubejube · May 28, 2020, 8:21am

Hi!

Thanks for the quick response! This output is exactly what we're looking for. I'm just a little confused how to implement this with my existing dataframe.

Small portion of what my dataframe looks:

Subject Stimulus
1 1
1 5
1 9
1 2
1 6
1 3
1 7
1 4
1 8
1 11
1 13
1 12
1 10
1 9
2 2
2 6
2 1
2 3
2 5
2 7
2 4
2 8
2 9
2 13
2 11
2 12
2 10

etc.

Thank you!

nirgrahamuk · May 28, 2020, 9:18am

So every subject can have their block membership determined by findinging the minimum Stimulus value they have.
Use dplyr's group_by on subject and then summarise on min function

jubejube · May 28, 2020, 9:29am

Ah, I think I should have clarified things better or maybe I'm just a little confused, my apologies! Every subject was shown the same list of stimuli, just all in random order. We just want to create a new column that includes the aforementioned "blocks" that the we've categorized the stimuli in. We want to create something like this:

Subject Stimulus Block
1 1 1
1 5 1
1 9 1
1 2 2
1 6 2
1 3 3
1 7 3 
1 4 4
1 8 4 
1 11 1
1 13 1 
1 12 2 
1 10 2
1 9 1
2 2 2
2 6 2
2 1 1
2 3 3
2 5 1
2 7 3
2 4 4
2 8 4
2 9 1
2 13 1
2 11  1
2 12 2
2 10 2

I hope that makes sense. Thank you for helping!

nirgrahamuk · May 28, 2020, 9:41am

blklook1 <- data.frame(Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47)
,Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48)
,Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49)
,Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50))

library(tidyverse)


blklook2 <- pivot_longer(blklook1,
                         cols = 1:4,
                         names_to = "block_group",
                         values_to = "Stimulus"
                        )

left_join( your_main_data, # put your data with Subject and Stimulus columns here
           blklook2)

jubejube · May 28, 2020, 6:08pm

Hi! Sorry I'm running into a little trouble, is the format supposed to be like this?

left_join(data$stimulus, data$block_group, # put your data with Subject and Stimulus columns here
           blklook2)

I run into an error: no applicable method for 'left_join' applied to an object of class c("double, "numeric")

or

left_join(data, # put your data with Subject and Stimulus columns here
           blklook2)

This creates a tibble, correct? It doesn't seem to combine and update with my main dataframe.

Thank you again!

nirgrahamuk · May 28, 2020, 9:07pm

The second one.
You need to use consistent naming on the variable to join by . I.e. stimulant or Stimulant. Originally you shared Stimulant but seems it was really stimulant

jubejube · May 28, 2020, 9:25pm

Sorry I'm still running into some issues.

# blockpair code
blklook1 <- data.frame(Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47)
                       ,Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48)
                       ,Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49)
                       ,Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50))

library(tidyverse)

blklook2 <- pivot_longer(blklook1,
                         cols = 1:4,
                         names_to = "blockpair",
                         values_to = "stimulus")

library(dplyr)

left_join(data, # put your data with Subject and Stimulus columns here
           blklook2)

In the main dataframe, I already had a column called "stimulus", but I didn't have one called "blockpair". It doesn't seem to update and create the new column "blockpair" in the main dataframe, but I see the correct output in the console though?

Sorry if I'm overlooking anything... this seems simple, but I can't seem to get my head around it!

-- edit... I forgot to mention that block 1 & 3 and block 2 & 4 share some of the same values, so it ends up duplicating a row with both column values 1 & 3
e.g.
a row with stimulus 15 is duplicated with both column value "block 1" & "block3", when it should only be "block1"
and then..
another row with stimulus 15 is duplicated with both column value "block 1" & "block3", when it should only be "block3"

nirgrahamuk · May 29, 2020, 11:08am

blklook1 <- data.frame(
  Block1 = c(1, 5, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 43, 47),
  Block2 = c(2, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48),
  Block3 = c(3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 45, 49),
  Block4 = c(4, 8, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 46, 50)
)


library(tidyverse)
# randomly create 20 people that belong to one of the 4 blocks
# this 'simulates' the true master data
set.seed(42) # for reproducible randomness

(example_base <- data.frame(
  personid = 1:20,
  blocknum = sample.int(4, size = 20, replace = TRUE)
))
# comtinue to simulate what the real data might look like
(blklook2 <- pivot_longer(blklook1,
  cols = 1:4,
  names_to = "blockname",
  values_to = "stimulus"
) %>% mutate(blocknum = str_replace(blockname, "Block", "") %>% as.integer()))


(example_df <- left_join(
  example_base,
  blklook2
) %>% select(personid, stimulus))

# now we have an example_df to discover the block members from

(min_stim_per_person_from_data <- example_df %>% 
    group_by(personid) %>% 
    summarise(block_membership_detected = min(stimulus)))

# rejoin the detected groups back to the example data

(example_df_with_groups <- left_join(example_df,
                                     min_stim_per_person_from_data,
                                     ))

jubejube · June 4, 2020, 8:44pm

Hi, sorry for the delayed reply.

Thank you very much for providing a solution, however when we tried implementing it, it didn't seem to work the way we wanted, unfortunately.

I believe this was a similar solution to what was provided earlier. Instead, for each subject, we look at the minimum stimulus number for each of their blocks (blocks 1-8, minimum stimulus value should be 1, 2, 3 or 4) to determine their blockprocedure membership (min stimulus value = 1-4, blockprocedure corresponds to minimum stimulus value).

We tried implementing this an alternative way:

data$blockprocedure = NA #create empty column for blockprocedure
for(row in 1:nrow(data)){
  blockval <- data[row, "block"] #1, 2, 3, 4, 5, 6, 7, 8
  subjval <- data[row, "subject"] #up to 29
  minval <- min(subset(data, subjval == "subject" && blockval == "block")$stimulus)#1, 2, 3, 4
  data[row, "blockprocedure"] <- minval
}

## for each subject, look at the minimum stimulus number of each block (block 1-8) (min stimulus value should be either 1, 2, 3 or 4) and that min stimulus value will determine the blockprocedure (corresponds to min. stim value)
## therefore blocks 1-8 will be paired to either blockprocedure 1, 2, 3 or 4

However we receive an error:

Error: Assigned data `minval` must be compatible with existing data.
ℹ Error occurred for column `blockprocedure`.
x Lossy cast from `value` <double> to `x` <logical>.
Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In min(subset(data, subjval == "subject" && blockval == "block")$stimulus) :
  no non-missing arguments to min; returning Inf

I was wondering if anyone had any idea what went wrong?

system · June 25, 2020, 8:54pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.