Create a new column with only one value (per group) based on values from multiple columns?

Hi,

I was wondering how to create a new column with values that are dependent on values from another column? For my dataframe, each subject is shown two blocktypes (mouth block or nose block), just in random order. Based on the first blocktype they're shown, we would like to create a new column "blockorder" where all of the data values for each subject are either "mouthfirst" or "nosefirst".

A portion of the dataframe roughly looks like this:

subject | blocktype | trial_index
1 | nose | 6
1 | nose | 9
1 | nose | 12
1 | nose | 15
1 | nose | 18
1 | mouth | 21
1 | mouth | 24
1 | mouth | 27
1 | mouth | 30
2 | mouth | 6
2 | mouth | 9
2 | mouth | 12
2 | mouth | 15
2 | nose | 18
2 | nose | 21
2 | nose | 24
2 | nose | 27
2 | nose | 30

I want the dataframe to look like this:

subject | blocktype | trial_index | block order
1 | nose | 6 | nosefirst
1 | nose | 9 | nosefirst
1 | nose | 12 | nosefirst
1 | nose | 15 | nosefirst
1 | nose | 18 | nosefirst
1 | mouth | 21 | nosefirst
1 | mouth | 24 | nosefirst
1 | mouth | 27 | nosefirst
1 | mouth | 30 | nosefirst
2 | mouth | 6 | mouthfirst
2 | mouth | 9 | mouthfirst
2 | mouth | 12 | mouthfirst
2 | mouth | 15 | mouthfirst
2 | nose | 18 | mouthfirst
2 | nose | 21 | mouthfirst
2 | nose | 24 | mouthfirst
2 | nose | 27 | mouthfirst
2 | nose | 30 | mouthfirst

I tried using the ifelse statement to create a new column with these values. However, the new column would have values that correspond with the blocktype & trial_index and therefore give different values rather than the output the same value for each subject.
RATHER, I want to output that same value for all the rows for each subject (if a subject is shown trial_index = 6 and blocktype = nose, all values for blockorder for that particular subject = nosefirst)

I hope this makes sense, any help is appreciated!

I would first order the data by subject and trial index, group by subject, and then get the first blocktype for each group. See below:

library(tidyverse)

dat_orig <- tibble(
  subject=rep(c(1,2), each=10),
  blocktype=c(rep(c("nose", "mouth"), each=5), rep(c("mouth", "nose"), each=5)),
  trial_index=rep(1:10*3+3, 2)
)

dat_orig %>%
  arrange(subject, trial_index) %>%
  group_by(subject) %>%
  #get the first block type of each group and adds the word first at the end
  mutate(trial_index=str_c(blocktype[1], "first" )) 
#> # A tibble: 20 x 3
#> # Groups:   subject [2]
#>    subject blocktype trial_index
#>      <dbl> <chr>     <chr>      
#>  1       1 nose      nosefirst  
#>  2       1 nose      nosefirst  
#>  3       1 nose      nosefirst  
#>  4       1 nose      nosefirst  
#>  5       1 nose      nosefirst  
#>  6       1 mouth     nosefirst  
#>  7       1 mouth     nosefirst  
#>  8       1 mouth     nosefirst  
#>  9       1 mouth     nosefirst  
#> 10       1 mouth     nosefirst  
#> 11       2 mouth     mouthfirst 
#> 12       2 mouth     mouthfirst 
#> 13       2 mouth     mouthfirst 
#> 14       2 mouth     mouthfirst 
#> 15       2 mouth     mouthfirst 
#> 16       2 nose      mouthfirst 
#> 17       2 nose      mouthfirst 
#> 18       2 nose      mouthfirst 
#> 19       2 nose      mouthfirst 
#> 20       2 nose      mouthfirst

Created on 2020-05-27 by the reprex package (v0.3.0)

Oh, this works perfectly and it's way simpler than I thought it would be!

I'm still learning the basics of R, may I ask what does str_c mean in this case?

Thank you so much for all your help!

This is a string concatenation function which joins strings together. It comes from the stringr package. Any function that begins with str_ is often from this package. The package includes many more functions but this is certainly the one I use most.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.