Working with action sequence data (persons with varying action numbers ==> no dataframe possible)

oopsicusmaximus · January 18, 2020, 6:26am

I'm a beginner in R coding and need to work with a relatively complicated data set.
For each person, I have a list entry containing the sequence of actions that person executed:

ActSeqList[[1]]<-c("a","b","c")
ActSeqList[[2]]<-c("b","a","c","d")
ActSeqList[[3]]<-c("a","d","e","f","d")

and another list for each person with the corresponding reaction times
RTList[[1]]<-c(156,40,210)
RTList[[2]]<-c(41,320,27,560)
RTList[[3]]<-c(27,99,123,710,79)

Note that different persons have different numbers of actions, which is why the data can't be put into a multidimensional object. Now, what I need to do is the following:
1- determine for each action the proportion of persons who executed that action at least once.
2- create a vector for each action that contains all the reaction times associated with that action.
3- (this step is probably easier once the others are done) compute the median reaction time for each action.

I would be very grateful for any tips or suggestions.
Thank you in advance,
Taym

technocrat · January 18, 2020, 6:45am

Hi, and welcome. Concrete questions with a reproducible example, called a reprex, attract more and better answers.

Here, RTList is opaque, and it blocks any further inquiry.

oopsicusmaximus · January 18, 2020, 7:09am

Thank you for your reply.

The thing is, what I posted is really everything I have. I don't have the actual data: my boss said it's protected and she can't send it to me, and I have very little idea how to do what I've been instructed to do. So what I need is just some general tips on how I could approach the issue.

technocrat · January 18, 2020, 7:22am

That's often an issue. The answer is that you don't need real data, you just need data that is in the same form as what you're working with.

For example, starting with mtcars or another standard dataset, you can provide the code to transform into the

oopsicusmaximus · January 18, 2020, 7:35am

I guess I'll have to look at the actual data first, then. Until then (Monday), do you think it would be fruitful to create a reprex with only the lists I originally posted and no further code (I mean, I don't know where to start, so there is no code that can actually be reproduced :D)? According to my boss, this is what the actual data is supposed to look like.

technocrat · January 18, 2020, 7:38am

The hangup is actual data, which is irrelevant for purposes of diagnosing the problem. All that is needed is data in the same format.
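
To illustrate what data in the same format could look like here, a minimal sketch follows. It simulates two lists shaped like ActSeqList and RTList from the original post. The seed, the number of persons, the action labels and the value ranges are all made-up assumptions and say nothing about the real data.

```r
# Simulate data with the same shape as the lists in the original post.
# All specifics (seed, labels, ranges) are assumptions for illustration only.
set.seed(42)

n_persons <- 5
actions <- c("a", "b", "c", "d", "e", "f")

# Each simulated person executes between 3 and 6 actions
n_actions <- sample(3:6, n_persons, replace = TRUE)

# One character vector of actions per person
ActSeqList <- lapply(n_actions, function(n) sample(actions, n, replace = TRUE))

# One numeric vector of reaction times per person, same length as the actions
RTList <- lapply(n_actions, function(n) round(runif(n, min = 20, max = 800)))

str(ActSeqList)
str(RTList)
```

Code written against lists like these should carry over to the protected data, provided the real lists really do have this structure.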

oopsicusmaximus · January 18, 2020, 1:37pm

But isn't executable code the main point of a reprex? I unfortunately have no such code for now. All I have is what I originally posted. Thanks still for your recommendations.

andresrcs · January 18, 2020, 2:35pm

Actually, you can make a long data frame from the lists. That way you can perform all your tasks easily. See this example:

library(tidyverse)

ActSeqList <- list()
RTList <- list()

ActSeqList[[1]]<-c("a","b","c")
ActSeqList[[2]]<-c("b","a","c","d")
ActSeqList[[3]]<-c("a","d","e","f","d")

RTList[[1]]<-c(156,40,210)
RTList[[2]]<-c(41,320,27,560)
RTList[[3]]<-c(27,99,123,710,79)

# Turn lists into a long dataframe, I'm sure there is a better way but I can't 
# remember it right now
long_dataframe <- map_dfr(ActSeqList, enframe, value = "action",name = "row") %>% 
    bind_cols(map_dfr(RTList, enframe, value = "reaction_time", name = NULL)) %>% 
    mutate(row = if_else(row == 1, row_number(), NA_integer_)) %>% 
    fill(row, .direction = "down") %>% 
    group_by(row) %>% 
    mutate(id = group_indices()) %>% 
    ungroup() %>% 
    select(id, everything(), -row)

long_dataframe
#> # A tibble: 12 x 3
#>       id action reaction_time
#>    <int> <chr>          <dbl>
#>  1     1 a                156
#>  2     1 b                 40
#>  3     1 c                210
#>  4     2 b                 41
#>  5     2 a                320
#>  6     2 c                 27
#>  7     2 d                560
#>  8     3 a                 27
#>  9     3 d                 99
#> 10     3 e                123
#> 11     3 f                710
#> 12     3 d                 79

long_dataframe %>% 
    add_count(action, name = "proportion") %>% 
    mutate(proportion = proportion / max(id))
#> # A tibble: 12 x 4
#>       id action reaction_time proportion
#>    <int> <chr>          <dbl>      <dbl>
#>  1     1 a                156      1    
#>  2     1 b                 40      0.667
#>  3     1 c                210      0.667
#>  4     2 b                 41      0.667
#>  5     2 a                320      1    
#>  6     2 c                 27      0.667
#>  7     2 d                560      1    
#>  8     3 a                 27      1    
#>  9     3 d                 99      1    
#> 10     3 e                123      0.333
#> 11     3 f                710      0.333
#> 12     3 d                 79      1

long_dataframe %>% 
    group_by(action) %>% 
    # I know this is not a vector, it is just for exemplification purposes
    summarise(reaction_time = paste(reaction_time,  collapse = ","))
#> # A tibble: 6 x 2
#>   action reaction_time
#>   <chr>  <chr>        
#> 1 a      156,320,27   
#> 2 b      40,41        
#> 3 c      210,27       
#> 4 d      560,99,79    
#> 5 e      123          
#> 6 f      710

long_dataframe %>% 
    group_by(action) %>% 
    summarise(mean_rt = mean(reaction_time))
#> # A tibble: 6 x 2
#>   action mean_rt
#>   <chr>    <dbl>
#> 1 a        168. 
#> 2 b         40.5
#> 3 c        118. 
#> 4 d        246  
#> 5 e        123  
#> 6 f        710

Created on 2020-01-18 by the reprex package (v0.3.0.9000)
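
The code comment above hopes for a simpler way to build the long data frame. One possible base R sketch uses rep(), lengths() and unlist(), assuming both lists are in the same person order and each pair of vectors has the same length. From there, split() gives one vector of reaction times per action and tapply() gives the median asked for in the original question. The names long_df, rt_by_action and median_rt are only illustrative.

```r
ActSeqList <- list(c("a", "b", "c"),
                   c("b", "a", "c", "d"),
                   c("a", "d", "e", "f", "d"))
RTList <- list(c(156, 40, 210),
               c(41, 320, 27, 560),
               c(27, 99, 123, 710, 79))

# Assumption: every person has exactly one reaction time per action
stopifnot(identical(lengths(ActSeqList), lengths(RTList)))

# Repeat each person's index once per action, then flatten both lists
long_df <- data.frame(
  id = rep(seq_along(ActSeqList), times = lengths(ActSeqList)),
  action = unlist(ActSeqList),
  reaction_time = unlist(RTList),
  stringsAsFactors = FALSE
)

# A named list with one numeric vector of reaction times per action
rt_by_action <- split(long_df$reaction_time, long_df$action)
rt_by_action$d
#> [1] 560  99  79

# Median reaction time per action
# (a = 156, b = 40.5, c = 118.5, d = 99, e = 123, f = 710)
median_rt <- tapply(long_df$reaction_time, long_df$action, median)
median_rt
```

Unlike the paste() summary above, split() keeps the reaction times numeric, so each element can be passed straight to functions such as median().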

oopsicusmaximus · January 19, 2020, 4:29pm

Awesome! A million thanks! Unfortunately, for the second step, this is not exactly what I need. Rather, I need a single value for each action giving the proportion of participants who executed this action at least once (e.g. this value for d would be 2/3, because d was executed at least once by participants 2 and 3; the fact that it was executed twice by participant 3 is irrelevant). Is there a way to do this?

technocrat · January 19, 2020, 5:29pm

What I'd suggest is to mutate a new column with an ifelse() test for whether a user responded 1 or more times and work from there.
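
One way to read this suggestion, sketched in base R rather than with mutate(): count how often each person executed each action, turn the counts into a 0/1 flag with ifelse(), and average the flags per action. The names ids, counts, executed and prop are only illustrative, and the sketch assumes the list holds one entry per person.

```r
ActSeqList <- list(c("a", "b", "c"),
                   c("b", "a", "c", "d"),
                   c("a", "d", "e", "f", "d"))

# Person index for every executed action; the factor levels keep persons
# with an empty sequence in the table so the denominator stays correct
ids <- factor(rep(seq_along(ActSeqList), times = lengths(ActSeqList)),
              levels = seq_along(ActSeqList))

# Person-by-action table of execution counts
counts <- table(id = ids, action = unlist(ActSeqList))

# 1 if the person executed the action at least once, 0 otherwise
executed <- ifelse(counts > 0, 1, 0)

# Proportion of persons per action
# (a = 1, b = c = d = 2/3, e = f = 1/3)
prop <- colMeans(executed)
round(prop, 3)
```

Because the flag is capped at 1, the two executions of d by person 3 count only once, which gives 2/3 for d.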

andresrcs · January 19, 2020, 6:05pm

Is this what you mean? BTW, I was just giving similar examples. I thought you would be able to perform such calculations on your own once you had the data in a friendly format.

long_dataframe %>% 
    group_by(action) %>% 
    count(id) %>%
    ungroup() %>% 
    count(action, name = "prop") %>% 
    mutate(prop = prop / length(unique(long_dataframe$id)))
#> # A tibble: 6 x 2
#>   action  prop
#>   <chr>  <dbl>
#> 1 a      1    
#> 2 b      0.667
#> 3 c      0.667
#> 4 d      0.667
#> 5 e      0.333
#> 6 f      0.333

oopsicusmaximus · January 19, 2020, 8:23pm

Yes, I think that's it. Again, I can't thank you enough for taking the time!

I had thought I was capable of doing it on my own too. Sadly, the data frame in long format was not friendly enough for me, as I am really still in the early phases of learning R and can't think of all the functions I need when dealing with specific problems (in this case it was mostly the function unique). I have learnt a lot from your help. Thank you!

technocrat · January 19, 2020, 8:49pm

Great. Please mark the solution for the benefit of those to follow.
