Reassigning session IDs with different baselines to standardized session IDs

Hi!

In short, I need to reassign session numbers in my data so that they are standardized across participants. Here is what I have to work with:

  1. Each participant has a different "session_id" number. These look like 200, 201, 203, etc.
  2. Each participant's session numbers start with a different and arbitrary baseline number because of the way the data was stored on our server (not my choice). Some numbers are as above with 200, 201, 203, while others are 1000, 1001, 1002. So the simplest solution would be aligning these so that they are standardized.

There must be a simple solution that I'm missing, probably some kind of function that subtracts each participant's minimum session_id from each of their session_id values and then adds 1. Can anyone help me with a loop here? A brute force, non-looped version would also be very helpful.

Please and thank you! I hope everyone is well and safe.

Are you looking to do something like this?

library(dplyr)

DF <- data.frame(Person = rep(c("A", "B", "C"), each = 4), 
                 Session = c(200, 201, 202, 203, 17, 18, 19, 20, 1001, 1002, 1003, 1004))
DF
#>    Person Session
#> 1       A     200
#> 2       A     201
#> 3       A     202
#> 4       A     203
#> 5       B      17
#> 6       B      18
#> 7       B      19
#> 8       B      20
#> 9       C    1001
#> 10      C    1002
#> 11      C    1003
#> 12      C    1004
MinSession <- DF %>% group_by(Person) %>% summarize(MinSess = min(Session))
DF <- inner_join(DF, MinSession, by = "Person")
DF
#>    Person Session MinSess
#> 1       A     200     200
#> 2       A     201     200
#> 3       A     202     200
#> 4       A     203     200
#> 5       B      17      17
#> 6       B      18      17
#> 7       B      19      17
#> 8       B      20      17
#> 9       C    1001    1001
#> 10      C    1002    1001
#> 11      C    1003    1001
#> 12      C    1004    1001
DF <- DF %>% mutate(AdjSession = Session - MinSess + 1)
DF
#>    Person Session MinSess AdjSession
#> 1       A     200     200          1
#> 2       A     201     200          2
#> 3       A     202     200          3
#> 4       A     203     200          4
#> 5       B      17      17          1
#> 6       B      18      17          2
#> 7       B      19      17          3
#> 8       B      20      17          4
#> 9       C    1001    1001          1
#> 10      C    1002    1001          2
#> 11      C    1003    1001          3
#> 12      C    1004    1001          4

Created on 2020-05-12 by the reprex package (v0.2.1)
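
For what it's worth, the same adjustment can probably be written without the intermediate join. The sketch below reuses the example DF from above; the names adj_dplyr and adj_base are only illustrative. The first version computes the per-person minimum inside a grouped mutate(), and the second is a base R, non-looped version using ave().

```r
library(dplyr)

DF <- data.frame(Person = rep(c("A", "B", "C"), each = 4),
                 Session = c(200, 201, 202, 203, 17, 18, 19, 20, 1001, 1002, 1003, 1004))

# dplyr: min(Session) is evaluated separately within each Person group
adj_dplyr <- DF %>%
  group_by(Person) %>%
  mutate(AdjSession = Session - min(Session) + 1) %>%
  ungroup()

# base R: ave() applies min() within each Person and returns a vector
# the same length as DF$Session, so no loop or join is needed
adj_base <- DF
adj_base$AdjSession <- adj_base$Session -
  ave(adj_base$Session, adj_base$Person, FUN = min) + 1
```

Both versions should give 1 to 4 for each person in this example, matching the AdjSession column above.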

That's exactly what I want! For some reason, probably me missing something obvious, it's not working when I switch in some of the variable names in what you wrote to those matching my original code.

My data is first read in as a .csv using read.csv() and named Level. We are looking at performance at different levels of several behavioral tasks in a psychology experiment.

Then I go through the following:

####There are sixteen levels, so I'm filtering for rows that have one of these levels to remove practice trials and instruction screens. All of the participants' IDs start with "participant" and then have a number. We have several tester IDs so the next piece is removing the tester data.

Level1 <- Level %>% filter(level %in% c(1:16)) %>% filter(str_detect(username, "^p"))

####After this I go through and change the IDs into something more R-friendly.

NewID <- data.frame(ID = c(Level1$subject_id),
stringsAsFactors = FALSE)
summary(NewID)
range(NewID)
NewID$ID <- factor(NewID$ID)

Level2 <- cbind(NewID, Level1)

####Standardize Session ID. This is where I added in the code that you wrote, which looks like it should work perfectly. I changed some names to match what they are in my data.

MinSession <- Level2 %>% group_by(ID) %>% summarize(MinSess = min(session_id))

####R is returning only one observation of one variable. So I did not get to check the remaining code that you wrote, but I adjusted it for consistency.

DF <- inner_join(Level2, MinSession, by = "ID")
DF <- DF %>% mutate(AdjSession = session_id - MinSess + 1)

####Is it possible that something in my original data frame setup is responsible for this? Did I make some kind of obvious mistake changing the names?

I do not see anything obviously wrong. Can you post the output of

summary(Level2)

after running your code to the point just before

MinSession <- Level2 %>% group_by(ID) %>% summarize(MinSess = min(session_id))

I filtered to just the ID and session_id variables since it's a big dataset with some gnarly-looking strings. I added this code:

Level3 <- Level2 %>% select(ID, session_id)
summary(Level3)

Output was:
       ID          session_id
 17     : 8325   Min.   : 302
 3      : 7760   1st Qu.: 840
 11     : 6600   Median :1315
 23     : 6584   Mean   :1210
 13     : 6451   3rd Qu.:1557
 22     : 6371   Max.   :1756
 (Other):35682

When I ran it initially the one observation of one variable was 302, so it looks like somehow I got the minimum for the variable in general and not for the individual. Thank you for your help with this!

You can achieve the same result with this other approach

library(dplyr)

DF <- data.frame(Person = rep(c("A", "B", "C"), each = 4), 
                 Session = c(200, 201, 202, 203, 17, 18, 19, 20, 1001, 1002, 1003, 1004))

DF %>%
    group_by(Person) %>%
    arrange(Person, Session) %>% 
    mutate(AdjSession = row_number())
#> # A tibble: 12 x 3
#> # Groups:   Person [3]
#>    Person Session AdjSession
#>    <chr>    <dbl>      <int>
#>  1 A          200          1
#>  2 A          201          2
#>  3 A          202          3
#>  4 A          203          4
#>  5 B           17          1
#>  6 B           18          2
#>  7 B           19          3
#>  8 B           20          4
#>  9 C         1001          1
#> 10 C         1002          2
#> 11 C         1003          3
#> 12 C         1004          4
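
One caveat worth keeping in mind: row_number() and the subtract-the-minimum approach only agree when each person has exactly one row per session and no gaps. The sketch below uses made-up data (DF2 is hypothetical, with a repeated session 201 and a missing 202 for person A) to compare the two, and adds dense_rank() as a possible third option that gives consecutive numbers per distinct session.

```r
library(dplyr)

# Hypothetical data: person A has two rows for session 201 and no session 202
DF2 <- data.frame(Person = c("A", "A", "A", "A", "B", "B"),
                  Session = c(200, 201, 201, 203, 17, 18))

res <- DF2 %>%
  group_by(Person) %>%
  arrange(Person, Session) %>%
  mutate(ByOffset = Session - min(Session) + 1,  # keeps gaps: 1, 2, 2, 4 for A
         ByRow    = row_number(),                # counts rows: 1, 2, 3, 4 for A
         ByRank   = dense_rank(Session)) %>%     # distinct sessions: 1, 2, 2, 3 for A
  ungroup()
res
```

If the real data has many rows per session, as the summary output earlier in the thread suggests, ByRow would number the rows rather than the sessions, so ByOffset or ByRank is probably closer to what is wanted.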

Thank you. I just ran that code as:

Level2 %>%
group_by(ID) %>%
arrange(ID, session_id) %>%
mutate(AdjSession = row_number())

And got this error:

"Error: row_number() should only be called in a data context"

I ran a backtrace and it returned:

  1. dplyr::group_by(., ID)
  2. plyr::arrange(., ID, session_id)
  3. plyr::mutate(., AdjSession = row_number())
  4. [ base::eval(...) ] with 1 more call
  5. dplyr::row_number()
  6. dplyr:::from_context("..group_size")
  7. %||%(...)

I ran checks on the object types:

typeof(Level2)
[1] "list"
typeof(Level2$ID)
[1] "integer"
typeof(Level2$session_id)
[1] "integer"

For some reason I cannot successfully change how R is treating my data in Level2, it keeps treating it as a list. I've tried as.data.frame, data.frame, as.tibble, as_tibble. I've done each of these and reran the code. Should I export the data again as a .csv and reimport it? Could a function be masked by something?

Thank you again for all of your help!

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Thank you. I've never used reprex before, so hopefully I did this correctly. Here's where things get weird: it works in the reprex.

library(tidyverse)

DF <- data.frame(
  session_id = c(1264L,1264L,1264L,
                 1264L,1264L,1264L,1264L,1264L,1264L,1264L),
  ID = as.factor(c("3","3","3","3","3","3","3","3","3",
                   "3"))
)

DF %>%
    group_by(ID) %>%
    arrange(ID, session_id) %>% 
    mutate(AdjSession = row_number())
#> # A tibble: 10 x 3
#> # Groups:   ID [1]
#>    session_id ID    AdjSession
#>         <int> <fct>      <int>
#>  1       1264 3              1
#>  2       1264 3              2
#>  3       1264 3              3
#>  4       1264 3              4
#>  5       1264 3              5
#>  6       1264 3              6
#>  7       1264 3              7
#>  8       1264 3              8
#>  9       1264 3              9
#> 10       1264 3             10

Created on 2020-05-14 by the reprex package (v0.3.0)

And when I run it on my end, even using just the DF data frame created above, it returns:

"Error: row_number() should only be called in a data context"

Should I uninstall and reinstall RStudio? I noticed that datapasta rendered what I see as "1264" in my data frame as "1264L", and I'm wondering whether that might be causing a problem on my end but not in the reprex. This is well outside of my R knowledge. Thank you for all of your help!

The "L" suffix only denotes an integer value; it is not related to your issue.

reprex() runs your code in a clean R session, so if you get the right result with it, you most likely need to restart your R session and start with a clean environment.
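
A possible explanation, going by the backtrace earlier in the thread: it lists plyr::arrange and plyr::mutate, which suggests plyr was attached after dplyr and masked its verbs. That would also be consistent with summarize() returning a single row, since the plyr versions do not know about dplyr groups. This is an assumption about the failing session, not something confirmed in the thread. The sketch below uses a small hypothetical stand-in for Level2 and shows how to check which package supplies mutate(), plus namespace-qualified calls that should work regardless of load order. It also shows that typeof() returning "list" and "integer" is normal for a data frame and a factor.

```r
library(dplyr)

# Hypothetical stand-in for Level2: two participants, two sessions each
Level2 <- data.frame(ID = factor(c("3", "3", "7", "7")),
                     session_id = c(1264L, 1265L, 302L, 303L))

# typeof() reports storage, not class: a data frame is stored as a list
# and a factor as integer codes, so these values are expected
typeof(Level2)       # "list"
class(Level2)        # "data.frame"
typeof(Level2$ID)    # "integer"

# Show which attached package currently supplies mutate():
# "dplyr" in a clean session; "plyr" would indicate masking
environmentName(environment(mutate))

# Namespace-qualified verbs are not affected by later library() calls
MinSession <- Level2 %>%
  dplyr::group_by(ID) %>%
  dplyr::summarize(MinSess = min(session_id))

Adj <- Level2 %>%
  dplyr::inner_join(MinSession, by = "ID") %>%
  dplyr::mutate(AdjSession = session_id - MinSess + 1L)
Adj
```

If both packages really are needed, loading plyr before dplyr, or restarting R and loading only the tidyverse, is the usual way to avoid the clash.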

I restarted the session and then moved the ordering of the code up. It worked once I assigned the new session variable before creating a new ID variable. Thank you again!
