Calculate autocorrelation for multiple participants at once and include the (ac) values as a variable in a new dataframe

Rose · September 24, 2020, 9:48am

Hi all,

I'm analyzing time-series data, and one of the variables I want to study is the autocorrelation. While getting the data ready for analysis, I'm running into some difficulties.

My goal: one single continuous value per participant representing the autocorrelation of the participant at hand.

I have roughly 200 participants in my dataset, each with 20 completed (repeated) measurements ('ratings'). I know I can use the acf function to calculate autocorrelation, but I was wondering whether there is a way (or function) to manipulate the data in such a way that I can calculate the autocorrelation for each participant without needing to make a separate dataframe for each one, i.e. without having to do the following 200 times:

df = long dataframe with all measurements (ratings) of the participants

PIDENT 1

d<-subset(df, pid == "a")
d<-acf(d$rating, plot = FALSE, lag.max=1)
d
Autocorrelations of series ‘d$rating’, by lag
0 1
1.000 0.495

ac<-data.frame(pid="a", ac=d[["acf"]][2, ,1])
ac
pid ac
1 a 0.495

PIDENT 2

d<-subset(df, pid == "b")
d<-acf(d$rating, plot = FALSE, lag.max=1)
d
Autocorrelations of series ‘d$rating’, by lag
0 1
1.000 -0.250

ac2<-data.frame(pid="b", ac=d[["acf"]][2, ,1])
ac2
pid ac
1 b -0.250

ac<-rbind(ac, ac2)

ac
pid ac
1 a 0.495
2 b -0.250

Hope you can help me!

Kind regards,
Rose

AlexisW · September 25, 2020, 8:01pm

Indeed it can be simplified a lot!

Let's start by generating some test data:

set.seed(1)
df <- data.frame(pid = rep(letters[1:2], each=5),
                 rating = runif(10))
df
#    pid     rating
# 1    a 0.26550866
# 2    a 0.37212390
# 3    a 0.57285336
# 4    a 0.90820779
# 5    a 0.20168193
# 6    b 0.89838968
# 7    b 0.94467527
# 8    b 0.66079779
# 9    b 0.62911404
# 10   b 0.06178627

Now we can run your manual approach:

d<-subset(df, pid == "a")
d<-acf(d$rating, plot = FALSE, lag.max=1)
ac<-data.frame(pid="a", ac=d[["acf"]][2, ,1])
d<-subset(df, pid == "b")
d<-acf(d$rating, plot = FALSE, lag.max=1)
ac2<-data.frame(pid="b", ac=d[["acf"]][2, ,1])
ac<-rbind(ac, ac2)

ac
#   pid         ac
# 1   a -0.1840563
# 2   b  0.1849619

To simplify things, we can put the actual autocorrelation computation in a function, so we just need to call it with the data:

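# Lag-1 autocorrelation of a numeric vector.
# acf() returns a lag x 1 x 1 array; row 1 is lag 0, row 2 is lag 1.
autocor <- function(x, ...){
  acf(x, plot=FALSE, lag.max=1)[["acf"]][2, ,1]
}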

autocor(df$rating[1:5])
# [1] -0.1840563
autocor(df$rating[6:10])
# [1] 0.1849619
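
If it helps to see what that second element of the acf array actually is, here is a small sketch that computes the lag-1 autocorrelation by hand and compares it with acf(). The helper names lag1_manual and lag1_acf are made up for this illustration, and it assumes the default acf() settings (series demeaned with its overall mean, no missing values).

```r
# Lag-1 autocorrelation computed by hand (lag1_manual is a hypothetical helper name).
lag1_manual <- function(x) {
  n <- length(x)
  dev <- x - mean(x)                    # deviations from the overall mean
  sum(dev[-n] * dev[-1]) / sum(dev^2)   # lag-1 cross-products over total sum of squares
}

# Same quantity via acf(), as in autocor() above.
lag1_acf <- function(x) {
  acf(x, plot = FALSE, lag.max = 1)[["acf"]][2, , 1]
}

set.seed(1)
x <- runif(5)   # same five values as participant "a" in the test data
all.equal(lag1_manual(x), lag1_acf(x))
```

The two should agree up to floating-point error, which is a quick way to convince yourself that the indexing picks out lag 1 and not lag 0.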

So, the problem reduces to applying this function to every set of ratings corresponding to a given pid. The dplyr package (loaded here as part of the tidyverse) has functions for this purpose:

library(tidyverse)
df %>%
  group_by(pid) %>%
  summarize(ac = autocor(rating))
# A tibble: 2 x 2
#   pid       ac
#   <chr>  <dbl>
# 1 a     -0.184
# 2 b      0.185

That's it!
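
In case you'd rather not load any packages, the same split-apply-combine idea can be sketched in base R with split() and sapply(). This is only one possible variant, not the only way to do it. The name ac_base is just a placeholder, and the sketch assumes the rows of df are already in time order within each pid and that there are no missing ratings (acf() stops on NA by default).

```r
# Same helper as above: lag-1 autocorrelation of a numeric vector.
autocor <- function(x, ...) {
  acf(x, plot = FALSE, lag.max = 1)[["acf"]][2, , 1]
}

# Same test data as above.
set.seed(1)
df <- data.frame(pid = rep(letters[1:2], each = 5),
                 rating = runif(10))

# Split the ratings by participant, then apply autocor() to each piece.
ac_by_pid <- sapply(split(df$rating, df$pid), autocor)

# Collect the named vector into a data frame with one row per participant.
ac_base <- data.frame(pid = names(ac_by_pid),
                      ac  = unname(ac_by_pid))
ac_base
```

With the real data this gives one row per participant, which can then be merged back onto other participant-level variables by pid.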

Rose · September 28, 2020, 9:52am

Many thanks Alexis, this is absolutely great!
