split into trials for multiple files

Hi,
I am a beginner in R and I have a basic problem.
I am using RStudio to read a series of .xls files. The organisation in columns is identical for all files. I am trying to split each file into trials, and I can't figure out how to do it for all of them at once.

Below is my code:

# 1. Read Data Viewer sample report - which must contain the 
# following variables in order:
# RECORDING_SESSION_LABEL,TRIAL_INDEX,TIMESTAMP,LEFT_GAZE_X,
# LEFT_GAZE_Y,RIGHT_GAZE_X,RIGHT_GAZE_Y,RESOLUTION_X,RESOLUTION_Y
# Split report into trials
#---------------------------
#code the .xls files to be read
pre1 <- "Sample_Report_V0100" 
pre2 <- str_pad(1:99, pad = 0,width = 2 , "left")
pre3 <- 1
suf <- ".xls" 
file.names <- paste(pre1, sep = "", paste(pre2,pre3,suf, sep ="")) 
rm(pre1, pre2, pre3, suf) 
file.names 

allfiles <- list.files (path = "C:/Users/Daniela Canu/Documents/R/MS_Toolbox_Daniela/MS_Toolbox_Daniela/data", pattern = "*.xls", full.names=TRUE)
allfiles
df.list <- lapply(allfiles, read.table)
df.list
all_data <- sum(complete.cases(file.names))
dt <- split(all_data,all_data[,2])

The error I get is the following:

Error in all_data[, 2] : incorrect number of dimensions

I do not get this error if I read a single file:

alldata <- read.table(all_data, na.strings=c("."), header = TRUE)

How can I split all files into trials?
Thanks a lot!

Hi,

Welcome to the RStudio community!

I'm not 100% sure what you're trying to do, but here is some code that might help:

library(tidyverse)
library(readxl)

# Keep strings as character vectors so data frames merge without factor warnings
options(stringsAsFactors = FALSE)

# Get the list of all files (they should all have the same columns!)
# Note: pattern is a regular expression, not a glob
allfiles <- list.files(path = "C:/Users/Daniela Canu/Documents/R/MS_Toolbox_Daniela/MS_Toolbox_Daniela/data", pattern = "\\.xls$", full.names = TRUE)

# Load all the files and merge them together
df <- map_df(allfiles, function(myFile) {
  myData <- read_excel(myFile) # load file with the readxl function
  fileName <- str_match(myFile, "([^\\/]+)\\.xls$")[2] # extract the file name from the path
  cbind(fileName, myData) # add the file name to the data as first column
})

head(df)
  • First all xls files are found in your folder (make sure no other xls files are in there)
  • Then all files are loaded with the read_excel function
    • Using the map_df will merge all files into one big final data frame
    • I use regex to extract the file name from the path, and use it as a first column so you'll know which data came from which file in the final result
  • The resulting data frame has all files merged, with the first column as label. You can now proceed filtering or further analysis
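
As a follow-up to the last bullet, here is a minimal sketch of how the merged data frame could be split into trials per file. The toy data frame and its column names (fileName, TRIAL_INDEX, LEFT_GAZE_X) are assumptions standing in for the real report. For context, the original attempt failed because sum(complete.cases(file.names)) returns a single number rather than a data frame, so all_data[, 2] has no second dimension to index.

```r
# Toy stand-in for the merged data frame; column names are assumptions
df <- data.frame(
  fileName    = rep(c("Sample_Report_V0100011", "Sample_Report_V0100021"), each = 4),
  TRIAL_INDEX = rep(c(1, 1, 2, 2), times = 2),
  LEFT_GAZE_X = c(512.1, 513.4, 498.7, 500.2, 520.0, 521.3, 507.9, 509.5)
)

# First split by file, then split each file's rows by trial
by_file  <- split(df, df$fileName)
by_trial <- lapply(by_file, function(d) split(d, d$TRIAL_INDEX))

# Rows of trial 2 in the first file
by_trial[["Sample_Report_V0100011"]][["2"]]
```

The result is a nested list: one element per file, each holding one data frame per trial, so a per-trial analysis can loop over it.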

Hope this helps,
PJ

PS: If you want to make sure people on this forum can help you out, make sure to create a reprex in future. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

Hi,
thanks a lot!
I probably did not express myself very well. Basically I need to detect microsaccades in a series of files, which are data from different participants.

I have the script to detect MS ...
lxdeg <- (lx - (SCREEN_WIDTH_PX/2)) / resx
lydeg <- (ly - (SCREEN_HEIGHT_PX/2)) / resy
rxdeg <- (rx - (SCREEN_WIDTH_PX/2)) / resx
rydeg <- (ry - (SCREEN_HEIGHT_PX/2)) / resy

#---------------------------------------------------
# 4. Detection of microsaccades
# This is all E&K code - with a tweak in microsacc.R
# to avoid PSOs. Adjust MINDUR and VFAC as desired
#---------------------------------------------------

xl <- cbind(lxdeg,lydeg)
xr <- cbind(rxdeg,rydeg)
msl <- microsacc(xl,VFAC,MINDUR,SAMPLING)
msr <- microsacc(xr,VFAC,MINDUR,SAMPLING)
sac <- binsacc(msl$table,msr$table)
sacpar_out <- sacpar(sac,SAMPLING=1000)
output[[i]] <- cbind(rep(i,nrow(sacpar_out)),sacpar(sac,SAMPLING=1000))
}

So I need to open all the files and then, at the end, create a report for each file with the frequency of microsaccades per trial.

Does your script also work with the write.table function to create the report?
myData2 = read.table(myFile)

sac_file <- do.call(rbind, lapply(output, as.data.frame))
# assign the column names to the data frame (not to a new variable called colnames)
colnames(sac_file) <- c('trial','start','end','duration','delay','peakv','distance','orient1','amp','orient2')
write.table(sac_file, file = "sacc.txt", sep = "\t")

That is, I would like the write.table function to be applied to all files.

Hi,

Although you have provided some more code, it's still not really a reprex I can work with. Why don't you create a dummy input / output dataframe using the tips in the reprex link and tell me what info you need to extract from it and we'll take it from there.

So provide me with some data frame code of a file as it is read, and the dataframe that is saved again after processing.

Good luck!
PJ

Hi.
I figured out another way to do that.

Now, I ran the script and created a sac file for the first participant. When I change the participant number from V0100011 to V0100021, I get this error at the end of the loop:

Error in if (msdx < 1e-10) { : missing value where TRUE/FALSE needed

output <- list()
index <- 1
for (i in seq_along(dt_part)) {
  # split this participant's data by trial
  dt <- split(dt_part[[i]], dt_part[[i]]$Trial_Index_)
  edf <- toString(dt[[1]][[1]][[1]]) # grab edf name
  for (j in seq_along(dt)) {
    trial_d <- dt[[j]] # gaze / resolution data for trial j
    xl <- cbind(trial_d$lxdeg, trial_d$lydeg)
    xr <- cbind(trial_d$rxdeg, trial_d$rydeg)
    msl <- microsacc(xl, VFAC, MINDUR, SAMPLING)
    msr <- microsacc(xr, VFAC, MINDUR, SAMPLING)
    sac <- binsacc(msl$table, msr$table)
    sacpar_out <- sacpar(sac, SAMPLING)
    output[[index]] <- cbind(rep(edf, nrow(sacpar_out)), rep(j, nrow(sacpar_out)), sacpar_out)
    index <- index + 1
  }
}

What does the error mean?

Considered in isolation, the error is saying that msdx is NA (a missing value). The comparison msdx < 1e-10 then evaluates to NA rather than TRUE or FALSE, so the if condition cannot be tested.

However, the if condition does not appear in any of the code you shared, so I'm not sure how it relates.
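
To illustrate, here is a minimal sketch of how a single missing sample can lead to that message. The vector v is made up for the example; the two threshold lines only mimic the kind of median-based calculation that would produce an NA and are not taken from your script.

```r
# Hypothetical velocity values with one missing sample
v <- c(0.2, -0.1, NA, 0.4)

medx <- median(v)                   # NA, because median() does not drop NA by default
msdx <- sqrt(median((v - medx)^2))  # also NA

# if () cannot decide on NA, so it raises an error instead of choosing a branch
msg <- tryCatch(
  if (msdx < 1e-10) "below threshold" else "ok",
  error = function(e) conditionMessage(e)
)
msg
```

Here msg holds the error text instead of either branch value, which is the same failure mode as in the reported error.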

Yes, true, that condition is in another function file:

microsacc <- function(x,VFAC=5,MINDUR=3,SAMPLING=500) {
  # Compute velocity
  v <- vecvel(x,SAMPLING=SAMPLING, TYPE=1)

  # Compute threshold
  medx <- median(v[,1])
  msdx <- sqrt( median((v[,1]-medx)^2) )
  medy <- median(v[,2])
  msdy <- sqrt( median((v[,2]-medy)^2) )
  if ( msdx<1e-10 ) {
    msdx <- sqrt( mean(v[,1]^2) - (mean(v[,1]))^2 )
    if ( msdx<1e-10 ) {
      stop("msdx<realmin in microsacc.R")
    }
  }
  if ( msdy<1e-10 ) {
    msdy <- sqrt( mean(v[,2]^2) - (mean(v[,2]))^2 )
    if ( msdy<1e-10 ) {
      stop("msdy<realmin in microsacc.R")
    }
  }
  radiusx <- VFAC*msdx
  radiusy <- VFAC*msdy
  radius <- c(radiusx,radiusy)

You may need to go back and refine your program logic, for example by deciding how to handle missing values. I don't think we will be in much of a position to guide you in this...

Are you familiar with using the debug() function to step through your code and see the variable states from line to line? It may be useful to you.
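
As a starting point, here is a small sketch of how you could find the trials that contain missing gaze samples before calling microsacc. The list dt below is toy data that only imitates the shape of your per-trial split, and skipping the affected trials is just one possible policy (an assumption on my side, not part of the E&K code).

```r
# Toy stand-in for one participant's trials (split by trial index)
dt <- list(
  "1" = data.frame(lxdeg = c(0.10, 0.12, 0.11), lydeg = c(0.00, 0.01, 0.02)),
  "2" = data.frame(lxdeg = c(0.20, NA,   0.22), lydeg = c(0.05, 0.04, NA))
)

# TRUE for every trial that has at least one missing gaze sample
has_na <- vapply(dt, function(d) any(is.na(d[, c("lxdeg", "lydeg")])), logical(1))
names(dt)[has_na]

# One possible policy: keep only the trials without missing samples
clean <- dt[!has_na]

# To step through the real function interactively:
# debug(microsacc)    # the next call to microsacc() opens the browser
# undebug(microsacc)  # switch stepping off again
```

If the second participant's file has trials flagged this way, that would be consistent with the NA reaching the if condition, but you would need to check it against your own data.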
