Multiple .txt files to a data frame in R

sete.gonz · February 12, 2019, 3:36pm

I have a folder with 44 .txt files (.../data). I want to collect the data contained in all those 44 files in just one data frame. This is what I have done:

library(tidyverse)
list_of_files <- list.files(path = "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data", recursive = TRUE, pattern = "\\.txt$", full.names = TRUE)
df <- list_of_files %>%
  set_names(.) %>%
  map_df(., read_table, .id = "FileName")

This is my output:

I would like to achieve two things:

To separate all the variables that are grouped in the second column into individual columns.
To shorten the identification name to something like "sub_01", "sub_02", and so on.

I would like to achieve this using a tidyverse approach.

Thanks!

andresrcs · February 12, 2019, 3:48pm

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help. You can also include some example data using the datapasta package.

sete.gonz · February 12, 2019, 6:35pm

I'm having problems trying to use this format. I think that my list is not supported.

andresrcs · February 12, 2019, 6:38pm

What error message do you get when you do datapasta::df_paste(head(df))?

Also, have you tried specifying the separator character?

map_df(., read_table, sep = ",", .id = "FileName")

sete.gonz · February 12, 2019, 7:06pm

After using:

map_df(., read_table, sep = ",", .id = "FileName")

The output changed and I managed to use the datapasta approach. So here it goes:

mydata <- tibble::tribble(
  ~FileName,                                                                                                                                                                                               ~X.,
  "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data/sub_01.txt", "Trial,nBack,Valence,Image,imagePresentationTime,NoisePresentationTime,x,StimDuration,NoiseDuration,RT,Accuracy,experiment,Clock_1,Clock_2,dataOneMinPress_1,dataOneMinPress_2,dataOneMinPress_3",
  "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data/sub_01.txt",                                                                             "1,1,0,4530.jpg,0.0141170582301129,0.684714303524743,2.03743156177552,0.67059724529463,1.35271725825078,NaN,1,0,,,,,",
  "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data/sub_01.txt",                                                                     "2,1,0,4530.jpg,2.0492009591087,2.6609536327162,5.26096588873156,0.611752673607498,2.60001225601536,1.3177634643348,1,0,,,,,",
  "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data/sub_01.txt",                                                                               "3,1,0,4000.jpg,5.27273649355379,5.88450426077657,8.88452026000869,0.611767767222773,3.00001599923212,NaN,1,0,,,,,",
  "/Users/setegonz/MEGAsync/ProjetoUFABC-master/data/sub_01.txt",                                                                                 "4,1,0,6314.jpg,8.89628452551256,9.50805833018126,12.50807915937,0.611773804668701,3.00002082918877,NaN,1,0,,,,,"
)

As you can see, all my variables are together in the same column. Then I tried to separate them with:

mydataseparation <- separate(mydata, c(2), ",")

But separate() doesn't split my variables.
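
A likely reason, going by tidyr's documented argument order separate(data, col, into, sep = ...): in the positional call above, "," is matched to into (the new column names), not to sep. The sketch below illustrates this on a small made-up tibble; the toy data and the column name packed are assumptions for illustration, not the real files.

```r
library(tidyr)

# Toy data (hypothetical), shaped like the pasted tribble: one packed column.
toy <- tibble::tribble(
  ~FileName,    ~packed,
  "sub_01.txt", "Trial,nBack,Valence",
  "sub_01.txt", "1,1,0"
)

# Positional call: "," is taken as `into`, so a single new column
# literally named "," is created and the remaining pieces are dropped.
by_position <- suppressWarnings(separate(toy, 2, ","))
names(by_position)

# Named arguments: `into` holds the new column names, `sep` the delimiter.
by_name <- separate(toy, packed,
                    into = c("Trial", "nBack", "Valence"),
                    sep = ",")
names(by_name)
```

Naming the arguments explicitly avoids this kind of silent mismatch.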

andresrcs · February 12, 2019, 7:44pm

separate() works, but it is not an ideal solution; you should get separate variables directly from read_table().

library(dplyr)
library(tidyr)
mydata %>% 
    separate(X., into = c('Trial','nBack','Valence','Image','imagePresentationTime','NoisePresentationTime','x','StimDuration','NoiseDuration','RT','Accuracy','experiment','Clock_1','Clock_2','dataOneMinPress_1','dataOneMinPress_2','dataOneMinPress_3'),
             sep = ",") %>% 
    head()
#> # A tibble: 5 x 18
#>   FileName Trial nBack Valence Image imagePresentati~ NoisePresentati~
#>   <chr>    <chr> <chr> <chr>   <chr> <chr>            <chr>           
#> 1 /Users/~ Trial nBack Valence Image imagePresentati~ NoisePresentati~
#> 2 /Users/~ 1     1     0       4530~ 0.0141170582301~ 0.6847143035247~
#> 3 /Users/~ 2     1     0       4530~ 2.0492009591087  2.6609536327162 
#> 4 /Users/~ 3     1     0       4000~ 5.27273649355379 5.88450426077657
#> 5 /Users/~ 4     1     0       6314~ 8.89628452551256 9.50805833018126
#> # ... with 11 more variables: x <chr>, StimDuration <chr>,
#> #   NoiseDuration <chr>, RT <chr>, Accuracy <chr>, experiment <chr>,
#> #   Clock_1 <chr>, Clock_2 <chr>, dataOneMinPress_1 <chr>,
#> #   dataOneMinPress_2 <chr>, dataOneMinPress_3 <chr>

Created on 2019-02-12 by the reprex package (v0.2.1)
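
Note that in the output above the header text still appears as the first data row and every column is <chr>. A minimal sketch of one way to handle both, assuming the header line is identical in every file: take the column names from that line, drop the rows that repeat it, and let separate() convert types with convert = TRUE. The toy tibble and the column name packed are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): the header line appears as a regular row.
toy <- tibble::tribble(
  ~FileName,    ~packed,
  "sub_01.txt", "Trial,nBack,RT",
  "sub_01.txt", "1,1,NaN",
  "sub_01.txt", "2,1,1.32"
)

# Take the column names from the header row instead of typing them out.
col_names <- strsplit(toy$packed[1], ",", fixed = TRUE)[[1]]

tidy <- toy %>%
  filter(packed != packed[1]) %>%   # drop every copy of the header row
  separate(packed, into = col_names, sep = ",", convert = TRUE)

tidy
```

This is still a workaround; reading the files with a comma-aware reader avoids the packed column altogether.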

andresrcs · February 12, 2019, 7:58pm

Try read.csv() to get separate variables from the beginning.

df <- list_of_files %>%
    map_df(read.csv, sep = ",", .id = "FileName")
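
For the second goal (short identifiers such as "sub_01"), one possible sketch is to name each path by its file name without directory or extension before mapping, so that .id stores that label. Without names on the input, .id would hold the position of each file instead. The temporary directory and the two small files below are hypothetical stand-ins for the real data folder.

```r
library(purrr)

# Hypothetical stand-ins for the real files: two small comma-separated .txt files.
data_dir <- file.path(tempdir(), "reprex_data")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("Trial,nBack,RT", "1,1,0.87", "2,1,1.32"),
           file.path(data_dir, "sub_01.txt"))
writeLines(c("Trial,nBack,RT", "1,2,0.95"),
           file.path(data_dir, "sub_02.txt"))

list_of_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Strip the directory and the extension: ".../sub_01.txt" becomes "sub_01".
short_ids <- tools::file_path_sans_ext(basename(list_of_files))

df <- list_of_files %>%
  set_names(short_ids) %>%              # the names become the .id values
  map_df(read.csv, .id = "FileName")    # read.csv splits on commas by default

df
```

map_df() needs dplyr installed for the row binding, and newer purrr releases mark it as superseded, though it still works.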

sete.gonz · February 13, 2019, 12:35am

Thanks!! read.csv works great!!!
