All files are listed in the directory, but not all get read into the data frame

The overall goal is to create a continuous data set from weekly data files. I have used the following code, and it recognizes that there are 14 files in the directory but only reads 9 into the data frame. I know that the size limit for data frames is much higher than what I have, so I don't understand why it would stop adding to the data frame at that point.

Link to files

library(tidyverse)
library(purrr)
library(ggplot2)
library(plotly)

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
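# List the full paths of every file in the folder whose name ends in "csv"
FILES <- list.files("C:\\Users\\krist\\Documents\\TEST",pattern = "csv$",full.names = TRUE)
# Read each file with read.csv() and row-bind the results into one data frame
AllDat <- map_dfr(FILES, read.csv)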
weeks <- length(FILES) + 3

There is additional code to produce plots, etc., but it works as expected, so I assume it is irrelevant.

I do not see a problem with your code. If I compare the number of rows in AllDat to the total number of rows in the individual files, they match. How are you detecting that data are missing?

library(purrr)

FILES <- list.files("~/R/Play/FILES/WBEA-Raw",pattern = "csv$",full.names = TRUE)
AllDat <- map_dfr(FILES, read.csv)

#sum the number of rows in the files
tmp2 <- 0
for (Nm in FILES) {
  tmp <- read.csv(Nm)
  tmp2 <- tmp2 + nrow(tmp)
}

#compare the rows in AllDat to tmp2
nrow(AllDat)
#> [1] 34249
tmp2
#> [1] 34249

Created on 2023-02-06 with reprex v2.0.2
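A per-file breakdown can be more informative than a single total, because it shows which file each row came from and in what order the files were read. One option is the `.id` argument of purrr's `map_dfr()`, which adds a column holding the name of the input element. This is a minimal sketch, assuming demo files written to a temporary folder; the file names `week1.csv`, `week2.csv`, `week10.csv` and the `value` column are hypothetical.

```r
library(purrr)

# Hypothetical demo data: three small CSV files with 1, 2 and 10 rows
demo_dir <- file.path(tempdir(), "demo_csv")
dir.create(demo_dir, showWarnings = FALSE)
for (i in c(1, 2, 10)) {
  write.csv(data.frame(value = seq_len(i)),
            file.path(demo_dir, paste0("week", i, ".csv")),
            row.names = FALSE)
}

FILES <- list.files(demo_dir, pattern = "csv$", full.names = TRUE)

# Name each path by its file name so .id records the source file of every row
AllDat <- map_dfr(set_names(FILES, basename(FILES)), read.csv, .id = "source")

# Rows contributed by each file, listed in the order the files were read
table(factor(AllDat$source, levels = unique(AllDat$source)))
```

If every file shows up in this table with the expected row count, nothing is being dropped; the question is only where each file's rows land in the combined data.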

I'm looking at the tail of AllDat. If all of the files were being read, the data should end on January 31 rather than December 13. The files are in numerical order in my directory (i.e., 1-14), so I don't see why January data would end up in the middle and December data at the end.

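list.files() returns the file names sorted alphabetically, i.e., compared as text rather than as numbers, so files named 1 to 14 are read in the order 1, 10, 11, 12, 13, 14, 2, 3, etc. You can sort the rows by Date_Time after you read in the data, once Date_Time has been converted from text to a date-time value.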
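The difference between the two orderings can be seen without touching the disk. The names below are hypothetical stand-ins for the weekly files: `sort()` compares them character by character, while ordering on the number extracted from each name gives the sequence a file browser typically shows.

```r
# Hypothetical weekly file names, numbered 1 to 14
fnames <- paste0("week", 1:14, ".csv")

# Text sort: "week10.csv" ... "week14.csv" come before "week2.csv"
text_order <- sort(fnames)

# Numeric sort: strip every non-digit character and order on the number
# (a name without any digits would become NA here)
week_num <- as.numeric(gsub("\\D", "", fnames))
numeric_order <- fnames[order(week_num)]

text_order
numeric_order
```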

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

FILES <- list.files("~/R/Play/FILES/WBEA-Raw",pattern = "csv$",full.names = TRUE)
AllDat <- map_dfr(FILES, read.csv)
tail(AllDat$Date_Time)
#> [1] "12/13/2022 5:35" "12/13/2022 5:40" "12/13/2022 5:45" "12/13/2022 5:50"
#> [5] "12/13/2022 5:55" "12/13/2022 6:00"
AllDat <- AllDat |> mutate(Date_Time = mdy_hm(Date_Time)) |> 
  arrange(Date_Time)
tail(AllDat$Date_Time)
#> [1] "2023-01-31 05:35:00 UTC" "2023-01-31 05:40:00 UTC"
#> [3] "2023-01-31 05:45:00 UTC" "2023-01-31 05:50:00 UTC"
#> [5] "2023-01-31 05:55:00 UTC" "2023-01-31 06:00:00 UTC"

Created on 2023-02-06 with reprex v2.0.2
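An alternative, when the file names carry the week number, is to put FILES into numeric order before reading so the rows are bound week by week. This is a minimal sketch, assuming each base file name contains exactly one run of digits; it writes its own hypothetical demo files (`1.csv` to `14.csv`, one `Date_Time` value each, in the "m/d/Y H:M" text format shown above) to a temporary folder. Sorting by Date_Time after reading, as in the answer, remains the more robust choice because it does not depend on how the files are named.

```r
library(purrr)
library(dplyr)
library(lubridate)

# Hypothetical demo: one CSV per week, named 1.csv ... 14.csv
demo_dir <- file.path(tempdir(), "weekly_demo")
dir.create(demo_dir, showWarnings = FALSE)
start <- mdy_hm("11/1/2022 0:00")
for (w in 1:14) {
  stamp <- format(start + days(7 * (w - 1)), "%m/%d/%Y %H:%M")
  write.csv(data.frame(Date_Time = stamp),
            file.path(demo_dir, paste0(w, ".csv")),
            row.names = FALSE)
}

FILES <- list.files(demo_dir, pattern = "csv$", full.names = TRUE)

# Reorder the paths by the number in the file name instead of alphabetically
file_num <- as.numeric(gsub("\\D", "", basename(FILES)))
FILES <- FILES[order(file_num)]

# Read in numeric order and parse the text timestamps into date-times
AllDat <- map_dfr(FILES, read.csv) |>
  mutate(Date_Time = mdy_hm(Date_Time))

tail(AllDat$Date_Time)
```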
