All files are listed in directory but not all get read into data frame

Astoriana · February 6, 2023, 5:00pm

The overall goal is to create a continuous data set from weekly data. I have used the following code and it recognizes that there are 14 files in the directory, but only reads 9 into a data frame. I know that the limit for data frames is much higher than what I have, so I don't understand why it would stop adding to the data frame at that point.

Link to files

library(tidyverse)
library(purrr)
library(ggplot2)
library(plotly)

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

FILES <- list.files("C:\\Users\\krist\\Documents\\TEST",pattern = "csv$",full.names = TRUE)
AllDat <- map_dfr(FILES, read.csv)
weeks <- length(FILES) + 3

There is additional code to produce plots etc but that works in the way that I expected so I assume it will be irrelevant.

FJCC · February 6, 2023, 5:32pm

I do not see a problem with your code. If I compare the number of rows in AllDat to the rows in the individual files, they match. How are you detecting that data are missing?

library(purrr)

FILES <- list.files("~/R/Play/FILES/WBEA-Raw",pattern = "csv$",full.names = TRUE)
AllDat <- map_dfr(FILES, read.csv)

#sum the number of rows in the files
tmp2 <- 0
for (Nm in FILES) {
  tmp <- read.csv(Nm)
  tmp2 <- tmp2 + nrow(tmp)
}

#compare the rows in AllDat to tmp2
nrow(AllDat)
#> [1] 34249
tmp2
#> [1] 34249

Created on 2023-02-06 with reprex v2.0.2
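One way to check which files actually made it into the combined data frame is to tag every row with the file it came from and count rows per file. Here is a minimal sketch in base R. It uses small throwaway files in a temporary directory rather than your data, and the file names and the source_file column are made up for illustration.

```r
# Build three small throwaway CSV files (hypothetical stand-ins for the weekly files)
demo_dir <- file.path(tempdir(), "rowcount_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in c(1, 2, 10)) {
  write.csv(data.frame(week = rep(i, i)),
            file.path(demo_dir, paste0(i, ".csv")),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "csv$", full.names = TRUE)

# Read one file and record its name in a new column (source_file is an assumed name)
read_tagged <- function(path) {
  dat <- read.csv(path)
  dat$source_file <- basename(path)
  dat
}
all_dat <- do.call(rbind, lapply(files, read_tagged))

# Rows contributed by each file; a file that was skipped would be absent here
rows_per_file <- table(all_dat$source_file)
print(rows_per_file)
```

If every file name shows up in the table with the expected count, nothing was dropped and the issue is only the order of the rows.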

Astoriana · February 6, 2023, 6:50pm

I'm looking at the tail of AllDat. If all of the files were showing up, the end of the data should be January 31 instead of December 13. The files are in numerical order in my directory (i.e., 1-14), so I don't see why it would place the January data in the middle and December at the end.

FJCC · February 6, 2023, 7:14pm

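The files are read in alphabetical order, not numerical order, so they are ordered as 1, 10, 11, 12, 13, 14, 2, 3, etc. All of the files are being read; the December data just ends up at the end. After you read in the data, you can convert Date_Time from text to a date-time value and then sort the rows by it.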

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

FILES <- list.files("~/R/Play/FILES/WBEA-Raw",pattern = "csv$",full.names = TRUE)
AllDat <- map_dfr(FILES, read.csv)
tail(AllDat$Date_Time)
#> [1] "12/13/2022 5:35" "12/13/2022 5:40" "12/13/2022 5:45" "12/13/2022 5:50"
#> [5] "12/13/2022 5:55" "12/13/2022 6:00"
AllDat <- AllDat |> mutate(Date_Time = mdy_hm(Date_Time)) |> 
  arrange(Date_Time)
tail(AllDat$Date_Time)
#> [1] "2023-01-31 05:35:00 UTC" "2023-01-31 05:40:00 UTC"
#> [3] "2023-01-31 05:45:00 UTC" "2023-01-31 05:50:00 UTC"
#> [5] "2023-01-31 05:55:00 UTC" "2023-01-31 06:00:00 UTC"

Created on 2023-02-06 with reprex v2.0.2
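If you also want the files themselves read in week order, the file list can be reordered by the number in each name before reading. This is only a sketch, assuming the files are named like 1.csv through 14.csv (I have not checked the actual names). The throwaway files below stand in for yours.

```r
# Create 14 small throwaway files named 1.csv ... 14.csv (assumed naming scheme)
demo_dir <- file.path(tempdir(), "order_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:14) {
  write.csv(data.frame(week = i),
            file.path(demo_dir, paste0(i, ".csv")),
            row.names = FALSE)
}

# list.files() returns the names sorted as text: 1, 10, 11, ..., 2, 3, ...
files <- list.files(demo_dir, pattern = "csv$", full.names = TRUE)

# Strip everything that is not a digit from each base name, then sort numerically
file_num <- as.numeric(gsub("\\D", "", basename(files)))
files_sorted <- files[order(file_num)]

# Read the files in numeric order and stack them
all_dat <- do.call(rbind, lapply(files_sorted, read.csv))
print(all_dat$week)
```

This breaks down if the names contain other digits, such as a year, so sorting by Date_Time after reading is still the safer approach because it does not depend on how the files are named.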
