Hi,
I am combining multiple csvs and have an issue whereby some of the older csvs have some different headers from those named in colTypes. I would like to simply gracefully output an empty dataframe/tibble, with the expected headers, when the csv read in doesn't exactly match all headers in my specification.
In the reprex below, ok has the desired headers, not_ok does not and should instead be handled such that it returns an empty dataframe/tibble with the expected headers.
The idea being that the flow of csv combining doesn't fall over with these older files but carries on until all files are handled.
Please can someone advise on how I might go about this? Happy to clarify anything.
I imagine I perhaps need to write a wrapper function that tests for a specific warning and handles accordingly? I will end up with a lot of nested function calls though.
I can' t see any arguments within read_csv itself that would help with this scenario.
Here is the reprex:
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3
colTypes = cols(
REPORTING_PERIOD_START = col_character(),
REPORTING_PERIOD_END = col_character(),
STATUS = col_character(),
BREAKDOWN = col_character(),
PRIMARY_LEVEL = col_character(),
PRIMARY_LEVEL_DESCRIPTION = col_character(),
SECONDARY_LEVEL = col_character(),
SECONDARY_LEVEL_DESCRIPTION = col_character(),
MEASURE_ID = col_character(),
MEASURE_NAME = col_character(),
MEASURE_VALUE = col_character()
)
ok <- 'https://files.digital.nhs.uk/18/C92DD8/MHSDS%20Data_AugPrf_2020.csv'
not_ok <- "https://files.digital.nhs.uk/97/0C4714/MHSDS%20Monthly_File_Dec_2018_Final.csv"
# works as headers all match...wish to keep as is
keep <- read_csv(ok, col_types = colTypes)
# would like empty df with expected names here instead of warning and actual file content
return_empty <- read_csv(not_ok, col_types = colTypes)
#> Warning: The following named parsers don't match the column names:
#> REPORTING_PERIOD_START, REPORTING_PERIOD_END, MEASURE_ID, MEASURE_NAME,
#> MEASURE_VALUE
Created on 2020-12-02 by the reprex package (v0.3.0)
In case it matters, the csvs are being combined as follows:
combined_csvs <- future_map_dfr(.x = files_df$links,
.f = ~ cleaned_csv2(
read_csv(get_url2(.x, '[title*="MHSDS Data File"],[title*="MHSDS Monthly Data File"]'), col_types = colTypes)
), seed=TRUE)
In this case, ok or not_ok would be being returned by:
get_url2(.x, '[title*="MHSDS Data File"],[title*="MHSDS Monthly Data File"]')
files_df$links is the input list of files and cleaned_csv2 does some column type conversions.