I'm trying to automate combining two CSV files containing Twitter data collected with the twitteR package. Until now I have combined them manually in Excel. I imported both files into R and then exported them to CSV so that the formats match, but I suspect something is going on with the `id` field. The combined tibble has duplicates, which I fully expect because the date ranges overlap, but I can only filter out a handful of them; most remain.
```r
library(readr)  # read_csv()
library(purrr)  # map_df()
library(dplyr)  # %>%

files <- dir(data_path, pattern = "\\.csv$") # get file names (a regex, not a glob)
u308df <- files %>%
  # read in all files, appending the path before the filename. Source:
  # https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R
  map_df(~ read_csv(file.path(data_path, .)))
```
When I use `read.csv` instead of `read_csv`, I get the following warning 15 times:
`In bind_rows_(x, .id) : Unequal factor levels: coercing to character`
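That warning comes from `read.csv()` creating factor columns (its default before R 4.0.0), which `bind_rows()` then has to coerce to character when the levels differ between files. A minimal sketch, using hypothetical sample files in a temporary folder, that turns factors off and also reads `id` as text. The `colClasses = c(id = "character")` part is an added assumption about where the duplicates come from, not something confirmed above.

```r
library(purrr)  # map_df()
library(dplyr)  # bind_rows(), used by map_df() to combine the results

# Hypothetical sample files standing in for the real data_path contents
data_path <- file.path(tempdir(), "csv_demo")
dir.create(data_path, showWarnings = FALSE)
writeLines(c("id,screenName", "1012345678901234567,user_a"),
           file.path(data_path, "a.csv"))
writeLines(c("id,screenName", "1012345678901234569,user_b"),
           file.path(data_path, "b.csv"))

files <- dir(data_path, pattern = "\\.csv$")

# stringsAsFactors = FALSE avoids factor columns (the default before R 4.0.0);
# colClasses keeps the long tweet ids as text instead of converting to double
combined <- map_df(files, ~ read.csv(file.path(data_path, .x),
                                     stringsAsFactors = FALSE,
                                     colClasses = c(id = "character")))
str(combined)
```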
I suspect the cause is that the larger historical data set was handled in Excel, which has left it different in some way from the recently imported data.
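One assumption worth checking (not confirmed above): tweet IDs are 18-19 digit integers, more than a double can represent exactly, and Excel keeps only 15 significant digits, so IDs that have passed through Excel may no longer match the originals. A minimal sketch that reads `id` as character so `read_csv()` cannot round it, then counts how many IDs two files share. The file names `old.csv` and `new.csv` and the sample IDs are hypothetical.

```r
library(readr)

# Hypothetical sample files standing in for the real twitteR exports
tmp <- tempdir()
writeLines(c("id,text",
             "1012345678901234567,first",
             "1012345678901234568,second"),
           file.path(tmp, "old.csv"))
writeLines(c("id,text",
             "1012345678901234568,second",
             "1012345678901234569,third"),
           file.path(tmp, "new.csv"))

# Force id to character so read_csv() does not guess a double and lose digits
id_as_chr <- cols(id = col_character(), .default = col_guess())
old <- read_csv(file.path(tmp, "old.csv"), col_types = id_as_chr)
new <- read_csv(file.path(tmp, "new.csv"), col_types = id_as_chr)

# Tweets present in both files; compare this with the overlap you expect
n_shared <- length(intersect(old$id, new$id))
print(n_shared)  # 1
```

If the real historical file shows IDs ending in a run of zeros, or far fewer shared IDs than the date overlap implies, the IDs were most likely truncated before R ever saw them, and reading them as character cannot restore the lost digits.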
If I export the combined tibble to CSV, open it in Excel and use Data > Remove Duplicates, Excel finds the duplicates without any problem.
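Excel's Data > Remove Duplicates has a direct counterpart in dplyr's `distinct()`. A minimal sketch on hypothetical data, assuming (not confirmed above) that one copy of a tweet carries an id cut to 15 significant digits, which is how Excel stores long numbers. The column names `screenName`, `created` and `text` are assumed from twitteR's usual output and may differ in your files.

```r
library(dplyr)

# Hypothetical combined data: the same tweet appears twice, once with an id
# whose trailing digits were replaced by zeros (15 significant digits kept)
combined <- tibble(
  id         = c("1012345678901234567", "1012345678901230000", "1012345678901234569"),
  screenName = c("user_a", "user_a", "user_b"),
  created    = as.POSIXct(c("2018-06-01 10:00:00", "2018-06-01 10:00:00",
                            "2018-06-02 09:30:00"), tz = "UTC"),
  text       = c("first", "first", "third")
)

# Roughly what Remove Duplicates does when every column is ticked
all_cols <- distinct(combined)

# Ignore id and match on columns that survive a trip through Excel unchanged
by_content <- distinct(combined, screenName, created, text, .keep_all = TRUE)

nrow(all_cols)    # 3: the damaged id keeps the two copies apart
nrow(by_content)  # 2
```

Deduplicating on content columns instead of `id` sidesteps any id damage, at the cost of treating two genuinely different tweets with identical author, timestamp and text as one.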