Data format for Date/time

Hello,
I am working on a HUGE dataset, and at the same time trying to learn R. I am in the stages of organizing my dataset for input into R. A key variable is Date/Time. It is currently formated in excel as 06/14/2019 09:30:00 AM (all one column). Is it safe to leave it in this format to save as a csv to input into R or is there a better format I can use?
Note- it relates to the time a water quality variable has been collected in a tidal situation. I eventually will be attempting to analyze the water quality variables based on tidal height (in another data set).

Thanks

Lora

Hi, and welcome.

This looks like its formatted as a date in Excel, if memory serves. (I haven't used it for quite some time.)

R has good tools for converting strings into datetime objects in case the import tool treats the field as a string. I've seen that when the Excel file is saved as a csv file, for example. For HUGE datasets, packages such as readxls, which recognize the Excel field types may be too slow.

My rule of thumb is to keep source data as close to its original form as possible, do any clean-up and conversions in R and keep a log of the changes, and then save the converted file as an R rds object separately.

So, my recommendation is to leave the format as is, except checking to the extent feasible, for cases in which the data is inconsistently formatted.

A few other things:

  1. See the tidy Chapter 12 in Data Science with R for the importance of row-wise vs. column-wise organization.

  2. Decide on a key field (possibly time) to relate your quality and tidal data.

  3. R operates in-memory. That means that everything including packages and data has to fit within the available RAM and the associated OS u-limit (which is tunable). Depending on just how huge your data is, you may need more memory. You can also consider Amazon AWS EC2 instances which allow you to rent ridiculously large virtual machines.

  4. Once you have scrubbed your data, consider migrating it to an SQL database. R can read from data stored that way and allows you to do sampling of your data for exploratory purposes and machine learning.

  5. Don't wait until you've finished with your Excel sheets to start testing imports with small sets--even a screenful is enough. This will orient you to the import process.

Good luck, and come back for help as you progress.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.