excel file not R appropriate?

Using a pre existing dataset in excel and wanting to import to R. Don't think it is laid out (columns and rows?) in a way R likes. Any easy to understand manuals (for a complete R novice) for laying out data and saving correctly so R can read it?

R can handle data in a wide range of layouts but I would advocate a "tidy" arrangement as explained in this vignette:
https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Even if you do not go the whole tidy data route, the most important requirements are, I think, that:

  1. The header row, if you have one, be a single row at the top of the data with every column having a header (no merged cells)
  2. The data form a continuous block with no empty rows intended to improve readability for humans and no merged cells.
  3. Every column should be of a single type, numeric or date or text.

R can read Excel files but you can also save data in the above format as a csv file and read it in with the read.csv() function. available in base R

1 Like

FJCC is right: the best way to store data is in a universal format, like CSV or JSON.

However, sometimes we don't get a choice. Many of us have to adapt our projects to the Excel workbooks coworkers share. We could prep them and save them in a nicer format, but that's boring and manual work. If we expect to do this over and over (e.g., for a monthly report), we should make computers do the boring work.

The readxl package is good for pulling data from Excel files, even poorly formatted ones. You'll probably need to clean up the results a little more in R, but that can be done in a script.

If you have cases which readxl cannot manage, then tidyxl can deal with more unstructured data layouts:
https://cran.r-project.org/web/packages/tidyxl/index.html

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.