Looking for Ideas on Ingesting Spreadsheets and Checking Structure and Data Quality

I'm curious to hear how others have dealt with a spreadsheet culture at work. I'm trying to bring some uniformity to the spreadsheets used as databases in my current environment, since we plan to migrate certain data collection from spreadsheets to systems better suited to the work. That calls for some automated functionality so the transition doesn't require extra manpower. My biggest concern is how to ensure that the imported data:

  1. is in the proper format
  2. has no extra columns added

I do know how to do this the "brute force" way in R: define structures that fix the expected columns and formats, then check each file against them at import. But I wanted to see whether anyone knows of alternative approaches that I have yet to find.
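As a rough illustration of the two checks listed above (expected format, no extra columns), here is a minimal sketch in Python using only the standard library. It is not the R approach mentioned in the post, just the same idea in another tool. It assumes the spreadsheet has been exported to CSV; the column names, the `SCHEMA` mapping, and the `check_sheet` helper are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical schema: expected column name -> parser that raises
# ValueError when a cell is not in the expected format.
SCHEMA = {
    "site_id": int,
    "visit_date": date.fromisoformat,  # expects YYYY-MM-DD
    "weight_kg": float,
}


def check_sheet(csv_text, schema=SCHEMA):
    """Return a list of problems found; an empty list means the sheet passed."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    header = [h.strip() for h in header]

    # Structure check: flag columns that were added or dropped.
    extra = [h for h in header if h not in schema]
    missing = [name for name in schema if name not in header]
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if missing:
        problems.append(f"missing columns: {missing}")

    # Format check: every cell in a known column must parse.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            parse = schema.get(name)
            if parse is None:
                continue  # unexpected column, already reported above
            try:
                parse(value.strip())
            except ValueError:
                problems.append(
                    f"row {row_number}, column {name!r}: bad value {value!r}"
                )
    return problems


good = "site_id,visit_date,weight_kg\n1,2021-03-02,4.5\n"
bad = "site_id,visit_date,weight_kg,notes\n1,03/02/2021,4.5,ok\n"
print(check_sheet(good))
print(check_sheet(bad))
```

Collecting every problem in a list, rather than stopping at the first one, makes it possible to send the whole report back to whoever maintains the spreadsheet in one go.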

Love to hear what you think!

Kara Woo and Karl Broman (both awesome R users) wrote an excellent paper, Data Organization in Spreadsheets, in The American Statistician (and were baller enough to make it open access) that gives really solid best practices:
https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2017.1375989

Good Enough Practices in Scientific Computing is another great guide that covers the whole workflow (and does so quickly, at that).

Oh thank you @mara, there are some great nuggets in here!