Hello,
I'm currently writing a package for a city's open data store API (CKAN). Some of the data it returns is messier than it needs to be. For example:
- Many columns with variables that serve little or no purpose - e.g., there's a date-time variable but then columns for "Year", "Month", "Day", "Hour", etc.
- Less than ideal data formatting - e.g., a column labeled "UCR" that has character entries written as "UCR Level One", "UCR Level Two", etc., and I would just rather factor code them as 1, 2, 3.
- Poorly formatted variable names - e.g., very verbose, all in capitals, etc.
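To make those three cleanups concrete, here is a rough base R sketch of what I have in mind. The data frame is a made-up stand-in for an API return: the column names, the values, and the "UCR Level Three" level are assumptions for illustration, not the data store's actual schema.

```r
# Toy data standing in for an API return; names and values are made up.
raw <- data.frame(
  INCIDENT_DATETIME = as.POSIXct(
    c("2020-01-05 13:00:00", "2020-02-10 08:30:00"), tz = "UTC"
  ),
  YEAR  = c(2020, 2020),
  MONTH = c(1, 2),
  UCR   = c("UCR Level One", "UCR Level Two"),
  stringsAsFactors = FALSE
)

# 1. Drop columns that only repeat what the date-time column already says.
redundant <- c("YEAR", "MONTH", "DAY", "HOUR")
clean <- raw[, !(names(raw) %in% redundant), drop = FALSE]

# 2. Recode the UCR strings as a factor with levels in a fixed order,
#    so that as.integer() gives the codes 1, 2, 3.
ucr_levels <- c("UCR Level One", "UCR Level Two", "UCR Level Three")
clean$UCR <- factor(clean$UCR, levels = ucr_levels)

# 3. Lower-case the variable names.
names(clean) <- tolower(names(clean))
```

Each step throws away or renames something the codebook refers to, which is exactly what makes me hesitant.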
Does anybody have any advice on how opinionated I should get in designing the API returns? Changing column/variable names or removing entire columns seems risky because it will introduce conflict with the data store's own codebooks, but I do think it will increase user satisfaction.
I was thinking of having some sort of option, such as pretty = TRUE, that lets the user get the data back in a cleaner, albeit opinionated, form instead of the raw return.
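Something along these lines is what I'm picturing, as a minimal sketch only. The function name tidy_return, the choice of pretty = FALSE as the default, and the "codebook_names" attribute are all hypothetical, and the cleaning step is reduced to lower-casing names to keep it short.

```r
# Hypothetical helper: `raw` is the data frame exactly as the API returned it.
# Defaulting to pretty = FALSE is an assumption, chosen so the untouched
# return (which matches the data store's codebook) is what users get
# unless they opt in.
tidy_return <- function(raw, pretty = FALSE) {
  if (!pretty) {
    return(raw)
  }
  out <- raw
  names(out) <- tolower(names(out))
  # Keep a lookup from the cleaned names back to the codebook names.
  attr(out, "codebook_names") <- stats::setNames(names(raw), names(out))
  out
}

# Made-up example return.
raw <- data.frame(UCR = "UCR Level One", OFFENSE_DESCRIPTION = "Theft")

raw_df    <- tidy_return(raw)                 # unchanged
pretty_df <- tidy_return(raw, pretty = TRUE)  # cleaned names
attr(pretty_df, "codebook_names")[["ucr"]]    # "UCR"
```

Keeping a map back to the original names is one way I could imagine softening the conflict with the codebooks, though I don't know if that is enough.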
I do plan on releasing the package on CRAN at some point.