I'm new to Shiny and having some problematic behavior with my first app. Thank you in advance for your help.
I've created a simple app that allows me to call a complex scoring algorithm for a standardized assessment of child data. My colleague created a package that processes the raw data and generates various composite scores. https://github.com/marcus-waldman/credi
Knowing that many non-R users are going to want to use this, I wanted to create a simple Shiny app that allows users to upload their data and download the processed results. Conceptually it does the following:
1. User uploads a CSV with raw scores
2. Subset the CSV to only the relevant variables so that it can be fed into the scoring package
3. Generate processed scores with the scoring package
4. Merge the processed scores with the uploaded dataset
5. Provide a download link for the processed files
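The steps above can be sketched as a single function, independent of the Shiny wiring. This is only a minimal sketch, not the app's actual code: `process_upload`, `score_fn`, `id_col`, and `item_cols` are hypothetical names, and `score_fn` stands in for whatever function the scoring package exposes.

```r
process_upload <- function(raw, score_fn, id_col, item_cols) {
  # Keep only the ID and the item columns the scorer needs.
  items <- raw[, c(id_col, item_cols), drop = FALSE]
  # score_fn (hypothetical) must return a data frame that still contains id_col.
  scores <- score_fn(items)
  # Attach the scores to the full uploaded data by ID.
  merge(raw, scores, by = id_col, all.x = TRUE)
}

# Toy example with a made-up scorer that sums the item columns.
toy_score <- function(items) {
  data.frame(ID = items$ID, total = rowSums(items[, -1, drop = FALSE]))
}

# In the real app this data frame would come from reading the uploaded CSV.
raw <- data.frame(ID = 1:3, site = c("a", "b", "c"),
                  q1 = c(1, 0, 1), q2 = c(1, 1, 0))

out <- process_upload(raw, toy_score, id_col = "ID", item_cols = c("q1", "q2"))
# The download step would then be write.csv(out, file, row.names = FALSE).
```

In a Shiny app, the uploaded file's path would typically come from `fileInput()` (as `input$<id>$datapath`), and the final write would live inside a `downloadHandler()`.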
I've partially accomplished this here: https://credi.shinyapps.io/credi/ It works with many of my files, but I'm having trouble with some (but not all) CSVs that I upload. Oddly, everything works perfectly in my RStudio IDE; the issues only appear once the app is deployed.
I understand that the trouble I am having is due to character encodings in some of the CSVs that I am uploading: the app breaks if the CSV has accented characters (e.g. á, é, Ó). Originally I had simply:
And it's pretty clear that this is being caused by CSVs with different types of encoding. If a CSV is saved in one encoding but my environment expects another (UTF-8, which I believe is the default on the Shiny Server), then I am going to have issues.
Are there any generalizable solutions that allow me to 1) detect the character encoding of a CSV and 2) set my CSV to read it in said encoding?
I am by no means an expert in dealing with text encoding issues (it's a thorny problem and I hope somebody more knowledgeable than me will chime in!), but as a starting point the answers to this Stack Overflow question identify all the tools I know about:
In the end, I don't think there's a completely bulletproof way to do it because the file may not contain enough features to be diagnostic between possible encodings. You may be forced to fall back to rejecting some files and asking the user to re-export their CSV with a specific encoding, which I know is distasteful because it's asking way too much for some user communities. FWIW, Excel may be the source of a lot of your troubles, in which case that link has advice on how to advise your users, at least.
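One of the simpler heuristics can be sketched in base R: treat the file as UTF-8 if every line is a valid UTF-8 byte sequence, and otherwise fall back to a single-byte encoding. This is only a sketch under stated assumptions. `guess_csv_encoding` and `read_csv_any` are hypothetical helper names, and the `latin1` fallback is a guess that will be wrong for files saved in other legacy encodings. Statistical detectors such as `readr::guess_encoding()` or `stringi::stri_enc_detect()` cover more encodings, but they also return guesses rather than certainties.

```r
# Guess "UTF-8" if all lines are valid UTF-8 bytes, else use the fallback.
guess_csv_encoding <- function(path, fallback = "latin1") {
  lines <- readLines(path, warn = FALSE)
  if (all(validUTF8(lines))) "UTF-8" else fallback
}

# Read a CSV after converting its text to UTF-8 from the guessed encoding.
read_csv_any <- function(path, fallback = "latin1") {
  enc   <- guess_csv_encoding(path, fallback)
  lines <- readLines(path, warn = FALSE)
  # Convert up front so downstream code only ever sees UTF-8 strings.
  lines <- iconv(lines, from = enc, to = "UTF-8")
  if (anyNA(lines)) {
    # iconv() returns NA for lines it cannot convert: reject the file.
    stop("Could not convert the file; please re-export it as UTF-8.",
         call. = FALSE)
  }
  read.csv(text = lines, stringsAsFactors = FALSE)
}
```

The `stop()` branch is the "reject and ask for a re-export" fallback: if the text cannot be converted from the guessed encoding, the user gets a clear message instead of silently corrupted names.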
Thank you for the very helpful reply! I hadn't seen that Stack Overflow thread, so thank you for pointing it out. I think you're right that we may have to reject certain types of files. I'll post back here when I figure out what works for my particular use case.
I wanted to give a brief update to this post. As @jcblum indicated, character encoding is a surprisingly tricky issue and there doesn't seem to be a foolproof generalizable solution. This, combined with the fact that the vast majority of my anticipated users will be importing data from Excel (which is often the culprit), led me to change my strategy for accepting files. Rather than creating a df from a csv using read_csv, I opted to use read_excel from the readxl package. This seems to be working!
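For anyone landing here later, the switch might look roughly like the sketch below. It is not the app's actual code: `read_upload` is a hypothetical helper name, and the extension check is one possible way to reject non-Excel uploads. A likely reason this avoids the problem is that .xlsx files store their text as XML with a declared encoding, so there is nothing to guess.

```r
# Accept only Excel uploads and read them with readxl.
read_upload <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% c("xls", "xlsx")) {
    stop("Please upload an Excel file (.xls or .xlsx).", call. = FALSE)
  }
  # read_excel() returns a tibble; convert to a plain data frame so the
  # result behaves like what read.csv() would return.
  as.data.frame(readxl::read_excel(path))
}
```

In the Shiny server function this would be called on the uploaded file's `datapath`, with the error message surfaced to the user when the extension check fails.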