I think that is a great idea!
I am exploring BigQuery, Google's serverless database, as a general data repository, for a number of reasons:
- It has a UI from which data can be stored or queried.
- It is very fast: joining two PubChem files (100 million chemical structures and 70 million names) took less than 3 minutes, without having to define an index.
- It is very cheap. There is no fee for the server it is hosted on; instead there is a small fee for storing data (10 GB free, then $0.02 for each additional GB, i.e. about $20 per month for 1 TB) and a fee for querying the data (1 TB free, then $5 per additional TB).
- It has a REST API and many clients, including one for R.
- Metadata can be used to describe the dataset.
- All datasets can be referenced with a unique URL.
My initial code is available here; it allows uploading, but at the moment there is nothing available for searching/browsing.
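
To make the pricing point above concrete, here is a rough sketch of the arithmetic in Python. It only uses the rates I quoted in the list (which may not match current pricing), and it assumes the free allowances and fees apply per month; the function name `monthly_cost_usd` and the constants are just illustrative names, not part of any BigQuery client.

```python
# Rates as quoted in the post above; not checked against current pricing.
FREE_STORAGE_GB = 10              # storage free tier (assumed per month)
STORAGE_USD_PER_GB = 0.02         # fee per additional GB stored
FREE_QUERY_TB = 1                 # query free tier (assumed per month)
QUERY_USD_PER_TB = 5.0            # fee per additional TB queried


def monthly_cost_usd(stored_gb: float, queried_tb: float) -> float:
    """Estimate one month's cost from the data stored and the data queried."""
    # Only the volume above each free allowance is billed.
    storage = max(stored_gb - FREE_STORAGE_GB, 0) * STORAGE_USD_PER_GB
    queries = max(queried_tb - FREE_QUERY_TB, 0) * QUERY_USD_PER_TB
    return round(storage + queries, 2)


# Example: 1 TB (taken as 1000 GB) stored and 3 TB queried in a month.
print(monthly_cost_usd(stored_gb=1000, queried_tb=3))
```

With 1000 GB stored, the storage part alone comes to just under the $20 per month mentioned above, because the first 10 GB are free.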