(@jameshwade are you the person I spoke with after the keynote?)
For these types of data, the rows in the data set are not going to be independent. The independent experimental unit will be something like the sample of material, with many rows of measurements per sample, and there will be many other types of variables.
I think that the important part of this project is to distinguish between the different classes of variables.
Some terminology I just made up:
technical variables: associated with the type of raw data coming off of the instrument, such as the wavelength, time, etc.
sample-based columns/identifiers: these are going to define the subset of data that should be processed. Examples might be patient, day, aliquot/subsample, etc.
experimental conditions: these might affect preprocessing or might just be lumped into the sample-based variables. They reflect assay conditions such as fractionation identifiers, (HPLC) column, reagents, etc.
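To make the three classes concrete, here is a rough sketch of how they might be used to carve a long-format data set into independent units. Everything in it is made up for illustration: the column names, the toy values, the `roles` mapping, the extra "measurement" role for the recorded signal, and the `split_by_sample` helper. It is not an existing API, just one way the idea could look.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (sample, wavelength) reading.
rows = [
    {"patient": "p1", "day": 1, "hplc_column": "C18", "wavelength": 410, "intensity": 0.15},
    {"patient": "p1", "day": 1, "hplc_column": "C18", "wavelength": 400, "intensity": 0.12},
    {"patient": "p2", "day": 1, "hplc_column": "C18", "wavelength": 400, "intensity": 0.31},
    {"patient": "p2", "day": 1, "hplc_column": "C18", "wavelength": 410, "intensity": 0.29},
]

# Assumed assignment of columns to the classes described above.
# "measurement" is an extra role added here for the recorded signal.
roles = {
    "technical": ["wavelength"],
    "sample": ["patient", "day"],
    "condition": ["hplc_column"],
    "measurement": ["intensity"],
}


def split_by_sample(rows, roles):
    """Group rows into independent units keyed by the sample-based columns."""
    units = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in roles["sample"])
        units[key].append(row)
    # Order each unit along its technical variable(s) so that any
    # preprocessing sees the readings in a consistent order.
    for unit in units.values():
        unit.sort(key=lambda r: tuple(r[col] for col in roles["technical"]))
    return dict(units)


units = split_by_sample(rows, roles)
```

Each entry of `units` is then one experimental unit (here, a patient/day combination), and any preprocessing would be applied within a unit along the technical variable rather than across all rows at once.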
I think that where we most need help is in identifying the technical variables for different types of assays.
Here's an example with Raman spectroscopy:
Once we have an idea of the technical variables, the actual recipe parts are pretty straightforward (as are the preprocessing methods).