A {recipes} extension for spectrometric, spectroscopic, chromatographic, or similar data

@Max mentioned a tidymodels extension of {recipes} for spectrometric, spectroscopic, or chromatographic data at the rstudio::conf keynote on tidymodels. Broadly, I refer to this as "characterization" data, in that it characterizes a sample or product.

I'm hoping to kick-start a discussion on how to build out this extension. I'm very open to thoughts on (a) whether this is useful, (b) what a reasonable starting point is, and (c) which data types are in scope.

(@jameshwade are you the person I spoke with after the keynote?)

For these types of data, the rows in the data set are not going to be independent: many rows (one per wavelength or time point, for example) will belong to the same sample. The independent experimental unit will be something like the sample of material, and there will be several other types of variables.

I think that the important part of this project is to distinguish the different classes of variables.

Some terminology I just made up:

  • technical variables: associated with the type of raw data coming off of the instrument, such as the wavelength, time, etc.

  • sample based columns/identifiers: these are going to define the subset of data that should be processed. Examples might be patient, day, aliquot/subsample, etc.

  • experimental conditions: these might affect preprocessing or might just be lumped into the sample-based variables. They reflect assay conditions such as fractionation identifiers, (HPLC) column, reagents, etc.

I think that the most help we need is on identifying the technical variables for different types of assays.

Here's an example with Raman spectroscopy:

  • technical variables: intensity (the assay measurement) and wavelength.

  • sample variables: day.

  • experimental variables: reactor size.
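
To make the three classes concrete, here is a minimal sketch in plain Python. It is not the {recipes} API, which this thread has not defined yet. The intensity values, the `ROLES` mapping, and the `group_spectra` helper are all hypothetical; only the column names come from the Raman example above. The sketch groups long-format rows into one spectrum per experimental unit, which is the unit that should be treated as independent.

```python
from collections import defaultdict

# Hypothetical long-format Raman rows: one row per (sample, wavelength) reading.
# The numeric values are made up purely for illustration.
rows = [
    {"day": 1, "reactor_size": "small", "wavelength": 400, "intensity": 0.12},
    {"day": 1, "reactor_size": "small", "wavelength": 401, "intensity": 0.15},
    {"day": 2, "reactor_size": "small", "wavelength": 400, "intensity": 0.20},
    {"day": 2, "reactor_size": "small", "wavelength": 401, "intensity": 0.26},
]

# Assumed role assignment, mirroring the terminology in this post.
ROLES = {
    "technical": ["wavelength", "intensity"],
    "sample": ["day"],
    "experimental": ["reactor_size"],
}


def group_spectra(rows, roles):
    """Collect the technical columns into one spectrum per experimental unit."""
    id_cols = roles["sample"] + roles["experimental"]
    spectra = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        spectra[key].append(tuple(row[c] for c in roles["technical"]))
    # Sort each spectrum by its first technical variable (here: wavelength).
    return {key: sorted(values) for key, values in spectra.items()}


spectra = group_spectra(rows, ROLES)
```

Four rows collapse into two spectra here, keyed by `(day, reactor_size)`. Any preprocessing would then run within each key rather than across all rows.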

Once we have an idea of the technical variables, the actual recipe parts are pretty straightforward (as are the preprocessing methods).
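
As one illustration of how small a preprocessing step can be once the technical variables are known, here is a sketch in plain Python of a per-spectrum transformation. Standard normal variate (SNV) scaling is chosen here only as a familiar example; it is not a method proposed in this thread, and the `snv` function and the input values are hypothetical.

```python
from statistics import mean, stdev


def snv(intensities):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    center = mean(intensities)
    scale = stdev(intensities)  # sample standard deviation (n - 1 denominator)
    if scale == 0:
        raise ValueError("constant spectrum: SNV is undefined")
    return [(value - center) / scale for value in intensities]


# Hypothetical intensities for a single sample, already ordered by wavelength.
spectrum = [0.10, 0.20, 0.30, 0.40]
scaled = snv(spectrum)
```

The step only needs the intensity values of one sample at a time, so the remaining work is bookkeeping: knowing which rows belong to which sample.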


Yup, that was me after the keynote.

This makes sense. I'll get started on a first attempt. It should be relatively straightforward for me to identify technical variables for most techniques. Splitting the data as you suggest could also help with storing and referencing data.

Thank you for the suggestions! I hope to have more to share soon.

Great!

If you want to use that example, the data are here:


I've got a ways to go, but if you want to follow along, you can do so at the measure package site ("A Recipes-style Interface to Tidymodels for Analytical Measurements"). I'll try to keep my plans updated in the issues and project plan on the GitHub page, JamesHWade/measure. The goal of measure is to be a recipes-like interface to tidymodels for analytical characterization data.

I'll make a post here and/or on Twitter once I've made meaningful progress.

By the way, I'm very open to contributions from others, but I don't have anyone in mind who has both the interest and the time to work with me just yet.

Thanks for the help getting started, Max! I really appreciate the advice and encouragement.