@Max mentioned a tidymodels extension of {recipes} for spectrometric, spectroscopic, or chromatographic data at the rstudio::conf keynote on tidymodels. Broadly, I refer to this as "characterization" data because it characterizes a sample or product.
I'm hoping to kick-start a discussion on how to build out this extension. I'm very open to thoughts on (a) whether this is useful, (b) what a reasonable starting point would be, and (c) which data types are in scope.
(@jameshwade are you the person I spoke with after the keynote?)
For these types of data, the rows in the data set are not going to be independent. The independent experimental unit will be something like the sample of material, and the data will contain several other types of variables alongside the measurements.
I think that the important part of this project is to distinguish between the different classes of variables.
Some terminology I just made up:
technical variables: associated with the type of raw data coming off the instrument, such as wavelength or time.
sample-based columns/identifiers: these define the subset of data that should be processed together. Examples might be patient, day, aliquot/subsample, etc.
experimental conditions: these might affect preprocessing or might just be lumped into the sample-based variables. They reflect assay conditions such as fractionation identifiers, (HPLC) column, reagents, etc.
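As a rough illustration of the three classes above, here is a minimal base R sketch. The data, the column names (wavelength, patient, aliquot, hplc_column, absorbance), and the role labels are all made up for the example and are not part of any existing package. It assumes the raw data arrive in long format, with one row per instrument reading.

```r
# Hypothetical long-format spectra: one row per (sample, wavelength) reading.
spectra <- expand.grid(
  wavelength = c(400, 500, 600),  # technical variable (nm)
  aliquot    = c("a1", "a2"),     # sample-based identifier
  patient    = c("p1", "p2"),     # sample-based identifier
  stringsAsFactors = FALSE
)
# Experimental condition (assumed here to be fixed per patient).
spectra$hplc_column <- ifelse(spectra$patient == "p1", "C18", "C8")
# Fake measurements, only so the example has something to reshape.
spectra$absorbance <- round(seq(0.10, by = 0.05, length.out = nrow(spectra)), 2)

# Illustrative role labels; the names are assumptions, not an existing API.
roles <- c(
  wavelength  = "technical",
  patient     = "sample_id",
  aliquot     = "sample_id",
  hplc_column = "condition",
  absorbance  = "measurement"
)

# Columns that together identify one independent experimental unit.
id_cols <- names(roles)[roles %in% c("sample_id", "condition")]

# Rows sharing the same identifiers belong to the same unit and
# would be preprocessed together.
by_sample <- split(spectra, spectra[id_cols], drop = TRUE)

# One row per experimental unit: spread the technical variable into columns.
wide <- reshape(
  spectra,
  idvar     = id_cols,
  timevar   = "wavelength",
  v.names   = "absorbance",
  direction = "wide"
)
```

Here by_sample holds one data frame per patient/aliquot combination, and wide has one row per combination with a column per wavelength. Whether experimental conditions belong with the identifiers, as assumed in this sketch, is exactly the open question above.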
I think that where we most need help is in identifying the technical variables for different types of assays.
This makes sense. I'll get started on a first attempt. It should be relatively straightforward for me to identify technical variables for most techniques. Splitting the data as you suggest could also help with storing and referencing data.
Thank you for the suggestions! I hope to have more to share soon.
I'll make a post here and/or on Twitter once I've made meaningful progress.
By the way, I'm very open to contributions from others, but I don't have anyone in mind who has both the interest and the time to work with me just yet.
Thanks for the help getting started, Max! I really appreciate the advice and encouragement.