Approaches for multiclass classification with a reference level

adomingues · January 17, 2022, 2:33pm

I have a dataset with multiple classes (< 20) which I want to classify in reference to one of the classes. The final goal is to extract the important variables that distinguish each class from the reference. If it helps to frame the question, an example would be to classify different cancer types vs a single healthy tissue and determine which features are important for the classification of each tumour.

My first naive approach is to subset the dataset and compare each non-reference class to the reference using any number of appropriate methods, starting with a generalised linear model and/or a random forest, then determine model performance and extract VIPs for each comparison. Basically a loop.
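
To make the loop concrete, here is a minimal, language-agnostic sketch in Python of the subsetting step only. The toy rows, the class names and the helper `one_vs_reference_subsets` are all hypothetical; fitting a model and extracting variable importance would happen inside the loop and is left out.

```python
def one_vs_reference_subsets(rows, labels, reference):
    """Yield (class_name, X, y) for every non-reference class.

    X keeps only the rows belonging to that class or to the reference;
    y is 1 for the class and 0 for the reference.
    """
    for cls in sorted(set(labels) - {reference}):
        pairs = [
            (x, 1 if label == cls else 0)
            for x, label in zip(rows, labels)
            if label in (cls, reference)
        ]
        yield cls, [x for x, _ in pairs], [t for _, t in pairs]


# Hypothetical toy data: two features per sample, three classes.
rows = [[0.1, 1.0], [0.2, 0.9], [1.5, 0.3], [1.7, 0.2], [0.4, 2.5], [0.5, 2.7]]
labels = ["healthy", "healthy", "tumour_a", "tumour_a", "tumour_b", "tumour_b"]

subsets = {}
for cls, X, y in one_vs_reference_subsets(rows, labels, reference="healthy"):
    # A binary model (GLM, random forest, ...) would be fitted on X, y here.
    subsets[cls] = (X, y)
    print(cls, "n =", len(X), "targets =", y)
```

Each comparison sees only two classes, so any importance measure from it is specific to that class vs the reference, which is the point of the loop.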

However this feels inelegant, so I am wondering which other approaches should be considered for this problem.

Cheers.

phil_hummel · February 2, 2022, 4:10am

How about multinomial linear regression?

adomingues · February 2, 2022, 8:18pm

Thanks @phil_hummel. I did look at multinomial linear regression in the docs for #tidymodels, but I could not find how one would set the reference level, aside from the usual base methods, or whether it would be used at all. I will use base for this one then.

Cheers.

phil_hummel · February 3, 2022, 12:13am

Multinomial logistic regression? Some packages allow specification of the reference class, and I think I have used some that use the first factor level as the reference. Good luck.
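
As a rough illustration of what a reference class means in a multinomial logistic model, here is a small Python sketch using only the standard library. The coefficients, the sample `x` and the helper names are made up for illustration and do not come from any particular package. The idea is that subtracting the reference class's coefficients from every class leaves the predicted probabilities unchanged, and the remaining coefficients then read as log-odds of each class vs the reference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_reference_coding(coefs, reference):
    """Subtract the reference class's coefficients from every class."""
    ref = coefs[reference]
    return {cls: [w - r for w, r in zip(ws, ref)] for cls, ws in coefs.items()}


def predict(coefs, x):
    """Class probabilities for one sample x (first entry is the intercept term)."""
    classes = list(coefs)
    scores = [sum(w * xi for w, xi in zip(coefs[c], x)) for c in classes]
    return dict(zip(classes, softmax(scores)))


# Hypothetical coefficients per class: [intercept, feature_1, feature_2].
coefs = {
    "healthy": [0.5, -1.0, 0.25],
    "tumour_a": [1.0, 2.0, -0.5],
    "tumour_b": [-0.5, 0.5, 1.5],
}
x = [1.0, 0.8, -0.3]  # leading 1.0 multiplies the intercept

contrasts = to_reference_coding(coefs, reference="healthy")
p_full = predict(coefs, x)
p_ref = predict(contrasts, x)

# The reference row becomes all zeros; probabilities are unchanged.
print(contrasts)
print(p_full)
print(p_ref)
```

Changing the reference only changes which row is zeroed out, so per-class coefficients should always be interpreted relative to whichever class the package treats as the baseline.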
