Linear SVM: Extracting the Top "Predictive" Ngram Features by Weight Assigned in the Linear SVM Fit

I'm working on a binary text classification problem using the tagged packages of this post and it turns out the scrappy linear Support Vector Machine (SVM) is doing well.

As a next step, I am hoping to extract the most "predictive" ngram features based on the positive and negative weights the linear SVM has assigned to them, as per Chang & Lin (2008, link below), and then manually annotate a set of topic labels based on the ngrams that appear.

Ideally, this would be a query that extracts the top X (e.g. 20) ngrams ordered by their SVM weights, both positive and negative.

I'm currently working with parsnip's general interface for polynomial SVMs, and I understand that one can calculate the weights using kernlab directly (though it might not be a straightforward task...):

Before digging deeper I was wondering if the community might be aware of any methods or classes I might use to perform the above task?

I've figured it out thanks to a few posts. I'm especially grateful for Rebecca Barter's detailed tutorial on variable importance for another model class, random forests (see ref 2 below).

I'm outlining the main steps here, but please review the links at the end for details on why it was done this way.

1. Get Your Final Model


# Assuming kernlab linear SVM

# Grid search over the tuning parameters
# NOTE: `svm_wf` (the untuned workflow) and `cv_folds` (the resamples) are
# placeholder names; substitute your own objects
tune_rs <- tune_grid(
  svm_wf,
  resamples = cv_folds,
  grid = param_grid,
  metrics = classification_measure,
  control = control_grid(save_pred = TRUE)
)

# Finalise workflow with the parameters for best accuracy
best_accuracy <- select_best(tune_rs, metric = "accuracy")

svm_wf_final <- finalize_workflow(
  svm_wf,
  best_accuracy
)

# Fit your final model on all available data at the end of the experiment
final_model <- fit(svm_wf_final, data)
# fit() executes the model fit routine (parsnip); here it takes the
# finalised workflow and the data to fit upon

2. Extract the KSVM Object, Pull Required Info, Calculate Variable Importance

ksvm_obj <- pull_workflow_fit(final_model)$fit
# pull_workflow_fit() returns the parsnip model fit object
# $fit returns the object produced by the fitting function (which is what we
# need, and which depends on the engine)
# In newer versions of workflows, extract_fit_engine(final_model) does the same

coefs <- ksvm_obj@coef[[1]]
# the first bit of info we need: the coefficients from the linear fit
# (one per support vector)

mat <- ksvm_obj@xmatrix[[1]]
# the matrix of support vectors that we matrix-multiply against

var_impt <- coefs %*% mat
# variable importance: a 1 x p matrix with one weight per feature (ngram)
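
To get from var_impt to the "top X ngrams by weight" asked for in the question, one option is to flatten the 1 x p matrix into a data frame and sort it in both directions. The following is only a minimal sketch in base R. The var_impt matrix here is a made-up stand-in (the ngram names and weights are invented for illustration), and it assumes the columns of the real matrix carry the feature names. Which class a positive weight points toward depends on how the outcome levels were coded, so it is worth checking the sign against a few documents you already know.

```r
# Hypothetical stand-in for the 1 x p matrix produced by coefs %*% mat;
# ngram names and weights are invented for illustration only.
var_impt <- matrix(
  c(0.8, -1.2, 0.1, 2.3, -0.4, -2.0),
  nrow = 1,
  dimnames = list(
    NULL,
    c("free_shipping", "not_working", "the",
      "highly_recommend", "a_bit", "waste_of")
  )
)

# Flatten to one row per ngram
weights <- data.frame(
  ngram = colnames(var_impt),
  weight = as.numeric(var_impt),
  stringsAsFactors = FALSE
)

top_n <- 2  # e.g. 20 on a real vocabulary

# Largest positive weights first
top_positive <- head(weights[order(weights$weight, decreasing = TRUE), ], top_n)

# Most negative weights first
top_negative <- head(weights[order(weights$weight), ], top_n)

print(top_positive)
print(top_negative)
```

On a real model, replace the stand-in matrix with the var_impt computed above and set top_n to the number of ngrams you want to annotate.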


  1. Extracting the Weights of Support Vectors using Caret:

  2. Variable Importance (Last Section of this post):

