Predictor importance

Some types of machine learning models, such as Random Trees, include built-in methods for producing predictor importance measures. Others, including regression models such as linear and logistic regression, do not have predictor importance measures built into the algorithm. For these models, the IBM SPSS Spark Machine Learning Library provides a separate PredictorImportance option that you can apply after fitting the model.

To use the PredictorImportance option, replace the code in the create model visualization step of the notebook with code such as the following. This example is for linear regression, where the model is named linearRegressionModel and the data frame is named data:

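val linearRegressionPMML = linearRegressionModel.toPMML()

// Compute predictor importance from the fitted model's PMML and the data frame
import com.ibm.spss.ml.utils.PredictorImportance
val pi = PredictorImportance(linearRegressionPMML)
val piModel = pi.fit(data)
val piPMML = piModel.toPMML()

// Render the model viewer as HTML and display it in the notebook.
// pc is not defined in this snippet; it is assumed to come from an earlier notebook cell.
import com.ibm.spss.scala.ModelViewer
val html = ModelViewer.toHTML(pc, piPMML, Option(linearRegressionModel.statXML))
kernel.magics.html(html)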

The three sections of code perform the following steps:

  1. The first line creates a PMML object, linearRegressionPMML, that contains the PMML output from the linear regression model.
  2. The middle block of code imports PredictorImportance, applies it to the existing PMML object linearRegressionPMML and the data frame data, and produces a new PMML object, piPMML, that contains the predictor importance values in addition to the information from the linear regression.
  3. The last section calls ModelViewer.toHTML, passing the new PMML object and the original statXML output that was produced automatically when the linear regression model was run, and then displays the resulting HTML in the notebook.
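
For intuition about what a predictor importance measure computed after fitting can look like for a regression model, the following is a minimal, self-contained sketch in plain Scala. It is not the algorithm used by PredictorImportance, which this page does not describe. It swaps in a generic permutation approach: shuffle one predictor at a time, measure how much the mean squared error increases, and normalize the increases so that they sum to 1. The object name, the coefficients, and the synthetic data are all hypothetical and chosen only for illustration.

```scala
import scala.util.Random

// Illustrative sketch only: permutation-based importance for a fixed linear model.
// This is NOT the method implemented by com.ibm.spss.ml.utils.PredictorImportance.
object PermutationImportanceSketch {
  // Hypothetical fitted model: y = 1.0 + 3.0*x0 + 0.5*x1 + 0.0*x2
  val intercept: Double = 1.0
  val coefficients: Vector[Double] = Vector(3.0, 0.5, 0.0)

  def predict(row: Vector[Double]): Double =
    intercept + row.zip(coefficients).map { case (x, b) => x * b }.sum

  def mse(rows: Vector[Vector[Double]], targets: Vector[Double]): Double =
    rows.zip(targets).map { case (r, y) =>
      val err = predict(r) - y
      err * err
    }.sum / rows.size

  // Raw importance of predictor j: increase in MSE after shuffling column j.
  def rawImportance(rows: Vector[Vector[Double]],
                    targets: Vector[Double],
                    rng: Random): Vector[Double] = {
    val baseline = mse(rows, targets)
    coefficients.indices.toVector.map { j =>
      val shuffledColumn = rng.shuffle(rows.map(_(j)))
      val permuted = rows.zip(shuffledColumn).map { case (r, v) => r.updated(j, v) }
      math.max(0.0, mse(permuted, targets) - baseline)
    }
  }

  // Scale the raw values so that they sum to 1 (all zeros if nothing matters).
  def normalize(raw: Vector[Double]): Vector[Double] = {
    val total = raw.sum
    if (total == 0.0) raw.map(_ => 0.0) else raw.map(_ / total)
  }

  // Build synthetic, noise-free data from the hypothetical model and score it.
  def demo(seed: Long = 42L, n: Int = 200): Vector[Double] = {
    val rng = new Random(seed)
    val rows = Vector.fill(n)(Vector.fill(coefficients.size)(rng.nextGaussian()))
    val targets = rows.map(predict)
    normalize(rawImportance(rows, targets, rng))
  }

  def main(args: Array[String]): Unit =
    println(demo().map(v => f"$v%.3f").mkString(", "))
}
```

In this sketch the predictor with the largest coefficient receives most of the importance, and the predictor with a zero coefficient receives none, because shuffling it does not change any prediction. Normalizing the values makes them comparable across predictors within one model.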