Save and score a model in a Python or Scala notebook

Save a model in Python

After a model is trained in a notebook, the user can save the model into Watson Studio Local by using the save function in the dsx_ml library. The function's arguments differ between Spark models and other model types. For both model types, users can also provide JSON information to save with the model instead of providing the test data set. The following sample code snippets show how to use the save function:

from dsx_ml.ml import save

save(name='kerasMulticlass',
     model=kerasm,
     x_test=pd.DataFrame(X_test),
     y_test=pd.DataFrame(y_test),
     algorithm_type='Classification',
     source='mynotebookname.ipynb',
     description='This is a sample description for a keras model')

save(name='sparkBinary',
     model=model_lr,
     test_data=testDataFrame,
     algorithm_type='Classification',
     source='mynotebookname2.ipynb',
     description='This is a sample description for a Spark model')

save(name='scikitRegression',
     model=model_reg,
     algorithm_type='Regression',
     features_json=[{"name": "Y", "sample": 2.3898030333, "type": "float"}, {"name": "X", "sample": 2.7189337835, "type": "float"}],
     labelColumn_json=[{"name": "Z", "type": "float"}],
     source='mynotebookname3.ipynb',
     description='This is a sample description for a model saved using provided JSON information')

Parameters:

For Spark models:

  • name: The name of the model to be saved.
  • model: The model itself.
  • test_data: A sample DataFrame used to score the performance of the model. (Optional when features_json and labelColumn_json are provided.)
  • features_json: User-provided JSON information about the model's feature columns. (Optional when test_data is provided.)
  • labelColumn_json: User-provided JSON information about the model's label column. (Optional when test_data is provided.)
  • algorithm_type: Currently supported values are 'Classification' and 'Regression'.
  • source: Stores the location where the model was created, for future reference. (Optional.)
  • description: The description of the model to be saved. (Optional.)

For other model types:

  • name: The name of the model to be saved.
  • model: The model itself.
  • x_test: Sample features used to score the performance of the model. (Optional when features_json and labelColumn_json are provided.)
  • y_test: Sample labels used to score the performance of the model. (Optional when features_json and labelColumn_json are provided.)
  • features_json: User-provided JSON information about the model's feature columns. (Optional when x_test and y_test are provided.)
  • labelColumn_json: User-provided JSON information about the model's label column. (Optional when x_test and y_test are provided.)
  • algorithm_type: Currently supported values are 'Classification' and 'Regression'.
  • source: Stores the location where the model was created, for future reference. (Optional.)
  • description: The description of the model to be saved. (Optional.)
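For the JSON route, each features_json entry needs a name, a sample value, and a type, while labelColumn_json entries need only a name and a type (as in the scikitRegression example above). The metadata can be assembled by hand; the helper below is a hypothetical sketch (build_json_metadata and its Python-type-to-string mapping are assumptions, not part of the dsx_ml API):

```python
def build_json_metadata(feature_samples, label_samples):
    """Build features_json and labelColumn_json from one sample row.

    feature_samples: dict mapping feature column name -> one sample value.
    label_samples: dict mapping label column name -> one sample value
                   (only the inferred type is kept for labels).
    Hypothetical helper; the type mapping below is an assumption.
    """
    def type_name(value):
        # Map a Python value to a simple type string.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "int"
        if isinstance(value, float):
            return "float"
        return "string"

    features_json = [
        {"name": name, "sample": value, "type": type_name(value)}
        for name, value in feature_samples.items()
    ]
    labelColumn_json = [
        {"name": name, "type": type_name(value)}
        for name, value in label_samples.items()
    ]
    return features_json, labelColumn_json
```

Called as build_json_metadata({"Y": 2.3898030333, "X": 2.7189337835}, {"Z": 1.0}), this produces the same features_json and labelColumn_json values shown in the scikitRegression example.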

After you save the model, a dictionary is returned that contains two entries: the path to the location of the stored model and the scoring endpoint.

{
  'path': 'path/to/the/model',
  'scoring_endpoint': 'https://dsxl-api.ibm-private-cloud.svc.cluster.local/v3/project/score/...'
}

Save a model in Scala

After a model is trained in a notebook, a user can save the model into Watson Studio Local by using the save function in the dsx_ml library. The following sample code snippet shows how to use the save function:

import com.ibm.analytics.ngp.dsxML._

val ml_client = ML()
val modelName = "Phone-Notebook-Model-01"
val saveResult = ml_client.save(model, trainingDF, testDF, performanceMetrics,
                                modelName, description, source,
                                algorithm_type, com.ibm.analytics.ngp.dsxML.MetaNames.LABEL_FIELD -> "label")

Parameters:

  • model: The model itself.
  • trainingDF: (DataFrame) The DataFrame that was used to train the model.
  • testDF: (DataFrame) The DataFrame used to test the model.
  • performanceMetrics: (JSValue) Optional performance metrics object; pass None if not available.
  • name: (String) The name of the model to be saved.
  • source: (String) Stores the location where the model was created, for future reference.
  • algorithm_type: Currently supported values are 'Classification' and 'Regression'.
  • description: (String) The description of the model to be saved. (Optional.)

After you save the model, a dictionary is returned that contains two entries: the path to the location of the stored model and the scoring endpoint.

{
  'path': 'path/to/the/model',
  'scoring_endpoint': 'https://dsxl-api.ibm-private-cloud.svc.cluster.local/v3/project/score/...'
}

Performance Metrics

Performance metrics are calculated when a model is saved, to reflect the performance of the model on the test data provided. Depending on whether the algorithm type is "Classification" or "Regression", different metrics are used to evaluate performance.

In the "Classification" case, the "accuracy" metric is calculated, which shows the percentage of predictions that match the label column in the test data set. In the "Regression" case, the metric used is the "Explained Variance Score", which indicates the proportion of the variance in the test labels that is explained by the model's predictions.
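Both metrics follow their standard definitions and can be reproduced by hand. The sketch below uses plain Python and the textbook formulas; it mirrors what is described above, not dsx_ml's internal implementation:

```python
def accuracy(y_true, y_pred):
    """Classification: fraction of predictions that match the labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

def explained_variance(y_true, y_pred):
    """Regression: 1 - Var(y_true - y_pred) / Var(y_true)."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(list(y_true))
```

For example, accuracy([1, 0, 1, 1], [1, 0, 0, 1]) is 0.75, and a regression model whose predictions differ from the labels only by a constant offset still reaches an explained variance of 1.0, because the residual variance is zero.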

Versioning

To create a new version of the same model, update the model itself or modify the test data, then run the same save function again. A new version of the model is created with the version number incremented by 1, and the performance metrics are recalculated for the new version.

Score a model

Using the scoring endpoint that is returned after you save the model, you can run online scoring on the model to make predictions for test data input. Send a REST API call to the endpoint that includes the test data payload (in JSON format) and a header that contains the authorization token. The following sample code snippet calls the scoring endpoint:

import requests, json, os

json_payload =[{
    "X1": "x1_value",
    "X2": "x2_value"
  }]

# this scoring endpoint should be the same as the one returned above
scoring_endpoint = 'https://dsxl-api.ibm-private-cloud.svc.cluster.local/v3/project/score/...'

header_online = {'Content-Type': 'application/json', 'Authorization':os.environ['DSX_TOKEN']}
response_scoring = requests.post(scoring_endpoint, json=json_payload, headers=header_online)
response_scoring.content
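response_scoring.content returns the raw response bytes. In practice you would also check the HTTP status and decode the JSON body before using the predictions. A minimal sketch of that step (the shape of the decoded body is deployment specific, so nothing beyond it being JSON is assumed here):

```python
import json

def parse_scoring_response(status_code, body_bytes):
    """Decode a scoring response body, raising on a non-2xx HTTP status.

    Sketch only: the exact structure of the returned JSON depends on the
    deployed model, so the decoded object is returned as-is.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError("scoring request failed with HTTP %d" % status_code)
    return json.loads(body_bytes.decode("utf-8"))
```

With requests, this would be called as parse_scoring_response(response_scoring.status_code, response_scoring.content).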