
Writing a custom evaluation script for a model

When you create an evaluation script, you must save an evaluations.json file to the following location: evaluations_file_path = os.getenv("DSX_PROJECT_DIR") + '/models/' + str("{{model_name}}") + '/' + str("{{version}}") + '/evaluations.json'. This file contains all of the evaluation results for your model. Replace {{model_name}} with the model's name and {{version}} with the model's version.
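
For example, the following sketch shows one way to build that path in Python. The model name and version shown here are placeholders taken from the example below; substitute your model's actual values.

import os

# Placeholder values; replace with your model's actual name and version.
model_name = "PipelineModel_NaiveBayes_Spark_2.1d"
version = "1"

# DSX_PROJECT_DIR is set in the script's environment by Watson Studio Local.
evaluations_file_path = os.getenv("DSX_PROJECT_DIR") + '/models/' + str(model_name) + '/' + str(version) + '/evaluations.json'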

The evaluations.json file must contain the following structure:


[
    {
        "metrics": {
            "accuracyScore": 0.0000000000000000,
            "areaUnderROC" : 0.0000000000000000,
            "recallScore" : 0.0000000000000000,
            "precisionScore" : 0.0000000000000000,
            "explainedVarianceScore" : 0.0000000000000000,
            "r2Score" : 0.0000000000000000,
            "meanAbsoluteError" : 0.0000000000000000,
            "meanSquaredError" : 0.0000000000000000,
            "rootMeanSquaredError" : 0.0000000000000000,
            "areaUnderPR" : 0.0000000000000000,
            "weightedPrecisionScore" : 0.0000000000000000,
            "weightedRecallScore" : 0.0000000000000000,
            "f1Score": 0.0000000000000000,
            "threshold": {
                "metric": "accuracyScore",
                "mid_value": 0.7,
                "min_value": 0.3
            }
        },
        "modelName": "PipelineModel_NaiveBayes_Spark_2.1d",
        "modelVersion": "1",
        "performance": "fair",
        "startTime": 1515706899
    }
]

The evaluations.json file is an array of objects. The last entry in the array is the one that the Watson Studio Local user interface uses to display the evaluation metrics.

Not all of the metrics listed in the previous evaluations.json example are necessary or valid for every model. Include only the metrics that apply to your model. All of the other properties are required. However, you must at least include the metric that is named in metrics['threshold'], so at least one metric is always needed. A sketch of computing that metric follows.
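
For example, if accuracyScore is your threshold metric, you might compute it as shown in this sketch. It assumes scikit-learn is available in your environment, and y_test and predictions are hypothetical arrays that hold your test labels and your model's predictions.

from sklearn.metrics import accuracy_score

# y_test and predictions are assumed to come from your own test split and model.
accuracy = accuracy_score(y_test, predictions)

metrics = {
    "accuracyScore": float(accuracy),
    "threshold": {
        "metric": "accuracyScore",
        "mid_value": 0.7,
        "min_value": 0.3
    }
}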

The startTime property is calculated in Python, like so: evaluation["startTime"] = int(time.time()).

Possible values for performance are "fair", "good", and "poor". Assign whichever value best describes your model's results.
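
Putting it together, the following sketch assembles one evaluation entry and appends it to evaluations.json. It assumes the metrics dictionary and the evaluations_file_path variable were built as in the earlier sketches, and that the performance value comes from your own evaluation logic.

import json
import os
import time

evaluation = {
    "metrics": metrics,                                   # built from your model's results
    "modelName": "PipelineModel_NaiveBayes_Spark_2.1d",   # replace with your model's name
    "modelVersion": "1",                                  # replace with your model's version
    "performance": "fair",                                # "fair", "good", or "poor"
    "startTime": int(time.time())
}

# Append to any existing evaluations so that earlier results are preserved;
# the last entry in the array is the one the user interface displays.
evaluations = []
if os.path.exists(evaluations_file_path):
    with open(evaluations_file_path) as f:
        evaluations = json.load(f)
evaluations.append(evaluation)

with open(evaluations_file_path, "w") as f:
    json.dump(evaluations, f)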

Required Fields          Optional Fields
metrics                  metrics['accuracyScore']
metrics['threshold']     metrics['areaUnderROC']
modelName                metrics['recallScore']
modelVersion             metrics['precisionScore']
startTime                metrics['explainedVarianceScore']
performance              metrics['r2Score']
                         metrics['meanAbsoluteError']
                         metrics['meanSquaredError']
                         metrics['rootMeanSquaredError']
                         metrics['areaUnderPR']
                         metrics['f1Score']
                         metrics['weightedPrecisionScore']
                         metrics['weightedRecallScore']