Guidelines about batch scoring scripts for models
A batch scoring script is a basic template that is generated for Watson Studio Local users.
The template does these tasks:
- Creates a Spark or pandas DataFrame from a CSV file or a remote data set
- Loads the model from the file system
- Scores the data by using the model
- Stores the predictions in an output CSV file
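The four tasks above can be sketched as follows. This is a minimal illustration, not the template that Watson Studio Local generates: it uses the standard-library csv module in place of a Spark or pandas DataFrame so that it has no dependencies, and it assumes a hypothetical pickled model stored as a dictionary with "bias" and "weights" keys. All function names are made up for the example.

```python
import csv
import pickle


def load_rows(input_csv):
    """Read the input CSV; returns (column names, list of row dicts)."""
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def load_model(model_path):
    """Load a pickled model from the file system."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


def score_row(model, row):
    """Hypothetical linear model: bias + sum(weight * feature value)."""
    total = model["bias"]
    for name, weight in model["weights"].items():
        total += weight * float(row[name])
    return total


def run_batch_scoring(input_csv, model_path, output_csv):
    """Load data and model, score every row, and write an output CSV."""
    columns, rows = load_rows(input_csv)
    print("Loaded {} rows with columns {}".format(len(rows), columns))

    model = load_model(model_path)
    print("Loaded model from {}".format(model_path))

    for row in rows:
        row["prediction"] = score_row(model, row)
    print("Scored {} rows".format(len(rows)))

    # Output keeps the input columns and appends a prediction column.
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns + ["prediction"])
        writer.writeheader()
        writer.writerows(rows)
    print("Wrote predictions to {}".format(output_csv))
    return rows
```

A call such as run_batch_scoring("input.csv", "model.pkl", "output.csv") (hypothetical paths) runs the whole flow; the print calls after each step show in the job log how far the script got.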
Watson Studio Local users are encouraged to add print statements to the script so that jobs are easier to debug.
Required options for generating a batch scoring script:
- Input data set (CSV or remote data set)
- Output data set (CSV)
Optional features (go to Advanced Settings):
- Save the script as a .py script, .R script, or .ipynb notebook (Watson Studio Local internally converts the Python script to a notebook by using nbconvert and saves it in the user's workspace)
- Job worker: Watson Studio Local supports four types of job workers: Anaconda with Python 2.7, Anaconda with Python 3.5, Python 3.5 with GPU, and RStudio with R 3.4.3
- Set environment variables specific to the job.
- Add command line arguments.
- Schedule the job.
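A scoring script can read job-specific environment variables and command line arguments in the usual Python way. The sketch below is only an illustration: the argument names (--input, --output, --batch-size) and the SCORING_LOG_LEVEL variable are hypothetical, not names that Watson Studio Local defines.

```python
import argparse
import os


def parse_job_settings(argv=None, environ=None):
    """Collect settings from command line arguments and environment variables.

    argv and environ can be passed explicitly for testing; by default the
    function reads sys.argv and os.environ.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(description="Batch scoring job (sketch)")
    parser.add_argument("--input", required=True, help="path to the input CSV")
    parser.add_argument("--output", required=True, help="path to the output CSV")
    parser.add_argument("--batch-size", type=int, default=1000,
                        help="number of rows to score at a time")
    args = parser.parse_args(argv)

    return {
        "input": args.input,
        "output": args.output,
        "batch_size": args.batch_size,
        # Hypothetical variable name; falls back to INFO when it is not set.
        "log_level": environ.get("SCORING_LOG_LEVEL", "INFO"),
    }


# Example with explicit values instead of the real command line.
settings = parse_job_settings(
    ["--input", "in.csv", "--output", "out.csv"],
    environ={"SCORING_LOG_LEVEL": "DEBUG"},
)
print(settings)
```

Keeping paths and tuning values in arguments or environment variables means the same script can be reused across jobs without editing it.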
Writing a custom scoring script: Watson Studio Local adds a few statements for internal tasks such as loading the model, preparing dataframes for the input and output data sets, and making a basic prediction.
The Watson Studio Local user can extend the template to do many other tasks:
- import any conda library
- preprocess data sets
- make more advanced predictions
- write ETL processes
- edit predictions and store them in any data source, or print them in the logs
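As one illustration of such customization, the sketch below adds a preprocessing step that cleans numeric columns before scoring and a helper that prints the first few predictions to the job log. Both helpers, and the row-of-dicts data layout, are assumptions made for the example rather than part of the generated template.

```python
def preprocess(rows, numeric_columns, default=0.0):
    """Return copies of rows with numeric columns converted to float.

    Blank, missing, or non-numeric values are replaced with default.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in numeric_columns:
            try:
                new_row[col] = float(row.get(col, ""))
            except (TypeError, ValueError):
                new_row[col] = default
        cleaned.append(new_row)
    return cleaned


def log_predictions(rows, limit=5):
    """Print up to limit predictions so that they appear in the job log."""
    for i, row in enumerate(rows[:limit]):
        print("prediction[{}] = {}".format(i, row["prediction"]))
    remaining = len(rows) - limit
    if remaining > 0:
        print("... {} more rows not shown".format(remaining))


# Example: clean two raw rows, attach a placeholder prediction, and log it.
raw_rows = [{"age": "42", "income": ""}, {"age": "n/a", "income": "3.5"}]
clean_rows = preprocess(raw_rows, ["age", "income"])
for row in clean_rows:
    row["prediction"] = row["age"] + row["income"]  # placeholder, not a real model
log_predictions(clean_rows)
```

Returning cleaned copies instead of modifying the input rows keeps the raw data available for comparison when debugging.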