
Save and score a model in R

You can do the following tasks for a machine learning model in R:

Train and save an R model

In Watson Studio Local, you can run R code in RStudio, Jupyter notebooks, or in the background as a job.

Run R code in RStudio or Jupyter

RStudio on Watson Studio Local behaves the same as the open source version: you can run the R interpreter and create and store R scripts. You can also create an R notebook in Jupyter and use the Insert to code feature to access data sets that are stored in Watson Studio Local for model training.

The caret package is installed and available for model training in both RStudio and Jupyter. The following example loads caret and trains a linear regression model on the mtcars data set that is included with R:

library(caret)
linear_model <- train(mpg ~ ., data = mtcars, method = "lm")
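caret's method = "lm" fits an ordinary least-squares model with base R's lm() under the hood. As a quick sanity check that needs only base R, the sketch below fits the same formula directly and computes the in-sample root mean squared error (RMSE). The variable names are illustrative. Note that the RMSE caret reports for linear_model comes from its default bootstrap resampling (25 resamples), so it will not match this in-sample figure.

```r
# Fit the same model as train(mpg ~ ., data = mtcars, method = "lm"),
# but directly with base R's lm(), so no extra packages are needed.
base_model <- lm(mpg ~ ., data = mtcars)

# Predict on the training rows and compute the in-sample RMSE.
fitted_mpg <- predict(base_model, newdata = mtcars)
rmse <- sqrt(mean((mtcars$mpg - fitted_mpg)^2))

# mtcars has 10 predictors besides mpg, so lm() estimates 11 coefficients
# (the intercept plus one slope per predictor).
n_coef <- length(coef(base_model))

print(rmse)
print(n_coef)
```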

Run R code in the background as a job

R models can take a long time to train, especially on large data sets. You can use either RStudio or the built-in Watson Studio Local script editor to define an R script that loads data, trains models, and tunes the hyperparameters of each model. Because this process can take hours, you can define a job to run your code asynchronously. The following sample script trains a random forest model, which can take a long time to train:


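# Load the packages the script uses
library(caret)        # train(), trainControl()
library(data.table)   # fread()

# Read the data from the project's datasets folder
dataset <- fread(file.path(Sys.getenv("DSX_PROJECT_DIR"), "datasets", "GoSalesNew.csv"), stringsAsFactors = TRUE)

# Tune mtry with a grid search, using 10-fold cross-validation repeated 3 times
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid")
tunegrid <- expand.grid(.mtry = c(1:15))
metric <- "Accuracy"  # assumes PRODUCT_LINE is categorical, so candidates are ranked by accuracy
rf_gridsearch <- train(PRODUCT_LINE ~ ., data = dataset, method = "rf", metric = metric, tuneGrid = tunegrid, trControl = control)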
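To see why a grid search like this can run for hours, it helps to count the model fits it requests. The base R sketch below repeats the grid definition and does the arithmetic. It assumes caret's usual behavior of fitting every tuning candidate on every resample and then refitting the best candidate once on the full data set. Each of those random forest fits also grows 500 trees by default in the randomForest package.

```r
# Same tuning grid as the job script: 15 candidate values of mtry.
tunegrid <- expand.grid(.mtry = c(1:15))

# repeatedcv with number = 10 and repeats = 3 yields 30 resamples.
folds <- 10
repeats <- 3
n_resamples <- folds * repeats

# One random forest per (candidate, resample) pair, plus one final
# refit of the best candidate on the full data set.
n_fits <- nrow(tunegrid) * n_resamples + 1
print(n_fits)
```

Shrinking the grid or the number of repeats reduces the run time proportionally, which is one reason to run the full search as a background job.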

After you save this script in your project, you can create a job from it. When you create the job, select RStudio with R 3.3.2 as the worker and Script run as the job type.

You can instruct the job to run on demand, or at a scheduled interval.

Save the model

After the model finishes training, you can save it to Watson Studio Local. RStudio and Jupyter include a library called modelAccess that you can use to save caret models, among other R model types, to Watson Studio Local:

library(modelAccess)
myModel <- train(...)
saveModel(model = myModel, name = "Model Name", test_data = dataset)

The test_data parameter is optional. It is used to calculate a baseline performance metric and to save a sample row of the data that prepopulates the test API.
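This page does not say whether test_data should be kept separate from the training data. A common practice, and an assumption here, is to hold out rows that the model never sees during training so the baseline metric is not overly optimistic. The base R sketch below uses mtcars as a stand-in data set and a hypothetical 80/20 split; the saveModel() call is left as a comment because it needs the modelAccess library.

```r
# Stand-in for your own data set.
dataset <- mtcars

# Hypothetical 80/20 hold-out split; set.seed() makes it reproducible.
set.seed(42)
test_idx <- sample(seq_len(nrow(dataset)), size = round(0.2 * nrow(dataset)))
train_data <- dataset[-test_idx, ]
test_data <- dataset[test_idx, ]

# Train on train_data only, then pass the held-out rows as test_data:
# myModel <- train(mpg ~ ., data = train_data, method = "lm")
# saveModel(model = myModel, name = "Model Name", test_data = test_data)
```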

Score and evaluate the R model

Score the model

After you save the model, its details page displays a default URL endpoint for calling the model from inside the cluster. You can also click Test next to your model to send a test API request to the endpoint.

Score or evaluate the model as a job

Just as you can define a custom R script to train the model, you can run batch scoring or evaluation scripts that consume the model as jobs:

  1. From the model details page, click either Batch score or Evaluate.
  2. Complete the details that are appropriate for your model, and click Generate script. Watson Studio Local populates a script editor window with a generic script that might apply to your model.
  3. Make any necessary customizations. If required, select a different worker in the Advanced settings. By default, your R model is associated with the RStudio with R 3.3.2 worker.
  4. Click Run Now to immediately run your script. Alternatively, you can save the script in the Advanced settings and schedule it to run at regular intervals.
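The generated script depends on your model and is not shown here. As a rough illustration of what a batch scoring and evaluation job does, the base R sketch below loads a model, scores a batch of rows, writes the predictions to a file, and computes RMSE against known labels. It uses saveRDS() and readRDS() as a stand-in for modelAccess, whose loading function is not described on this page; the model, input batch, and file paths are all hypothetical.

```r
# Stand-in for a saved model: fit one and serialize it with saveRDS().
# (In Watson Studio Local the model is stored through modelAccess instead.)
model_path <- tempfile(fileext = ".rds")
saveRDS(lm(mpg ~ wt + hp, data = mtcars), model_path)

# Batch score: load the model, score a batch of rows, write the results.
model <- readRDS(model_path)
batch <- mtcars[1:10, c("wt", "hp")]  # hypothetical input batch
scores <- data.frame(batch, prediction = predict(model, newdata = batch))
output_path <- tempfile(fileext = ".csv")
write.csv(scores, output_path, row.names = FALSE)

# Evaluate: compare predictions with known labels using RMSE.
labeled <- mtcars[11:20, ]
pred <- predict(model, newdata = labeled)
eval_rmse <- sqrt(mean((labeled$mpg - pred)^2))
print(eval_rmse)
```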