Create a notebook in Watson Studio Local

To create a notebook in Watson Studio Local, set up a project, create the notebook file, and use the notebook interface to develop your notebook.

Requirement: You must have a project.

Create the notebook file

To create or get a notebook file to add to the project:

  1. From your project assets view, click the add notebook link.
  2. In the Create Notebook window, specify the method to use to create your notebook. You can create a blank notebook, upload a notebook file from your file system, or upload a notebook file from a URL. The notebook that you create or select must be a .ipynb file.
  3. Specify the rest of the details for your notebook.
  4. Click Create Notebook.

Alternatively, you can copy a sample notebook from the community page. The sample notebooks are based on real-world scenarios and contain many useful examples of computations and visualizations that you can adapt to your analysis needs. To work with a copy of a sample notebook, click the Open Notebook icon and specify your project and the Spark service for the notebook.

For information about the notebook interface, see parts of a notebook.

Create the SparkContext

A SparkContext setup is required to connect Jupyter notebooks to the Spark execution environment. See the sample notebooks for guidance on the setup.
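
In Python notebooks, a SparkContext is pre-created as sc (the resource example later in this topic stops and re-creates it). A minimal sanity check, assuming that pre-created sc, might look like this:

# Confirm that the pre-created SparkContext is available and connected
print(sc.version)   # Spark version of the execution environment
print(sc.master)    # URL of the Spark master that the notebook is connected to
print(sc.appName)   # application name of this notebook session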

By default, SparkContext is not set up for R notebooks. Watson Studio Local users can modify one of the following templates to create a SparkContext setup for their R notebooks:

sparklyr library
For Python 2.7:

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "spark://spark-master-svc:7077")
For Python 3.5:

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "spark://spark-master221-svc:7077")
SparkR library
For Python 2.7, use master="spark://spark-master-svc:7077". For Python 3.5, use master="spark://spark-master221-svc:7077".

library(SparkR)
sc <- sparkR.session(master="spark://spark-master-svc:7077",
        appName="notebook-R",
        enableHiveSupport=FALSE,
        sparkConfig=list(
            spark.port.maxRetries="100",
            spark.dynamicAllocation.enabled="true",
            spark.shuffle.service.enabled="true",
            spark.dynamicAllocation.executorIdleTimeout="300",
            spark.executor.memory="4g",
            spark.cores.max="2",
            spark.dynamicAllocation.initialExecutors="1",
            spark.driver.extraJavaOptions="-Djavax.net.ssl.trustStore=/user-home/_global_/security/customer-truststores/cacerts",
            spark.executor.extraJavaOptions="-Djavax.net.ssl.trustStore=/user-home/_global_/security/customer-truststores/cacerts"
        )
)

Set Watson Studio Local Spark resources

Depending on your use case, you might need to change the resources allocated to the Spark application. The default settings of Watson Studio Local Spark are as follows:

Parameter                                  Default  Meaning
spark.cores.max                            3        The maximum number of CPU cores to request for the application from across the cluster (not from each machine).
spark.dynamicAllocation.initialExecutors   3        The initial number of executors to run.
spark.executor.cores                       1        The number of cores to use on each executor.
spark.executor.memory                      4g       The amount of memory to use per executor process.

By default, Watson Studio Local uses three Spark workers on the compute nodes. If you add more compute nodes, one additional Spark worker will be started on each added compute node.
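
Before changing anything, you can check what a setting currently resolves to from a Python notebook cell. This is a small sketch that assumes the pre-created sc; a value is returned only if the setting is explicitly set in the environment:

# Inspect individual settings on the pre-created SparkContext
conf = sc.getConf()
print(conf.get("spark.cores.max"))        # default 3 in Watson Studio Local
print(conf.get("spark.executor.memory"))  # default 4g in Watson Studio Local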

To change the resources for the Spark application in the notebook:

First, stop the pre-created sc, and then create a new SparkContext with the resource configuration that you need. Python example:


sc.stop()
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
    .set("spark.cores.max", "15")
    .set("spark.dynamicAllocation.initialExecutors", "3")
    .set("spark.executor.cores", "5")
    .set("spark.executor.memory", "6g"))
sc = SparkContext(conf=conf)

Then you can verify the new settings by running the following command in a cell using the new sc:


for item in sorted(sc._conf.getAll()):
    print(item)
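
If the full listing is long, a short variation (illustrative only) narrows it to the resource settings changed above:

# Print only the resource-related settings from the new SparkContext
for key, value in sorted(sc._conf.getAll()):
    if key.startswith(("spark.cores", "spark.executor", "spark.dynamicAllocation")):
        print(key, "=", value)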

Note that these resource settings also apply when notebooks are run as scheduled jobs.

See Spark Configuration for more information.

Analyze data in the notebook

Now you're ready for the real work to begin!

Typically, you'll install any necessary libraries, load the data, and then start analyzing it. You and your collaborators can prepare the data, visualize data, make predictions, make prescriptive recommendations, and more.
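
For example, a first Python cell often installs a library and loads a data set. This is only a sketch; pandas is used for illustration and my_data.csv is a hypothetical file name:

# Install a library into the user environment (restart the kernel if the import fails)
!pip install --user pandas

# Load a hypothetical data set and preview the first rows
import pandas as pd
df = pd.read_csv("my_data.csv")
df.head()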

Tip: The default auto-save interval in a Jupyter notebook is 20 seconds. If you want to save the notebook immediately, click the save button. To change the autosave interval for an individual notebook, run the %autosave magic command in a cell, for example, %autosave 5.
Tip: When a notebook runs the %%javascript Jupyter.notebook.session.delete(); command to stop the kernel, the preceding cell might still appear to be running ([*]) even though it has actually finished.