Table of contents

RStudio overview

R is a popular package for statistical analysis and machine learning that includes tests, models, analyses, and graphics, and enables data management. RStudio, included in Watson Studio Local, provides an IDE for working with R.

An RStudio session created in Watson Studio Local includes 2 GB of storage and 5 GB of memory available for your use.

For information about how to set up and start using RStudio, see the blog post Using RStudio in IBM Data Science Experience, and the Using RStudio article on the RStudio Support site.

Related tasks:

Install a package

To connect to relational data sources from RStudio on a Watson Studio Local cluster without Internet access, the Watson Studio Local administrator must copy the required packages to your RStudio pods and then follow the installation steps for installing from a downloaded package. If the Watson Studio Local cluster has Internet access, complete the following steps:

  1. In the Tools shell, download the database driver JAR file to the /user-home/ directory. Example:
    pwd
    # /user-home/1003/DSX_Projects/project-nb-test/rstudio
    cd /user-home/1003/
    wget https://jdbc.postgresql.org/download/postgresql-42.2.0.jar
  2. Configure Java on the pod:
    R CMD javareconf
  3. Return to the RStudio script and install the RJDBC package and its dependencies:
    install.packages("RJDBC",dep=TRUE)

PostgreSQL example:

library(RJDBC)

# Connection parameters (example values)
driverClassName <- "org.postgresql.Driver"
driverPath <- "/user-home/1003/postgresql-42.2.0.jar"
url <- "jdbc:postgresql://9.876.543.21:27422/compose"
databaseUsername <- "admin"
databasePassword <- "ABCDEFGHIJKLMNOP"
databaseSchema <- "public"
databaseTable <- "cars"

# Load the JDBC driver and open the connection
drv <- JDBC(driverClassName, driverPath)
conn <- dbConnect(drv, url, databaseUsername, databasePassword)

# List the available tables (optional)
# dbListTables(conn)

# Read the table into an R data frame
data <- dbReadTable(conn, databaseTable)
# To qualify the table name with its schema:
# data <- dbReadTable(conn, paste(databaseSchema, '.', databaseTable, sep=''))
data
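After the connection is open, you can also run SQL statements directly and should close the connection when you are done. A minimal sketch, assuming the conn and databaseTable objects from the example above:

```r
library(RJDBC)

# Run an SQL query against the open connection; the LIMIT clause is illustrative
first_rows <- dbGetQuery(conn, paste("SELECT * FROM", databaseTable, "LIMIT 10"))
first_rows

# Close the connection when finished
dbDisconnect(conn)
```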

Change the Spark version

Sparklyr library
To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the spark_connect() function.
To connect to the Spark 2.2.1 cluster:
sc <- spark_connect(master = "spark://spark-master221-svc:7077",
                    spark_home = "/usr/local/spark-2.2.1-bin-hadoop2.7")
To connect to the Spark 2.0.2 cluster:
sc <- spark_connect(master = "spark://spark-master-svc:7077")
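With either version, the resulting sparklyr connection object is used the same way. As a sketch, assuming the sc connection from above and a reachable cluster, you can copy a local data frame into Spark and query it with dplyr:

```r
library(sparklyr)
library(dplyr)

# Copy a local data frame into the Spark cluster
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")

# Run a dplyr query that executes in Spark, then pull the result back to R
mtcars_tbl %>% filter(mpg > 25) %>% collect()

# Disconnect when finished
spark_disconnect(sc)
```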
SparkR library
To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the $SPARK_HOME environment variable to specify the Spark 2.2.1 installation location in RStudio:
Sys.setenv("SPARK_HOME" = "/usr/local/spark-2.2.1-bin-hadoop2.7")
# Load SparkR from the Spark 2.2.1 installation
library(SparkR, lib.loc = "/usr/local/spark-2.2.1-bin-hadoop2.7/R/lib")
# Initialize the Spark session
sc <- sparkR.session(master = "spark://spark-master221-svc:7077",
                     appName = "dsxlRstudioSpark221")
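As a sketch of using the resulting session (assuming the cluster above is reachable), you can convert a local R data frame to a SparkDataFrame and stop the session when done:

```r
library(SparkR)

# Create a SparkDataFrame from a local R data frame
df <- as.DataFrame(mtcars)

# Inspect the first rows; the computation runs on the cluster
head(df)

# Stop the Spark session when finished
sparkR.session.stop()
```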

See SparkR (R on Spark) for more information.

Transfer files to and from your user project folder

Using the file explorer in RStudio, a Watson Studio Local user can upload and download files between their project folder and a local disk outside of the cluster:

  • To download an RStudio file, select it, click more, and click Export to save the file to your local disk. To upload an RStudio file, click Upload and select the file to upload.
  • To download a Jupyter file, click ..., type ~/../jupyter, select the file, click more, and click Export to save the file to your local disk. To upload a Jupyter file, click Upload and select the file to upload.

Learn more

Read and write data to and from IBM Cloud object storage in RStudio