R is a popular statistical analysis and machine-learning package that enables data management and includes tests, models, analyses, and graphics. RStudio, included in Watson Studio Local, provides an IDE for working with R.
An RStudio session created in Watson Studio Local includes 2 GB of storage and 5 GB of memory available for your use.
Tasks related to setting up and working with RStudio:
- Install a package to connect RStudio to relational data sources
- Change the Spark version
- Transfer files to and from your user project folder
Install a package to connect RStudio to relational data sources
The steps you follow to connect RStudio to relational data sources depend on whether the cluster has Internet access.
If the cluster has Internet access, the administrator must complete the following steps:
- In the Tools shell, download the database driver:

```shell
pwd
# /user-home/1003/DSX_Projects/project-nb-test/rstudio
cd /user-home/1003/
wget https://jdbc.postgresql.org/download/postgresql-42.2.0.jar
```
- Configure Java for R:

```shell
R CMD javareconf
```
- Return to the RStudio script, load the RJDBC package, and connect to the data source:

```r
library(RJDBC)

driverClassName <- "org.postgresql.Driver"
driverPath <- "/user-home/1003/postgresql-42.2.0.jar"
url <- "jdbc:postgresql://9.876.543.21:27422/compose"
databaseUsername <- "admin"
databasePassword <- "ABCDEFGHIJKLMNOP"
databaseSchema <- "public"
databaseTable <- "cars"

drv <- JDBC(driverClassName, driverPath)
conn <- dbConnect(drv, url, databaseUsername, databasePassword)
# dbListTables(conn)
data <- dbReadTable(conn, databaseTable)
# data <- dbReadTable(conn, paste(databaseSchema, '.', databaseTable, sep = ''))
data
```
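After the connection is established, the same connection object can run arbitrary SQL and should be released when no longer needed. A minimal sketch, reusing the `conn` and `databaseTable` values from the step above (the SELECT statement is illustrative, not part of the product documentation):

```r
# Run an ad hoc query against the open JDBC connection (illustrative SQL)
rowCount <- dbGetQuery(conn, paste("SELECT COUNT(*) FROM", databaseTable))
rowCount

# Release the JDBC connection when finished
dbDisconnect(conn)
```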
Change the Spark version
To change the version of Spark for RStudio, follow the set of steps appropriate for your library:
- Sparklyr library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, pass the Spark 2.2.1 master URL and installation location to the spark_connect function.
- To connect to the Spark 2.2.1 service:

```r
sc <- spark_connect(
  master = "spark://spark-master221-svc:7077",
  spark_home = "/usr/local/spark-2.2.1-bin-hadoop2.7"
)
```
- To connect to the Spark 2.0.2 service:

```r
sc <- spark_connect(master = "spark://spark-master-svc:7077")
```
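With either version, the connection object returned by spark_connect works the same with sparklyr's dplyr interface. A minimal sketch, assuming the connection `sc` from one of the steps above and the built-in mtcars data set (the aggregation itself is illustrative):

```r
library(sparklyr)
library(dplyr)

# Copy a local data frame into Spark and run a simple aggregation
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# Close the Spark connection when finished
spark_disconnect(sc)
```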
- SparkR library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the $SPARK_HOME environment variable to specify the Spark 2.2.1 installation location:

```r
Sys.setenv("SPARK_HOME" = "/usr/local/spark-2.2.1-bin-hadoop2.7")

# import SparkR
library(SparkR, lib.loc = "/usr/local/spark-2.2.1-bin-hadoop2.7/R/lib")

# initialize the Spark session
sc <- sparkR.session(master = "spark://spark-master221-svc:7077", appName = "dsxlRstudioSpark221")
```
See SparkR (R on Spark) for more information.
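Once the SparkR session is initialized, SparkDataFrames can be created directly from local R data. A minimal sketch, assuming the session from the step above (the faithful data set is built into R):

```r
# Create a SparkDataFrame from a local R data frame
df <- as.DataFrame(faithful)

# Inspect the first rows and the schema
head(df)
printSchema(df)

# Stop the session when finished
sparkR.session.stop()
```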
Transfer files to and from your user project folder
Using the file explorer in RStudio, a Watson Studio Local user can upload and download files between their project folder and a local disk outside of the cluster:
- RStudio files
- To download an RStudio file, select the file, click More, and click Export to save the file to your local disk.
- To upload an RStudio file, click Upload and select the file to upload.
- Jupyter files
- To download a Jupyter file, click ..., type ~/../jupyter, select the file, click More, and click Export to save the file to your local disk.
- To upload a Jupyter file, click Upload and select the file to upload.