Table of contents

Import a custom JDBC data source

If a DSX user plans to write notebook or RStudio code that uses a relational database beyond the data source options already available in DSX Local, then the DSX administrator must first import a JDBC driver for it. Complete the following steps:

  1. Import a JDBC driver
  2. Create a custom data source and remote data set
  3. Load data in the notebook

Import a JDBC driver

Import a JDBC driver using the admin console

To import a JDBC driver using the Admin Console, click the menu icon (The menu icon) and click Scripts. In the Scripts pull-down menu, select JDBC driver jars in DSX Local (moveJarClasspath.sh), and click add jar. Drag and drop the .jar file or browse to it and click the Run button.

Run script

Create a custom data source and remote data set

Any collaborator with data source information and credentials can create a data source in the project. To add a data source to a project:

  1. From your project, go to the Data Sources page and click add data source.
  2. In the Create Data Source window, specify all of the required fields. Shows the Create Data Source panel
  3. In the Data Source Type pull-down menu, select Custom JDBC.
  4. Type in the JDBC driver class name, for example, org.postgresql.Driver.
  5. Type in the JDBC URL, for example, jdbc:postgresql://<host>:29922/compose.
  6. If you select the Shared check box, the data source and its credentials become accessible to all collaborators in the project. If you do not select Shared, then each collaborator must provide their own credentials to use the data source. Important: When a data source is shared, it cannot be made private again. Restriction: A remote data source cannot be shared for an external Git project.

You can add or edit remote data sets that connect directly to a schema and table name in the data source. Notebooks within the project can then refer to each remote data set by a name to retrieve its information. You can either define remote data sets when you create the data source, or add them later.

All collaborators in a project can use a shared data source in their notebook, or add their own private data source. Collaborators with viewer privileges cannot edit data sources.

Load data in the notebook

Version 1.2: To automatically load data into a data frame in a notebook, open the notebook and click the Find and Add Data icon (Shows the Find and Add Data icon) in the toolbar. Then, click the Remote tab. Find the data set that you want, and click Insert to code.

To manually write a connection to the relational database in notebook, see the following examples.

Scala Python R
Spark APIs, java.sql.{Connnection,DriverManager} Spark APIs, jaydebeapi Spark APIs

Scala syntax for V1.2

import com.ibm.analytics.dsxCoreUtils._
import scala.util.{Try, Success, Failure}
import java.io._
// Add asset from remote connection
val data = new DataUtil()
val retTryDf = data.getRemoteDataSet(sc, "posrgresql-rds")
val df1 = retTryDf.get
df1.show(5)

Python syntax for V1.2

import dsx_core_utils, requests, jaydebeapi, os, io, sys
from pyspark.sql import SparkSession
import pandas as pd
df1 = None
dataSet = dsx_core_utils.get_remote_data_set_info('posrgresql-rds')
dataSource = dsx_core_utils.get_data_source_info(dataSet['datasource'])
if (sys.version_info >= (3, 0)):
 conn = jaydebeapi.connect(dataSource['driver_class'], dataSource['URL'], [dataSource['user'], 
dataSource['password']])
else:
 conn = jaydebeapi.connect(dataSource['driver_class'], [dataSource['URL'], dataSource['user'], 
dataSource['password']])
query = 'select * from ' + (dataSet['schema'] + '.' if (len(dataSet['schema'].strip()) != 0) else '') + 
dataSet['table']
df1 = pd.read_sql(query, con=conn)
df1.head()

R syntax for v1.2

library(RJDBC)
library(dsxCoreUtilsR)
df1 <- NULL
# Add data asset from remote connection
rmdsi = getRemoteDataSetInfo("posrgresql-rds")
dsi= getDataSourceInfo(rmdsi$datasource)
if (nchar(trimws(rmdsi$schema)) != 0) {
  dbSchemaTable = noquote(paste(rmdsi$schema,rmdsi$table,sep="."))
} else {
  dbSchemaTable = rmdsi$table
}
drv <- JDBC(dsi$driver_class, "/user-home/_global_/dbdrivers/postgresql-42.1.1.jar")
conn <- dbConnect(drv, dsi$URL, dsi$user, dsi$password)
df1 <- dbReadTable(conn, dbSchemaTable)
df1