What's new in Data Science Experience Local

Check out what's new for DSX Local!

What's new in Version 1.2.0 (March 2018)

Installation

  • The DSX Local 9-node configuration now uses eight nodes: 3 control/storage nodes, 3 compute nodes, and 2 optional deployment nodes. Learn more

  • The DSX Local 3-node configuration now uses five nodes: 3 master nodes and 2 optional deployment nodes. Learn more

  • DSX Local no longer installs prerequisite RHEL packages. Instead, DSX Local provides an option to install CentOS packages (including docker) to fulfill the DSX Local prerequisites. If you do not mind DSX Local installing CentOS packages on the RHEL kernel, append the --centos-repo flag to the installation command; docker is then installed automatically and no further action is required. If you want only RHEL packages on the RHEL kernel, you must enable a Red Hat repo for package installation, enable the extras repo so that docker can be installed, and install and start docker on every node. Learn more

  • You can now run the installation package with the --extract-pre-install-scripts parameter to extract pre-installation scripts for DSX Local: docker_redhat_install.sh automatically installs docker on RHEL, and pre_install_check.sh verifies that your cluster meets the necessary software requirements. Learn more

  • DSX Local now installs the Community Edition of the Decision Optimization engines. The Community Edition is limited to 1000 constraints and variables, and a search space of 1000 × 1000 for constraint programming problems. A full version is available as a separate purchase. Learn more

  • The sub-directory for automatically installing add-on TARs (such as Flow and H2O Flow) as part of the main DSX Local installation has changed from wdp-addon to addon-packages.

General

  • In the new Model Management and Deployment interface, a DSX administrator can create a project release, deploy assets within it, and go live with the project release in a production environment. Model Management and Deployment replaces the Model Management page. Learn more

  • In the new All Active Environments page, you can view all of the current environments, workers, and services in the DSX Local cluster. The columns include the users who created them, how much CPU and memory is reserved (unless they use unmanaged resources), and whether they are currently running. The DSX administrator can also free up resources by stopping a running environment, worker, or service. Learn more

  • You can now configure runtime environments or job workers to use unmanaged resources instead of reserved resources. For example, you can opt to use unmanaged resources temporarily for a large job. Caution: Turning the Reserve resources toggle off can affect system performance if available CPU and RAM on the servers become overcommitted.

  • DSX Local now installs Apache Spark version 2.2.1 in addition to Apache Spark version 2.0.2 on the compute nodes. You can pick which Spark version to use for your Jupyter notebook kernels, RStudio, and scripts. Learn more
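
    For example, you can confirm which Spark version a notebook kernel picked up. This is a minimal sketch, assuming the kernel predefines a SparkSession named spark (typical for PySpark kernels):

      # Print the Spark version that the current notebook kernel is using.
      # Assumes the kernel predefines a SparkSession named `spark`.
      print(spark.version)  # for example, '2.2.1' or '2.0.2'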

  • DSX Local now logs an audit record of user login attempts on the DSX Local cluster. In the Admin Console, go to Cluster log and search the ibm-nginx-container container log for DSX_AUDIT_RECORD. You can also download the audit records through the REST API. Learn more
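
    A minimal sketch of downloading the audit records with Python. The /api/v1/audit path is an illustrative assumption, not the documented route; substitute the endpoint from the DSX Local REST API reference:

      import requests

      DSX_HOST = "https://dsx-local.example.com"  # placeholder cluster URL
      TOKEN = "<bearer token for your DSX Local session>"

      # Hypothetical audit-record route -- check the DSX Local REST API
      # reference for the actual path.
      resp = requests.get(
          DSX_HOST + "/api/v1/audit",
          headers={"Authorization": "Bearer " + TOKEN},
          verify=False,  # many clusters use a self-signed certificate
      )
      resp.raise_for_status()
      for record in resp.json():
          print(record)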

Projects

  • In the new Data Refiner, you can build a data flow that cleanses and shapes a local data set. To start, go to your project assets page and click Refine next to the local data set. Learn more

  • A project collaborator can now create and run Python and R scripts in a new DSX text editor. To create a script, go to the project assets and click add script. You can also run the script on demand, view a log history, and customize settings such as auto-save. Learn more

  • To test a script in a development environment with Python 2, Python 3, or R, click Test script as API next to it. DSX Local automatically generates curl commands to retrieve a bearer token and run the script. You can copy and paste the commands and replace the variables accordingly. Learn more
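
    The generated commands follow a two-step pattern: authenticate, then invoke the script endpoint. Below is a rough Python equivalent; the authentication route and the script-run path are assumptions modeled on the generated curl commands, so copy the exact URLs that DSX Local generates for your script:

      import requests

      DSX_HOST = "https://dsx-local.example.com"  # placeholder cluster URL

      # Step 1: retrieve a bearer token (assumed route -- use the URL from
      # the first curl command that DSX Local generates).
      auth = requests.get(
          DSX_HOST + "/v1/preauth/validateAuth",
          auth=("username", "password"),
          verify=False,
      )
      token = auth.json()["accessToken"]

      # Step 2: invoke the script through its generated endpoint
      # (hypothetical path -- paste the one DSX Local shows you).
      run = requests.post(
          DSX_HOST + "/v1/scripts/myscript/run",
          headers={"Authorization": "Bearer " + token},
          verify=False,
      )
      print(run.json())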

  • From the Git Actions icon in the project action bar, you can now click Commit History for a history of Git commits and who performed them. You can also add multiple tags to each commit. Learn more

  • When you click Push project, you can now select which files to commit and push. Learn more

  • In the From Git repo tab of the Create Project page, you can now import a clone of an external Git repository and de-select Manage project from the external Git repository so that DSX Local manages the new project. Learn more

  • DSX Local projects can now be created and managed from a Bitbucket Git repo. A Bitbucket token is required. Learn more

  • You can now click Stop next to a job run to exit it immediately with a Failure status.

  • You can now create a job of type SPSS Stream run with an SPSS Modeler Worker.

  • After the DSX administrator imports a JDBC driver, users can now create a data source for it of type Custom JDBC from their project Data Sources page. Remote data sets from this custom data source can then be loaded into a notebook data frame by clicking the Find and Add Data icon, clicking Remote, and clicking Insert to code. Learn more
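
    The inserted code is generated per data set, so its exact shape varies. As a rough illustration of what loading from a custom JDBC source into a pandas DataFrame involves, here is a sketch using the jaydebeapi package; the driver class, URL, credentials, and jar path are all placeholders:

      import jaydebeapi
      import pandas as pd

      # Placeholder connection details for an imported custom JDBC driver.
      conn = jaydebeapi.connect(
          "com.example.jdbc.Driver",             # hypothetical driver class
          "jdbc:example://dbhost:5000/SALESDB",  # hypothetical JDBC URL
          ["username", "password"],
          "/path/to/example-driver.jar",         # placeholder jar location
      )
      df = pd.read_sql("SELECT * FROM SALES.ORDERS", conn)
      conn.close()
      print(df.head())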

  • The tree view across all projects in DSX Local now provides breadcrumbs for folders and options to publish or delete assets. You can also use your web browser's back button.

Administration

  • IBM Data Platform Manager is now called the Admin Console.

  • In the new Hadoop integration page in the Admin Console, a DSX administrator can register DSX Hadoop clusters, view their configurations, and create images for virtual runtime environments on them. DSX Local users can then select these images as workers to submit Python jobs to the registered clusters remotely. Learn more

  • In the Admin Console, a DSX administrator can now run the following configuration scripts from the Scripts page instead of from the command line. Learn more

    • Add JDBC driver jar to DSX Local (moveJarClasspath.sh)
    • Set the default Livy endpoint for DSX Local (add_endpoint.sh)
    • Perform key and certificate tasks (idpKeyMan.sh)
    • Replace existing cert.key and cert.crt certificates (change_nginx_cert.sh)

Machine learning

  • You can now click Batch score or Evaluate next to models created in RStudio, and generate an R script that can be run with an R worker.

  • In the new Scripts tab on the model details page, you can associate scripts with that particular model.

  • Batch scoring WML models with CSV files is now supported.

What's new in Version 1.1.3 (January 2018)

Machine learning

  • Because models are now saved in the cluster file system, any notebooks created earlier than DSX Local Version 1.1.3 must be updated manually. See migrate notebooks to DSX Local Version 1.1.3 and the revised sample notebooks for details.

  • Models can now be upgraded to Version 1.1.3 using the migrateModels.sh script. Learn more

  • From either a notebook or the model builder, you can now experiment with a model by saving different versions of it. You can also compare the accuracies of each version you ran, and publish the best version. Learn more

  • From the project Assets page, you can now click Batch Score or Evaluate next to a model to automatically generate a Python script or notebook that can be scheduled to run as a job.

  • The Model Management dashboard now only lists models that have been published. Also, only published models can be deployed.

  • In addition to Spark ML, PMML with online scoring, and Custom models with batch scoring, DSX Local now supports the following model types: scikit-learn with pickle format, scikit-learn with joblib format, XGBoost, Keras TensorFlow, and WML. The new service dsxl-scripted-ml scores the Python models and Apache Spark models. Learn more
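
    For instance, a scikit-learn model can be persisted in either of the two supported serialization formats before it is saved to DSX Local; a minimal sketch:

      import pickle
      from sklearn.datasets import load_iris
      from sklearn.externals import joblib  # joblib lived here in scikit-learn of this era
      from sklearn.linear_model import LogisticRegression

      X, y = load_iris(return_X_y=True)
      model = LogisticRegression().fit(X, y)

      # scikit-learn with pickle format
      with open("model.pkl", "wb") as f:
          pickle.dump(model, f)

      # scikit-learn with joblib format
      joblib.dump(model, "model.joblib")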

  • DSX Local now displays an encrypted bearer token in the model deployment details that an application developer can use to evaluate models online with REST APIs. The token never expires and is limited to the model it is associated with.
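
    A sketch of the online scoring call with that token. The payload shape shown is an assumption; use the scoring endpoint and token from your model's deployment details:

      import requests

      SCORING_URL = "<scoring endpoint from the deployment details>"
      TOKEN = "<deployment bearer token from the deployment details>"

      # Example payload; the field names must match your model's schema.
      payload = {"fields": ["age", "income"], "values": [[42, 50000]]}

      resp = requests.post(
          SCORING_URL,
          json=payload,
          headers={"Authorization": "Bearer " + TOKEN},
          verify=False,  # self-signed cluster certificate
      )
      print(resp.json())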

  • The Flows add-on is now called "SPSS Modeler Flows". This new SPSS Modeler provides an enhanced UI with advanced visualization. Learn more

Projects

  • Runtimes are now called Environments.

  • Runtime environments can now be associated with graphics processing units (GPUs). Learn more

  • From the Assets page, a project Editor or Admin can now publish an asset (for example, a Jupyter notebook or R Shiny app) to share it either within the DSX Local community or outside of DSX Local. The publish action copies the asset and automatically generates a URL (except for models) where the published copy can be viewed. Alternatively, DSX Local users with access to the published copy can view it in the new Published Assets page. Learn more

  • In the Assets page, a project collaborator can now preview an R Shiny app by clicking Preview next to it.

  • You can now create data sources and remote data sets to access IBM Big SQL, Cloudera Hive, Cloudera HDFS data, and a non-secure HDP cluster. Learn more

  • Project administrators and editors can schedule a job to run a source file such as a model, Python notebook, or Python script in the background. To create the job, go to the new Jobs page and click create job. Learn more

  • From the new Git Actions icon in the project action bar, a collaborator can now click Pull project to accept changes and Push project to both commit and push the changes. Previously, the project list had Accept changes and Commit changes actions. Learn more

Administration

  • A DSX administrator can now import a custom image into DSX Local that contains a specific runtime (for example, Jupyter notebook) and a set of packages and libraries. All users can then conveniently select this image for their own use. Learn more

  • A DSX administrator can now monitor CPU core reservations, RAM reservations, and GPUs on the compute nodes.

Installation

  • DSX Local now supports NVIDIA GPUs on Azure, AWS, and SoftLayer. DSX Local also supports GPUs on Red Hat Enterprise Linux x86. You must have the required GPU hardware present to successfully import libraries that depend on GPUs. See the sample notebook "Use deep learning for image classification" for details on how to use GPUs (note that the GPU code displays only on a cluster with GPUs). Learn more

  • You can now automatically install add-ons such as Flow and H2O Flow as part of the main DSX Local installation. To do so, place the TAR packages in a wdp-addon sub-directory in the same directory as the installer files. Then start the DSX Local installation.

  • Instead of a proxy IP address for your HA cluster, you can now specify your own load balancer in the wdp.conf file. This option is only available in a configuration file install. Learn more

What's new in Version 1.1.2 (October 2017)

General

  • When a user signs in, the DSX Local client now displays sample notebooks, recently updated projects, and helpful external links. The new project type column indicates whether a project is Standard, GitHub, or Library. To browse all sample notebooks, go to the Community page.

  • A DSX administrator can now set up a default Livy endpoint URL for DSX Local using the default_endpoints.conf file. Learn more

  • In the user settings, you can now select a personal avatar for your user profile photo.

  • You can now install DSX Local with a built-in trial license that expires after 60 days. To unlock the enterprise version of your DSX Local cluster, purchase and apply an enterprise license. Learn more

Hortonworks HDP support

  • Users can now submit jobs to Spark or Spark 2 through Livy using Knox on a secure HDP cluster. Learn more

  • You can now create data sources and remote data sets to access HDFS data. In the Data Source Type field, select HDFS - HDP to specify the WebHDFS URL, namenode URL, and RPC port. To select specific data files for the remote data set, click Browse. You can also click Preview next to an HDFS data set to view the contents of CSV files. Learn more

  • You can now create data sources and remote data sets to access Hive data. In the Data Source Type field, select Hive - HDP to specify the WebHCAT URL, WebHDFS URL, and Livy URL. You can also browse the databases, select a specific table for the remote data set, and preview the schema of the table.

  • You can now transfer files between the HDP cluster and the DSX Local cluster using Python utility functions. Learn more
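
    The helper names below are hypothetical stand-ins for the documented utility functions; the sketch only shows the call pattern of copying a file to HDFS on the HDP cluster and pulling results back:

      # `dsx_hdfs_utils`, `upload_file`, and `download_file` are hypothetical
      # names -- see the DSX Local documentation for the actual utility
      # functions and their import path.
      from dsx_hdfs_utils import upload_file, download_file

      # Copy a project file to HDFS on the HDP cluster ...
      upload_file("datasets/train.csv", "/user/dsxuser/train.csv")

      # ... and pull the results back into the DSX Local project.
      download_file("/user/dsxuser/scores.csv", "datasets/scores.csv")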

Projects

  • Using a new tree view in the DSX Local client, you can view all projects in the system and expand their contents. You can click any folder, Jupyter notebook, or CSV file to preview it. Learn more

  • All users in DSX Local can now share files in a library project. A library project can contain common data sets, code packages, and scripts. A library project has no repository or collaborators. Only the Admin of the library project can edit it. Learn more

  • Users can now go to Settings > Integrations and click add token to add multiple GitHub access tokens. Any of these tokens can be selected from the Create Project page. Learn more

  • DSX Local now displays a notification in the project overview window if a user can commit new changes or accept new changes for a project.

  • Runtimes can now reserve specific allocations of CPU cores and RAM. To view all runtime environments in the DSX Local system, select the Runtimes page from the main menu or click Manage across projects from the Runtimes list in the project overview. To edit runtime resource allocations, click Edit settings.

  • A project administrator can now rename or delete their project. Deleted projects now go into a recycle bin directory on the DSX Local file system, from which a DSX Local administrator can manually recover them. Learn more

  • Users can now delete data files, models, runtimes, and RStudio files.

Machine learning

  • Users can now publish and score a DSX Local model on the Watson ML service by clicking Publish model next to the model. Learn more

  • Users can now deploy a model using a Batch scoring type. For a batch deployment, DSX Local reads in a remote data set, scores the data offline, and outputs the predictions in a CSV file. Learn more

  • Users can now import and score third-party vendor models using the Custom Batch or Custom Online option. Learn more

  • Models can now be trained on a remote Spark through Livy REST APIs.
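
    Livy's REST API is session based: create a session, wait for it to become idle, then post code statements to it. A minimal sketch against a Livy endpoint (the host and port are placeholders):

      import time
      import requests

      LIVY = "http://livy-host.example.com:8998"  # placeholder Livy endpoint

      # Create a PySpark session on the remote Spark cluster.
      session = requests.post(LIVY + "/sessions", json={"kind": "pyspark"}).json()
      sid = session["id"]

      # Wait until the session is ready to accept statements.
      while requests.get(LIVY + "/sessions/%d" % sid).json()["state"] != "idle":
          time.sleep(5)

      # Submit a statement to run remotely (training code would go here).
      stmt = requests.post(
          LIVY + "/sessions/%d/statements" % sid,
          json={"code": "print(sc.parallelize(range(100)).count())"},
      ).json()
      print(stmt)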

Notebooks

  • DSX Local now supports an add-on for H2O Flow notebooks. You can also create new runtimes for H2O Flow. Learn more

  • DSX Local now supports the scikit-learn and XGBoost libraries for developing machine learning models in Python notebooks.
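
    For example, a notebook cell can train with both libraries side by side:

      import xgboost as xgb
      from sklearn.datasets import load_iris
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      X, y = load_iris(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      # scikit-learn model
      rf = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)
      print("random forest accuracy:", rf.score(X_test, y_test))

      # XGBoost model, via its scikit-learn-compatible wrapper
      bst = xgb.XGBClassifier(n_estimators=50).fit(X_train, y_train)
      print("xgboost accuracy:", bst.score(X_test, y_test))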

  • DSX Local now provides a sample notebook for Apache Zeppelin.

What's new in Version 1.1.1 (September 2017)

General

  • DSX Local now supports Apache Zeppelin notebooks. You can switch between Jupyter, RStudio, and Zeppelin notebooks by expanding the new Tools menu item. Learn more

  • In the newly enhanced Model Management dashboard, DSX Local users can now schedule and monitor evaluations, and import machine learning models. Also, machine learning models are no longer in Beta. Learn more

  • You can now connect your DSX Local project to a GitHub or GitHub Enterprise repository. You can import projects from a GitHub repository, pull changes from it, and commit project changes to it. Learn more

  • DSX Local projects now show collaborators on the Overview tab. All assets, including RStudio and data sets, can now be managed from the Assets tab. New buttons make it easier to add notebooks, data sets, machine learning models, and collaborators. Learn more

  • You can now reserve CPU resources from the new Runtimes tab in your project. You can also stop a runtime environment to fix a notebook crash or an unavailable Spark context.

  • DSX Local projects now connect to relational databases using Data sources and Remote data sets instead of Connections. A data source allows you to securely store information about your database and credentials, and can be either shared in a project or private to a user. In each data source, you can add remote data sets for particular schemas and tables. To add a new data source, go to the new Data Sources tab. Learn more

  • DSX Local notebooks can now retrieve data from relational databases using APIs from third-party modules. Learn more

  • You can now export or import a DSX Local project as a ZIP or TAR.GZ file. You can also drag and drop projects and notebooks when you create them from files.

  • DSX Local now provides a sample project named dsx-samples that contains sample notebooks to help get you started. Because this sample project is always available to all users, the collaborator feature is disabled.

  • Data assets no longer have size restrictions.

  • To view the version of your DSX Local installation, you can now click your profile icon and click About. You can also click Settings to change the photo for your profile icon.

Installation

  • DSX Local now stores objects in the file system instead of an object store. If you are upgrading DSX Local from an earlier release, you must migrate your data to the file system. Also, the upgrade scripts have changed from updateModule.sh and getUpdateStatus.sh to upgradeAll.sh and upgradePostCreate.sh. Learn more

  • DSX Local now supports installation on Amazon Web Services (AWS) and Linux on z. Learn more

  • Before you install DSX Local, you can now run a preInstallCheck script on each node of your cluster to verify that all hardware and software requirements are met. The script provides console output and a preInstallCheckResult file that assigns Pass, Warning, or Fail to each item that it checks. For a Fail, you must fix the specific item before you can install DSX Local. For a Warning, you can still proceed, but it is not recommended. If all validations in this script pass, then DSX Local will likely install successfully.

  • You can now opt to specify a private SSH key in the admin-utils.sh and uninstall.sh scripts.

Administration

  • A DSX administrator can now configure alert thresholds, dashboard refresh, and log and metric rotations by clicking Settings from their user profile icon in IBM Data Platform. Learn more

  • DSX Local can now email alerts to administrators, and displays all alerts on a new Alerts panel that can filter by alert types, read/unread, and status.

  • The Cluster Log panel now displays a bar chart of the log message count at regular intervals.

  • A DSX administrator can now manually replace a storage node or compute node by command line. Learn more

  • The script to import a JDBC driver has changed from importJarToClasspath.sh to moveJarClasspath.sh. Learn more

Development

  • The script that imports the CA certificate to a Spark truststore has changed from /wdp/utils/importCertToTruststore.sh to /wdp/utils/idpKeyMan.sh. Learn more

  • An application developer can now use REST APIs to create, copy, move, or delete files and folders within a user home directory in the DSX Local cluster. Learn more
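
    A sketch of the call pattern in Python. The routes shown are assumptions, not the documented paths; substitute the file-management endpoints from the DSX Local REST API reference:

      import requests

      DSX_HOST = "https://dsx-local.example.com"  # placeholder cluster URL
      HEADERS = {"Authorization": "Bearer <token>"}

      # Hypothetical route: create a folder in the user home directory.
      requests.post(
          DSX_HOST + "/api/v1/files/folders",
          json={"path": "myproject/results"},
          headers=HEADERS, verify=False,
      )

      # Hypothetical route: delete a file.
      requests.delete(
          DSX_HOST + "/api/v1/files",
          params={"path": "myproject/old.csv"},
          headers=HEADERS, verify=False,
      )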

  • An application developer can now submit an Apache Kafka Spark streaming application that connects to a Kafka broker over SSL. Learn more

What's new in Version 1.1.0.02 (July 2017)

General

  • DSX Local can now connect to an existing remote Spark through Livy REST APIs. Learn more

  • DSX Local now automatically creates an access token for each new project.

  • A DSX Local user can now create, train, deploy, search, view, monitor, and delete models for machine learning. Learn more

Development

  • An application developer can now use REST APIs to submit a Spark application, including an Apache Kafka Spark streaming application, to run within the DSX Local cluster. Learn more
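
    If the endpoint follows the Livy batch convention, submission looks roughly like the sketch below; the URL and application path are placeholders, and the actual DSX Local route may differ:

      import requests

      LIVY = "http://livy-host.example.com:8998"  # placeholder endpoint

      # Submit a packaged Spark application as a batch job
      # (Livy-style /batches route; payload fields follow the Livy API).
      batch = requests.post(LIVY + "/batches", json={
          "file": "/user/dsxuser/apps/stream-app.py",  # placeholder application
          "args": ["--topic", "events"],
      }).json()
      print("batch id:", batch["id"], "state:", batch["state"])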

  • An application developer can now use REST APIs to transfer files to and from a user home directory in the DSX Local cluster. Learn more

Installation

  • DSX Local now supports installation on POWER8. Learn more

  • For increased security, you can now install DSX Local from the Electron app for Windows, macOS, or Linux. The Electron app communicates with the cluster by SSH, and does not require an open port or web browser. Learn more

Administration

  • DSX Local can now email notifications to users. To enable, the DSX administrator must set up an SMTP service by clicking on Settings from their user profile icon in IBM Data Platform.

  • A DSX administrator can now opt to set up LDAP using a domain search rather than by specifying the exact distinguished name. Learn more

  • A DSX administrator can now run utilities to automatically troubleshoot the cluster and collect logs for IBM support. Learn more

What's new in Version 1.1.0.01 (May 2017)

General

  • Your connection to a relational database can now collect data using either an SQL table or an SQL query. Learn more

Installation

  • In the /etc/selinux/config file, you must now set SELinux to either SELINUX=enforcing or SELINUX=permissive, and then reboot the node prior to DSX Local installation. Learn more

  • For storage nodes, the installer no longer prompts for a raw disk. Instead, you must have a separate disk partitioned and mounted to a path. Learn more

  • You must install DSX Local only in a non-root path (any path not mounted as /) that is formatted as XFS.

  • Both node configurations for DSX Local now install from one package. For a nine-node installation, run the installation package without any parameters. For a three-node installation, run the installation package with the --three-nodes parameter.

  • You now have the option to install DSX Local using a private SSH key file for the credentials (instead of a sudo username and password). Learn more

  • The installer now prompts for a cluster overlay network on which to install and configure a subnet (the network that the Kubernetes pods use to communicate with each other). If you accept the default subnet, ensure that it does not conflict with an existing subnet in your network. In wdp.conf, the new parameter is overlay_network=192.168.0.0/16. Learn more

  • You can now generate a wdp.conf template by running the installation with parameter --get-conf-user (for a sudo user) or --get-conf-key (for an SSH key). Learn more

  • If an individual installation step fails, you can now retry or skip it (but make sure you resolve the issue first). Learn more

  • You can now uninstall DSX Local by command line. Learn more

Administration

  • A DSX administrator can now manually import JDBC drivers for relational databases that are not listed under the Connections page. Learn more

  • DSX administrators can now click on their user profile icon in IBM Data Platform to sign out or change their settings.

  • In the Pods panel, you can now view used CPU, used memory, and used disk.

  • The User Management panel now displays total users and total pending users.

  • The Set up LDAP panel now provides an External LDAP Prefix field, and you can now test your LDAP setup and LDAP users. Learn more

  • The new Solution Builder option in the admin dashboard guides you on a step-by-step journey of your organization's data through IBM products.