Load and access data for DSX Local

Your notebook can load and access data from these databases and services using the following methods.

Tip: For local data sets that are small (for example, reference data) and only applicable to one project, store them as assets in the project. Local data sets stored as assets are versioned just like the other project files, and an individual copy is maintained for each of the collaborators. For local data sets that are large, common, or governed by a strict life cycle policy, store them in a library instead. Assets stored in a library are not versioned, and a single copy is maintained that can be accessed by all authorized users. Libraries can be configured to be readable by only a selected list of users.

When developing models and other analytics assets, you should run data preparation on demand and store a copy of that prepared data. As you create scripts for use in your project release, you should include any necessary data preparation step as part of each script. For example, when you are evaluating a model in deployment, you should prepare the data just before running the evaluation so that it is done against the most recent data available.

Sometimes the data provided to you might contain corrupt or inaccurate values, or might not be structured suitably for your use. You will need to prepare the data (clean and transform it) before you can use it for building models or performing other analytics. You can use three primary strategies:

  • Notebooks, RStudio, and the script editor can all be used to create Python or R code to prepare the data.
  • SPSS Modeler can include nodes to perform data preparation.
  • Data Refinery can be used both to directly prepare the data and to generate an R script that can be used in a job.

You can prepare the data interactively in a notebook, or as a job that runs a script, a notebook, or an SPSS Flow.
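
As an illustration of the first strategy, the following is a minimal Python sketch of a cleaning and transformation step that could run in a notebook cell or at the start of a script. It uses only the Python standard library. The column names (customer_id, age, income), the sample records, and the function names prepare and load_and_prepare are hypothetical and are not part of DSX Local. Treat it as one possible approach under those assumptions, not a prescribed method.

```python
import csv
import io

# Hypothetical raw data: the column names and values are illustrative only.
RAW_CSV = """customer_id,age,income
1,34,52000
2,,61000
3,abc,48000
4,29,
5,41,75000
"""


def prepare(rows):
    """Clean and transform records.

    Rows with missing or non-numeric fields are dropped, and the
    remaining fields are converted from strings to numeric types.
    """
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "customer_id": int(row["customer_id"]),
                "age": int(row["age"]),
                "income": float(row["income"]),
            })
        except (ValueError, TypeError, KeyError):
            # Skip records that are incomplete or corrupt.
            continue
    return cleaned


def load_and_prepare(text):
    """Parse CSV text and return the prepared records."""
    reader = csv.DictReader(io.StringIO(text))
    return prepare(reader)


# Run the preparation immediately before the data is used, so that
# any later step (for example, a model evaluation) sees current data.
prepared = load_and_prepare(RAW_CSV)
for record in prepared:
    print(record)
```

In a script that is part of a project release, a call such as load_and_prepare would typically be the first step, so that each run prepares the most recent data before it is used.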