Organize assets in a project
A project is how you organize your assets to achieve a particular data analysis goal. Your project assets can include:
- RStudio files
- Data sets (local files and remote data sets)
Restriction: A project name, asset name, data source name, and remote data set name cannot contain any special characters.
You can also export or import a Data Science Experience project as a ZIP or TAR.GZ file.
DSX provides a sample project named dsx-samples, available to all users, with sample notebooks to help get you started. Although you can create new notebooks, models, scripts, and data sets in dsx-samples, you cannot add jobs, collaborators, or SPSS Modeler flows to it. You also cannot use local or remote data sets from any notebooks within dsx-samples.
Tasks you can perform:
- Create a project
- Manage collaborators
- Manage assets
- Add data sources
- Publish assets
- Export a project
- Rename a project
- Delete a project
- View all projects
- Create a script
- Set up runtime environments
- Run jobs in the background
Create a project
To create a project, go to the Projects list and click New Project.
For a new blank project, click the Blank tab.
To import a preexisting project from your local device, click the From File tab and upload the ZIP or TAR.GZ file.
Restriction: Data source credentials are not imported. For any data sources with credentials, you will need to open the imported project and specify the credentials for the data source again.
To import your project from a GitHub, GitHub Enterprise, BitBucket, or BitBucket Server repository, click the From Git repo tab. To set up the connection to the repository, you must save a GitHub or BitBucket personal access token. See Import and commit projects on a Git repository for details on how to work with a Git repo.
Restriction: A project name cannot contain spaces or non-ASCII characters. If your DSX Local project connects to a Git repository, then use the Git repo guidelines for project names and notebook names.
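The naming restriction above can be checked before you create or rename a project. The following is a minimal sketch in Python, not part of DSX: the function name is_valid_project_name is hypothetical, and it checks only the rule stated here (no spaces and no non-ASCII characters). It does not cover the Git repo guidelines or any other rules that DSX might enforce.

```python
def is_valid_project_name(name: str) -> bool:
    """Return True if name is non-empty, ASCII-only, and contains no whitespace.

    Hypothetical helper: it mirrors only the documented restriction and is
    not a DSX API.
    """
    if not name:
        return False
    # Reject any character outside the 7-bit ASCII range.
    if any(ord(ch) > 127 for ch in name):
        return False
    # Reject spaces, tabs, and other whitespace characters.
    return not any(ch.isspace() for ch in name)
```

For example, a name such as sales_forecast passes this check, while sales forecast (contains a space) does not.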
If you select the Library Project check box, then the project will have no repository or collaborators (only the Admin of the library project can edit it), and will be shared across all users in the DSX system. A library project is best for storing large common data sets, code packages, and scripts. The files are stored in the global path. For example, a file is available at /user-home/_global_/libraryProjects/Jdoe\ Library\ Project/datasets/cars.csv in the library project's user home and at ../../../../_global_/libraryProjects/Jdoe\ Library\ Project/datasets/cars.csv from a notebook.
Tip: In the library project, you can click Path next to the file to display the exact path.
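As an illustration of the global path, the following Python sketch builds the absolute path to a library project file. It assumes that the backslashes in the example path are shell escapes for the spaces in the project name, so the Python string contains plain spaces. The constant GLOBAL_LIBRARY_ROOT and the function library_file_path are hypothetical names, not DSX APIs. Use the Path link in the library project to confirm the exact path on your system.

```python
import posixpath

# Root of the global library path, taken from the example in this topic.
GLOBAL_LIBRARY_ROOT = "/user-home/_global_/libraryProjects"


def library_file_path(project_name: str, *parts: str) -> str:
    """Join the global library root, a library project name, and a file path."""
    return posixpath.join(GLOBAL_LIBRARY_ROOT, project_name, *parts)


# Spaces are written literally in a Python string (no backslash escapes).
cars_csv = library_file_path("Jdoe Library Project", "datasets", "cars.csv")
print(cars_csv)
```

The resulting string can then be passed to whatever file reader your notebook uses.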
|Project type|Collaboration privileges|Master repository|Repository copy|
|Standard|Managed in DSX|Master repository exists in the DSX cluster file system|Each collaborator gets a copy|
|GitHub|Managed outside of DSX|Master repository exists in GitHub|Each user gets a copy when the project is imported from GitHub|
|Library|No collaboration, anyone can view|No master repository|No repository copy|
Click Create. Your new project opens and you can start adding collaborators and assets to it.
Manage collaborators
If you have Admin permissions for a project, you can add collaborators, change collaborator permissions, or remove collaborators from that project on its Collaborators page.
The collaborator permissions are:
- Viewer: Can view the project and accept changes.
- Editor: Can control project assets, accept changes, and commit changes.
- Admin: Can control project assets, collaborators, and settings. Can accept and commit changes.
An Admin, Editor, or Viewer can pull changes from the master repository by clicking Accept changes next to a project, or reset the project to what is currently in the master repository by clicking Reset project next to it. An Admin or Editor can also commit changes to it by clicking Commit changes next to a project. A Viewer can add an asset but cannot commit changes.
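The permissions described above can be summarized as a lookup table. The following Python sketch is only a reading aid: the action names (accept_changes, reset_project, commit_changes, control_assets, manage_collaborators, manage_settings) and the function can are hypothetical labels for the behavior described in this topic, not a DSX API.

```python
# Role-to-action table derived from the permission descriptions in this topic.
# All action names are illustrative labels, not DSX identifiers.
PERMISSIONS = {
    "Viewer": {"accept_changes", "reset_project"},
    "Editor": {"accept_changes", "reset_project", "commit_changes",
               "control_assets"},
    "Admin": {"accept_changes", "reset_project", "commit_changes",
              "control_assets", "manage_collaborators", "manage_settings"},
}


def can(role: str, action: str) -> bool:
    """Return True if the given role is listed as allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

With this table, can("Viewer", "commit_changes") is False, while can("Editor", "commit_changes") is True.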
You can click Leave next to a project to remove yourself from it. However, if you are the only collaborator with Admin permissions, you must give another collaborator Admin permissions before you can leave the project.
Manage assets
If you have Admin or Editor permissions on a project, you can add assets from its Assets page.
If you have Admin permissions on a project, you can delete an asset by clicking Delete next to it.
Add data sources
A data source, such as a database table or data stream, provides data for your project. A data source allows you to securely store information about your database and credentials. To add a data source, go to the Data Sources page in your project.
Publish assets
A project Editor or Admin can share a read-only copy of an asset (for example, a Jupyter notebook, HTML file, PDF, text file, or PNG graphic) either within the DSX Local community or with people outside of DSX. To do so, go to the Assets page and click Publish next to the file. The publish action creates a read-only snapshot of the current version of the asset, copies it to a published content directory in the user-home file system (if the file already exists, then it is versioned), and automatically generates a URL (except for models) where the asset can be viewed.
The following assets can be published:
- "Static" content such as HTML files
- Jupyter notebook content
- "Local" data sets
- R Shiny web apps
You can set the following content visibility permissions for the published asset (except for models):
- All users with the URL (anyone outside of DSX can view it).
- Any authenticated user (only signed-in DSX Local users can view it).
- Restricted to members in the selected project (only collaborators in the selected project can view it). You can only publish to projects that you have Admin access to, and you cannot publish an asset to a project that was imported from GitHub (because these are not DSX managed projects).
Security recommendation: When restricting access to sensitive published content, create or select a "target" project to publish the assets to. For example, your department can create a special private project for collecting sensitive assets, models, and reports from other projects. Then assign a specific subset of users as collaborators to that project. If you grant a user Admin access to the project, that user will be allowed to control access rights and to publish content into the project. Collaborators with Editor and Viewer access can only view assets in the project, and cannot publish to or from the project.
If you publish a Jupyter notebook, then the published copy is automatically converted to HTML. You can publish the notebook with the following options:
- You can either rerun the entire notebook (which might take a while) or publish it as-is.
- You can either include code cells in the published copy, or hide the code cells so that only the output appears.
If you publish an R Shiny app, then the URL displays it as an interactive UI where users can dynamically input their own variables to explore trends.
DSX Local then automatically generates a permalink URL to the published asset (except for models) that you can copy. Alternatively, DSX Local users can view the published asset on the Published Assets page. Note that the Published Assets page shows only the assets that the signed-in DSX Local user has permission to view. To unpublish a file, go to the Published Assets page and click Unpublish next to it.
Export a project
You can download a project as a ZIP or TAR.GZ file by clicking Export as next to it. Note that the environments in the project are not exported.
Restriction: When you import this project, data source credentials will not be imported. For any data sources with credentials, you will need to open the imported project and specify the credentials for the data source again.
Rename a project
If you have Admin permissions on a project, you can rename it by clicking Rename next to it. This renames the project for all of the collaborators, and automatically stops the Admin's active runtimes for that project. When the renaming completes, any access to notebooks or RStudio automatically starts the runtimes in the context of the new project name. The Admin can also start them manually on the Runtimes page.
Because the containers are not stopped for the collaborators, each collaborator must stop the runtimes associated with the old project name on the All Runtimes page. Any subsequent access to notebooks or RStudio automatically brings up the runtimes with the correct project name context, or the collaborator can go to the Project > Runtimes page to start the runtimes manually. Collaborators should also verify that assets such as notebooks and scripts do not directly specify the project name, for example, in any paths (paths should always be relative for portability).
Recommendation: Before renaming a project, the Admin should ensure that all hard-coded project names used in paths are corrected to relative paths, and that all collaborators are forewarned of the change so that they can terminate any of their runtimes prior to the rename.
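One way to look for hard-coded project names before a rename is to scan the text of each script or notebook for the current name. The following Python sketch is a hedged illustration, not a DSX feature: the function find_project_name_references is hypothetical, and it reports every line where the name appears as a whole word, so you still need to review each hit and decide whether to replace it with a relative path.

```python
import re
from typing import List, Tuple


def find_project_name_references(source: str,
                                 project_name: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines that mention project_name.

    The name must not be directly preceded or followed by a letter, digit,
    underscore, or hyphen, so "sales_q1" does not match inside "sales_q10".
    """
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(project_name) + r"(?![\w-])"
    )
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits
```

To use it, read the contents of a script into a string and pass it together with the current project name. Any returned line is a candidate for conversion to a relative path.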
Delete a project
If you have Admin permissions on a standard project, you can delete it by clicking Delete next to it. This deletes the project for all of the collaborators, and deletes all assets (and the storage directories) associated with the project. If necessary, a DSX admin can manually recover deleted projects from the DSX system's recycle bin directory.
If an Admin deletes a GitHub project, then only the DSX copy of the project will be deleted (not the remote repository on GitHub).
Recommendation: Before deleting a project, ensure that the project's collaborators have saved their work, exported a copy of the project if they need a backup of it, and stopped all running containers and runtimes for that project. Otherwise, the project might continue to use resources that can only be freed up when the DSX administrator deletes the corresponding pods. You can also forewarn the DSX administrator about the impact of the deleted project on all collaborators.
View all projects
Click the Tree View icon to view all projects in the system and expand their contents. You can click any folder, Jupyter notebook, or CSV file to preview it.