Import, commit, and push projects on a Git repository

To support collaboration with stakeholders and the data science community at large, you can import project assets from a Git repository into Data Science Experience (DSX) and push project assets from DSX to the repository.

Some files need to be in a specific Git repository folder in order to display and behave appropriately in DSX.

If you pre-populate the repository, be sure to use the following folders in the repository for specific assets:

  • Jupyter notebooks (.ipynb files) > jupyter folder
  • Zeppelin notebooks (.json files) > zeppelin folder
  • RStudio assets > rstudio folder
  • Data sets > datasets folder
  • Models > models folder

Files that are not under one of these folders will be shown under Other Files on DSX.
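If you pre-populate the repository from a local clone, the folder convention above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and it assumes that the designated folders sit at the top level of the repository and that folder names are matched exactly as written.

```python
from pathlib import Path, PurePosixPath

# Folder name -> asset category, taken from the list above.
DSX_FOLDERS = {
    "jupyter": "Jupyter notebooks",
    "zeppelin": "Zeppelin notebooks",
    "rstudio": "RStudio assets",
    "datasets": "Data sets",
    "models": "Models",
}


def create_dsx_folders(repo_root):
    """Create the designated folders inside a local clone (hypothetical helper)."""
    root = Path(repo_root)
    for name in DSX_FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)


def dsx_category(relative_path):
    """Return the category a repository-relative path would be listed under."""
    parts = PurePosixPath(relative_path).parts
    # Assumption: only files inside a designated top-level folder are recognized.
    if len(parts) > 1 and parts[0] in DSX_FOLDERS:
        return DSX_FOLDERS[parts[0]]
    return "Other Files"
```

For example, dsx_category("jupyter/analysis.ipynb") returns "Jupyter notebooks", while dsx_category("README.md") returns "Other Files".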

Collaborators in the Git repository can import assets from shared repositories and contribute their own assets to the project. The Git integration feature provides a flexible source code management method to track, back up, and collaborate on a project with multiple users. You can import, push, and pull changes from Git repositories with DSX. All assets in the project will be on the Git repository when you push the changes.

Tasks to get integrated with the Git repository:

  1. Enable access to a Git repository from your DSX account
  2. Import a project from a Git repository
  3. Add DSX project assets to a Git repository
  4. Commit and push changes to a Git repository project
  5. Pull changes for a Git repository project

Other features of a Git repository integration:

  • Reset your Git repository project

Enable access to a Git repository from your DSX account

Before you can import and push assets on a Git repository, you must enable your DSX user account to access the Git repository. You enable access by creating a personal access token with the required access scope in the Git repository and linking the token to your DSX account.

To create a personal access token:

  1. Depending on which Git repository you want to integrate with, generate your personal access token from one of the URLs below:

  2. From the Git repository page that opens, select at least the "repo" scope to import, commit, and push changes to the repository.

    Copy the generated access token.

  3. Add your personal access token. Open your profile settings, select the Integrations tab, and click add token.

  4. Select the Token for the platform you generated the token on.

  5. Paste the token you copied in Step 2 in the Access Token field in DSX.

  6. Give the new token a token name, and click Create.

You can add more access tokens later: open Profile > Settings, select the Integrations tab, and click add token.
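Before you paste a token into DSX, you might want to confirm that it carries the "repo" scope. The following is a minimal sketch that assumes the token was generated on GitHub, which reports the scopes of a personal access token in the X-OAuth-Scopes response header; other Git platforms may expose scopes differently. The function names are illustrative, and fetch_token_scopes needs network access.

```python
import urllib.request


def has_repo_scope(scopes_header):
    """Return True if a comma-separated scope list contains 'repo'."""
    scopes = [s.strip() for s in (scopes_header or "").split(",")]
    return "repo" in scopes


def fetch_token_scopes(token, api_url="https://api.github.com/user"):
    """Ask the GitHub API which scopes a token carries (assumes GitHub)."""
    request = urllib.request.Request(
        api_url, headers={"Authorization": "token " + token}
    )
    with urllib.request.urlopen(request) as response:
        # GitHub lists the scopes granted to the token in this header.
        return response.headers.get("X-OAuth-Scopes", "")
```

A call such as has_repo_scope(fetch_token_scopes(my_token)) then indicates whether the token is likely to be sufficient for importing, committing, and pushing.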

Import a project from a Git repository

After you save the access token, you can create a DSX project from an existing Git repository. When you create the project, you choose the repository to link by selecting the access token for that repository. Private repositories are supported.

To link a DSX project to an existing Git repository, you must have administrator permission for the project. All users must have permission to access the Git repository. User permissions for a repository must be granted in the Git repository itself.

To create a project from an existing Git repository:

  1. In your project list, click create project. Select the From Git repo tab.
  2. In the Token field, select the name of the access token that is linked to the Git repository that you want to import to your project.

  3. Enter your Git repo URL in the Repository URL field. You can also change the name of your project, which by default is the repository name.

The project and all its imported assets now appear in your project list like any other DSX project.

Add DSX project assets to a Git repository

You can add assets to a Git-linked project as you would add assets to a normal DSX project. Specific asset types are published under designated folders in the Git repository:

  • Jupyter notebooks (.ipynb files) > jupyter folder
  • Zeppelin notebooks (.json files) > zeppelin folder
  • RStudio assets > rstudio folder
  • Data sets > datasets folder
  • Models > models folder

If the designated folder does not exist in the repository, it will be created when the project is committed and pushed.

Commit and push changes to a Git repository project

Assets are published on the Git repository only when you commit and push the changes in the project that is linked to it.

To commit and push changes to a project:

  1. Select the Git Actions icon in the project action bar and click Push project.

  2. Select which files to commit and push.

  3. Enter a message about committed changes and click Commit and push.

Tip: You can view the commit messages by navigating to /user-home/{uid}/DSX_Projects/{projectName} and running git log.
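If you prefer to read the commit messages from a script, the tip above can be sketched in Python as follows. The helper names are hypothetical; the sketch assumes that git is installed where the script runs and that the script can reach the project directory. The {uid} and {projectName} placeholders must be replaced with your own values.

```python
import subprocess
from pathlib import PurePosixPath


def project_dir(uid, project_name):
    """Fill in the /user-home/{uid}/DSX_Projects/{projectName} template."""
    return PurePosixPath("/user-home") / str(uid) / "DSX_Projects" / project_name


def read_commit_messages(repo_dir, limit=10):
    """Run git log in repo_dir and return the commit subject lines, newest first."""
    result = subprocess.run(
        ["git", "log", f"--max-count={limit}", "--pretty=format:%s"],
        cwd=str(repo_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return result.stdout.splitlines()
```

For example, read_commit_messages(project_dir(uid, project_name)) returns up to ten of the most recent commit messages for that project.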

From the Git Actions icon in the project action bar, you can click Commit History to see the history of Git commits and who made them. You can also add multiple tags to each commit.

Pull changes for a Git repository project

You can update your assets on a DSX project by pulling changes from its linked Git repository.

Before you pull from a repository, make sure your project is backed up, for example by exporting it as a .zip file or by committing it to Git. Accepting the pulled changes can overwrite changes that you recently made in DSX.
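If you have file system access to the project directory, a local .zip backup can be sketched with the Python standard library. This is not the DSX export feature, only a comparable safeguard under that assumption; backup_project is a hypothetical helper, and the backup directory should be outside the project directory so that the archive does not include itself.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir, backup_dir):
    """Zip project_dir into backup_dir and return the path of the archive."""
    project = Path(project_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps earlier backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = target / f"{project.name}-{stamp}"
    # make_archive appends ".zip" and returns the final archive path.
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=str(project)))
```

Run it before you pull, and keep the returned archive until you have confirmed that the pulled changes are what you expected.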

To pull changes from a repository:

  1. Select the Git Actions icon in the project action bar and click Pull project.

Reset your Git repository project

You can reset your project so that it clones the Git repository that it is linked to. All assets in the project are overwritten by the versions of the assets in the remote Git repository. Resetting your project is a way to re-sync with all the collaborators, but use it only when necessary, because you lose all local changes made to the project.

To reset your project:

  1. Select the Git Actions icon beside the project in your project list and click Reset Project.

This action will reset your project to whatever is in the master repository.
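Because a reset discards local changes, it can help to list them first. The following is a minimal sketch, assuming you have terminal access to the project directory and that git is installed there. It relies on the standard git status --porcelain output, in which each line starts with a two-character status code followed by a space and the path. The function names are illustrative.

```python
import subprocess


def parse_porcelain(status_output):
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in status_output.splitlines():
        if len(line) > 3:
            # Columns 1-2 are status codes, column 3 is a space, then the path.
            paths.append(line[3:])
    return paths


def local_changes(repo_dir):
    """List files in repo_dir that have uncommitted changes or are untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=str(repo_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return parse_porcelain(result.stdout)
```

An empty list from local_changes suggests that there is no uncommitted work in the directory for the reset to discard. Commits that were made locally but not yet pushed are not covered by this check.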