Common Git terms for data scientists and machine learning engineers

This guide briefly explains the terminology behind Git, which can be intimidating for new users.

Git Branch

A branch is an independent line of development that diverges from the main state of the repository. Branching is an essential feature available in most modern version control systems. A Git project can have more than one branch. The Git CLI provides commands to act on branches, such as list, rename, and delete.
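
As a minimal sketch of those branch commands, the script below creates a throwaway repository and then creates, renames, lists, and deletes a branch. The branch names (experiment, tuning) and the commit identity are placeholders chosen for illustration.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
git commit -q --allow-empty -m "Initial commit"

git branch experiment             # create a branch at the current commit
git branch -m experiment tuning   # rename it
git branch --list                 # lists "* main" and "tuning"
git branch -d tuning              # delete it (allowed: nothing on it is unmerged)
git branch --list                 # only "* main" remains
```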

Git Checkout

In Git, checkout means switching between different versions of a repository. In other words, the git checkout command switches the working directory to a branch or to another tree-ish state, such as a tag or a specific commit.

Git Cherry-Picking

Cherry-picking in Git applies specific commits from one branch onto another branch, by selecting individual changes from the history of the source branch.
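
A small sketch of cherry-picking, assuming a hypothetical feature branch with two commits of which only the first is wanted on main. File names and commit messages are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "base" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
echo "wip" > wip.txt
git add wip.txt
git commit -q -m "Add work in progress"

git checkout -q main
git cherry-pick feature~1   # apply only the "Add fix" commit onto main
git log --oneline           # main now has "Add notes" and "Add fix"
```

The unfinished commit stays on feature only; main receives a copy of the selected commit.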

Git Clone

Cloning makes a full copy of a remote repository, including its history, at a local path.

Git Fetch

Fetching updates the remote-tracking branches and tags from one or more other repositories, along with the objects necessary to complete their histories. It does not modify your local branches or your working directory.


Git HEAD

HEAD is a reference to the last commit in the currently checked-out branch. We can think of HEAD as the current branch. When you switch branches with git checkout, HEAD changes and points to the new branch.
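
To see HEAD follow a checkout, here is a short sketch in a throwaway repository. The branch name feature is a placeholder.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
git commit -q --allow-empty -m "Initial commit"
git branch feature

git symbolic-ref --short HEAD   # prints: main
git checkout -q feature         # switch branches
git symbolic-ref --short HEAD   # prints: feature
```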

Git Index

The Git index is a staging area between the working directory and the repository. It is used to build up a set of changes that the user wants to commit together.
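
The sketch below shows the index at work: two files exist in the working directory, but only the one that was staged ends up in the commit. The file names are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"

echo "import pandas" > train.py
echo "scratch" > scratch.txt

git add train.py                       # stage only train.py
git status --short                     # "A  train.py" and "?? scratch.txt"
git commit -q -m "Add training script"
git status --short                     # "?? scratch.txt" (still untracked)
```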

Git Master/Main

Master (or main) is the conventional name for the default Git branch.

Git Merge

Merging is the process of putting a forked history back together. The git merge command takes the independent lines of development created with git branch and integrates them into a single branch.
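
As an illustration, the following sketch forks history into two branches and merges them back. Branch and file names are assumptions made for the example.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Add README"

git checkout -q -b feature                # history forks here
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Add hotfix"

git merge -q --no-edit feature            # join the two histories with a merge commit
git log --oneline --graph                 # the graph shows the fork and the merge
```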

Git Origin

In Git, “origin” is a reference to the remote repository from which a project was initially cloned. More precisely, it is an alias used in place of the original repository's URL, which makes referencing it much easier.

Git Pull

The term pull means receiving data from a remote repository, for example on GitHub or GitLab. It fetches the changes from the remote server and merges them into the current local branch.

Git Pull Request

Pull requests are a way for a developer to notify other team members that they have completed a change, such as a bug fix or a feature, and that the code changes need a review before being merged into the master (or main) branch.

Git Push

Pushing uploads the content of a local repository to a remote repository. In other words, it transfers commits from the local repository to the remote one.

Git Rebase

In Git, the term rebase refers to the process of moving or combining a sequence of commits onto a new base commit. Rebasing changes the base of a local branch from one commit to another.
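
A minimal sketch of a rebase: a hypothetical feature branch is started from an older commit of main and then moved onto the latest one. All names are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Add README"

git checkout -q -b feature                # feature starts from "Add README"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
echo "update" > CHANGES.txt
git add CHANGES.txt
git commit -q -m "Update main"

git checkout -q feature
git rebase -q main                        # replay "Add feature" on top of "Update main"
git log --oneline                         # Add feature, Update main, Add README
```

After the rebase the history is linear, with the feature commit on top of the latest main commit.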

Git Remote

In Git, the term remote refers to a remote repository. It is a shared repository that all team members use to exchange their changes. A remote repository is stored on a code hosting service such as GitHub or GitLab, or on an internal server.
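
The sketch below ties together clone, origin, push, fetch, and pull. It assumes a bare repository on local disk as a stand-in for a hosted remote, and two hypothetical collaborators (alice and bob); none of these names come from a real project.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# A bare repository on local disk stands in for a hosted remote.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# First collaborator: clone, commit, push.
git clone -q remote.git alice 2>/dev/null   # hide the "empty repository" warning
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.email "alice@example.com"   # placeholder identity
git config user.name "Alice"
echo "v1" > model.txt
git add model.txt
git commit -q -m "Add model"
git push -q -u origin main                  # "origin" is the alias for remote.git
cd ..

# Second collaborator clones the shared repository.
git clone -q remote.git bob

# The first collaborator publishes another commit.
cd alice
echo "v2" > model.txt
git commit -q -am "Update model"
git push -q
cd ../bob

git remote -v                               # shows where "origin" points
git fetch -q origin                         # updates origin/main, not local main
git log --oneline main..origin/main         # the commit that is not in local main yet
git pull -q --ff-only                       # fetch, then fast-forward local main
cat model.txt                               # prints: v2
```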

Git Repository

In Git, a repository is like a database that the version control system uses to store metadata for a set of files and directories. It contains the files themselves as well as the history of changes made to them. You can think of a Git repository as your project folder: it holds all the project-related data, and distinct projects have distinct repositories.

Git Stash

When a user wants to change branches but does not want to commit incomplete work, they can stash their work. The git stash command lets you switch branches without committing the changes on the current branch.
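
Here is a short sketch of that workflow, with a hypothetical file model.py that has unfinished edits when we need to visit another branch.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "v1" > model.py
git add model.py
git commit -q -m "Add model"
git branch other

echo "v2 (unfinished)" > model.py         # uncommitted work in progress
git stash                                 # shelve it; working directory is clean again
git checkout -q other                     # free to switch branches
git checkout -q main
git stash pop                             # bring the unfinished work back
cat model.py                              # prints: v2 (unfinished)
```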

Git Tag

Tags mark a specific point in Git history. A tag is used to mark a commit as important so that we can refer to it later. Primarily, tags are used to mark a project's release points, such as v1.
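
A minimal sketch of tagging a commit and referring to it later. The tag name v1 follows the example above; commit messages are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
git commit -q --allow-empty -m "First release"

git tag -a v1 -m "Version 1"              # annotated tag on the current commit
git commit -q --allow-empty -m "More work"

git tag --list                            # prints: v1
git log -1 --format=%s v1                 # prints: First release
git describe --tags                       # prints something like v1-1-g<hash>
```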

Git Upstream And Downstream

The terms upstream and downstream describe the relationship between repositories. Generally, upstream is where users clone the repository from (the origin), and downstream is any project that integrates that work with other work. However, these terms are not restricted to Git repositories.

Git Revert

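In Git, the term revert refers to undoing the changes introduced by one or more commits. The git revert command does not remove those commits from the history; it records new commits that reverse their changes.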
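
As a sketch, the script below makes a bad commit and then reverts it. The file name and contents are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "good" > config.txt
git add config.txt
git commit -q -m "Add config"
echo "broken!" > config.txt
git commit -q -am "Break config"

git revert --no-edit HEAD                 # new commit that undoes "Break config"
cat config.txt                            # prints: good
git log --oneline                         # three commits: the bad one is still there
```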

Git Reset

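In Git, the term reset stands for undoing changes by moving the current branch to another commit. The git reset command has three commonly used modes:

  • Soft (--soft): moves the branch only; the index and the working directory are left untouched
  • Mixed (--mixed, the default): moves the branch and resets the index; the working directory is left untouched
  • Hard (--hard): moves the branch and resets both the index and the working directory, discarding uncommitted changes to tracked files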
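
The three modes can be compared with a small sketch. It undoes the same hypothetical commit three times and checks what is left behind each time.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "v1" > f.txt
git add f.txt
git commit -q -m "First"
echo "version 2" > f.txt
git commit -q -am "Second"

git reset -q --soft HEAD~1                # undo the commit, keep the change staged
git status --short                        # "M  f.txt"
git commit -q -m "Second"

git reset -q --mixed HEAD~1               # undo the commit, keep the change unstaged
git status --short                        # " M f.txt"
git commit -q -am "Second"

git reset -q --hard HEAD~1                # undo the commit and discard the change
git status --short                        # no output: nothing left to commit
cat f.txt                                 # prints: v1
```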

Git Ignore

In Git, the term ignore refers to specifying intentionally untracked files that Git should ignore, typically in a .gitignore file. It does not affect files that are already tracked by Git.
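
A quick sketch, assuming a hypothetical data/ directory of datasets that should stay out of version control:

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q

echo "data/" > .gitignore                 # ignore everything under data/
mkdir data
echo "1,2,3" > data/train.csv
echo "print('hi')" > train.py

git status --short                        # "?? .gitignore" and "?? train.py" only
git check-ignore -v data/train.csv        # shows the .gitignore rule that matched
```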

Git Diff

Git diff is a command-line utility that runs a diff function on Git data sources. These data sources can be files, branches, commits, and more. It is used to show changes between two commits, between a commit and the working tree, and so on.
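
The sketch below runs git diff against three pairs of data sources. The file params.txt and its contents are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "lr = 0.1" > params.txt
git add params.txt
git commit -q -m "Add params"

echo "lr = 0.01" > params.txt
git --no-pager diff                       # working directory vs. index
git add params.txt
git --no-pager diff --cached              # index vs. last commit
git commit -q -m "Lower learning rate"
git --no-pager diff HEAD~1 HEAD           # between two commits
```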

Git Flow

Git Flow is a branching model for Git, developed by Vincent Driessen. It is well suited to collaboration and to scaling a development team. The git-flow tool is a collection of Git command extensions that implements this model and accomplishes many repository operations with a single command.

Git Squash

Squashing combines several previous commits into a single commit. It is a technique for grouping related changes before sharing them with others.
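
There are several ways to squash. The sketch below uses git merge --squash, which is only one of them (interactive rebase is another), to fold two hypothetical work-in-progress commits into one commit on main.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
echo "part 1" > part1.txt
git add part1.txt
git commit -q -m "WIP 1"
echo "part 2" > part2.txt
git add part2.txt
git commit -q -m "WIP 2"

git checkout -q main
git merge -q --squash feature             # stage the combined changes, no commit yet
git commit -q -m "Add feature (squashed)"
git log --oneline                         # two commits: the squashed one and the initial one
```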

Git Rm

In Git, the term rm stands for remove. It is used to remove individual files or a collection of files. The key function of git rm is to remove tracked files from the Git index. By default it removes them from both the working directory and the index; with the --cached option it removes them from the index only.
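
A minimal sketch of both behaviours, using two hypothetical tracked files:

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the initial branch "main"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"
echo "old" > old.txt
echo "1,2,3" > data.csv
git add old.txt data.csv
git commit -q -m "Add files"

git rm -q old.txt                         # remove from the index and the working directory
git rm -q --cached data.csv               # remove from the index only; the file stays on disk
git status --short                        # both deletions are staged; data.csv is now untracked
ls                                        # prints: data.csv
```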

Git Fork

A fork is a personal copy of a repository. Forking a repository allows you to freely test and debug changes without affecting the original project.

Git Cheatsheet

A Git cheat sheet is a quick-reference summary of Git. It lists the basic Git commands along with brief installation instructions.
