

Common Docker terms for data scientists and machine learning engineers

To use Docker effectively, data scientists should be familiar with a few core terms.

Dockerfile

A Dockerfile is a text file with instructions that tell Docker how to build your image. A Dockerfile usually starts from a base image, which provides the initial image layers; popular choices include the official python image and the images published by projects such as TensorFlow and Jupyter. Each following instruction in the Dockerfile can stack a new layer on top of the base image layers. For example, a Dockerfile for a machine learning application could start from the python base image and install NumPy, pandas, and scikit-learn as a new layer.
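To make the layering concrete, here is a minimal Python sketch that renders the Dockerfile text for the machine learning example above. The helper name render_dockerfile and the python:3.11-slim base tag are illustrative assumptions, not something Docker prescribes. In the rendered file, FROM selects the base image and the single RUN instruction adds one new layer containing the three packages.

```python
def render_dockerfile(base_image: str, packages: list) -> str:
    """Return Dockerfile text that installs `packages` on top of `base_image`.

    Hypothetical helper for illustration only.
    """
    lines = [
        f"FROM {base_image}",  # base image: provides the initial layers
        # One RUN instruction creates one new layer holding all the packages.
        "RUN pip install --no-cache-dir " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile("python:3.11-slim", ["numpy", "pandas", "scikit-learn"])
print(dockerfile)
```

Saving that text as a file named Dockerfile and running docker build -t my-ml-image . in the same directory would build the image; my-ml-image is a placeholder name.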

Docker Image

A Docker image is a read-only template that packages everything needed to run a program (code, runtime, libraries, and configuration), stored as a stack of filesystem layers. If you're familiar with object-oriented programming, you can think of an image the way you think of a class: classes are blueprints for creating instances, while images are blueprints for creating containers.
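The class/instance analogy can be sketched in plain Python. This is only an analogy and involves no Docker at all: the hypothetical MLImage class plays the role of an image, and each object created from it plays the role of a container with its own independent state.

```python
class MLImage:
    """Stands in for an image: a blueprint defined once."""

    packages = ("numpy", "pandas", "scikit-learn")  # shared by every instance

    def __init__(self, name: str):
        self.name = name   # per-instance state, like a container's name
        self.scratch = []  # per-instance writable state


# Two "containers" created from the same "image".
c1 = MLImage("train-run-1")
c2 = MLImage("train-run-2")
c1.scratch.append("model.pkl")

print(c1.packages == c2.packages)  # True: both come from the same blueprint
print(c2.scratch)                  # []: state written in c1 does not leak into c2
```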

Docker Image Name and Tag

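An image name is made up of slash-separated name components, optionally prefixed by a registry hostname (and port). A tag is a label attached to a specific image within a repository, for example v1, latest, stable, or rc12. If you omit the tag, Docker assumes latest.

A fully qualified image reference therefore has the form registry/repository:tag, for example docker.io/library/python:3.11.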
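The naming rules above can be illustrated with a simplified parser. This is a sketch that assumes references of the form [registry/]repository[:tag]; it ignores digests (@sha256:...) and does not validate characters. It treats the first component as a registry host if it contains a dot or a colon or equals localhost, defaults the tag to latest, and, for Docker Hub, places single-component names under the library namespace used for official images.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str, default_registry: str = "docker.io") -> ImageRef:
    """Split an image reference into registry, repository and tag (simplified)."""
    # The tag follows the last ':' only if that ':' comes after the last '/';
    # otherwise the ':' belongs to a registry port such as localhost:5000.
    name, tag = ref, "latest"
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):
        name, tag = ref[:colon], ref[colon + 1:]

    first, _, rest = name.partition("/")
    # Treat the first component as a registry host if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = default_registry, name
        if registry == "docker.io" and "/" not in repository:
            # Docker Hub keeps official images under the "library" namespace.
            repository = "library/" + repository
    return ImageRef(registry, repository, tag)


print(parse_image_ref("python:3.11"))
# ImageRef(registry='docker.io', repository='library/python', tag='3.11')
print(parse_image_ref("localhost:5000/team/model"))
# ImageRef(registry='localhost:5000', repository='team/model', tag='latest')
```

The check that the colon comes after the last slash is what keeps a registry port from being mistaken for a tag.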

Docker Image ID

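A unique identifier for each image: a sha256 digest derived from the image's content. Commands such as docker images usually display it shortened to its first 12 hexadecimal characters.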
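The idea of a content-derived ID can be sketched with Python's hashlib module. As a simplification, this hashes a small hypothetical configuration dictionary serialized as JSON; Docker computes real IDs from the image's actual metadata, so the values here will not match any real image. What the sketch does show is that identical content always yields the same ID, while any change yields a different one.

```python
import hashlib
import json


def image_id(config: dict) -> str:
    """Return a sha256 content digest for a toy image configuration."""
    # Serialize deterministically so equal content always hashes identically.
    payload = json.dumps(config, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(payload).hexdigest()


config = {"base": "python:3.11-slim", "packages": ["numpy", "pandas"]}
full_id = image_id(config)
short_id = full_id.split(":", 1)[1][:12]  # 12-character short form, as in docker images
print(full_id)
print(short_id)
```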

Docker Container

A container is an isolated environment where your app runs. It is isolated from other processes on the machine, and its access to host resources such as storage, CPU, and memory can be restricted. Inside the container, the application sees only the filesystem provided by its image, typically a minimal Linux userland plus the application's dependencies, while sharing the host's kernel. Containers leave no data behind by default: any changes made inside a container are lost as soon as it is removed, unless you save them as a new image (for example with docker commit) or write them to a volume.
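The "no data left behind" behavior can be modelled with a toy copy-on-write filesystem in Python. All names here (ToyImage, ToyContainer, commit) are hypothetical and this is not how Docker is implemented internally: the image's files are read-only, each container writes to its own private layer on top, and that layer disappears with the container unless it is first committed as a new image, the role docker commit plays in real Docker.

```python
from types import MappingProxyType
from typing import Optional


class ToyImage:
    """Read-only files shared by every container started from this image."""

    def __init__(self, files: dict):
        self.files = MappingProxyType(dict(files))  # read-only view of a private copy


class ToyContainer:
    """A toy container: the image's files plus a private writable layer."""

    def __init__(self, image: ToyImage):
        self.image = image
        self.layer = {}

    def write(self, path: str, data: str) -> None:
        self.layer[path] = data  # writes go to the layer, never to the image

    def read(self, path: str) -> Optional[str]:
        # Files in the writable layer shadow files from the image.
        return self.layer.get(path, self.image.files.get(path))

    def commit(self) -> ToyImage:
        # Save the image files plus the writable layer as a new image.
        return ToyImage({**self.image.files, **self.layer})


base = ToyImage({"/app/train.py": "print('training')"})
container = ToyContainer(base)
container.write("/app/model.pkl", "weights")
snapshot = container.commit()  # keep the changes as a new image
del container                  # "remove" the container: its layer is discarded

print("/app/model.pkl" in base.files)      # False: the original image is unchanged
print("/app/model.pkl" in snapshot.files)  # True: the committed image kept the change
```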

Docker Volume

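Rather than building a Docker image that contains all the data an experiment needs, you can keep the data outside the image and mount it into the container at runtime using a volume. Data in a volume lives outside the container's writable layer, so it persists after the container is removed and can be shared between containers.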
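A common pattern is to have the training code read its data from a fixed path inside the container and to mount the host's data directory at that path when the container starts, for example docker run -v /path/on/host/data:/data my-ml-image, where -v host_path:container_path creates a bind mount and my-ml-image and /data are placeholders. The sketch below assumes a hypothetical DATA_DIR environment variable that defaults to /data, so the same code runs with or without Docker; a temporary directory stands in for the mount.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def list_dataset_files(data_dir: Optional[str] = None) -> list:
    """List CSV files under the data directory (a mounted volume in Docker)."""
    # Assumption: the mount point is /data unless DATA_DIR or data_dir overrides it.
    root = Path(data_dir or os.environ.get("DATA_DIR", "/data"))
    return sorted(p.name for p in root.glob("*.csv"))


# Simulate a mounted data directory locally.
with tempfile.TemporaryDirectory() as mount:
    Path(mount, "train.csv").write_text("x,y\n1,2\n")
    Path(mount, "README.txt").write_text("notes")
    files = list_dataset_files(mount)

print(files)  # ['train.csv']
```

Because the image only knows the mount path, swapping in a different dataset means changing the docker run command, not rebuilding the image.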

Docker Registry

A registry is where users store and share Docker images. You push images to a registry from your machine and pull them onto any other machine, such as a colleague's. Once an image is in a registry, the same application can run on a local machine, on a cloud provider, or in a Kubernetes cluster. Docker Hub is the default public registry, and private registries can be hosted as well.
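The push/pull workflow can be sketched as the sequence of Docker CLI commands involved. The hypothetical helper below only builds the argument lists (a dry run) so they can be inspected without a Docker daemon; registry.example.com/team/my-ml-image and my-ml-image:v1 are placeholder names.

```python
def registry_commands(local_image: str, registry_repo: str, tag: str) -> dict:
    """Return the docker CLI commands used to share an image via a registry."""
    remote = f"{registry_repo}:{tag}"
    return {
        # On your machine: give the local image a registry-qualified name, then upload it.
        "publish": [
            ["docker", "tag", local_image, remote],
            ["docker", "push", remote],
        ],
        # On a colleague's machine, a cloud VM, etc.: download the image and run it.
        "consume": [
            ["docker", "pull", remote],
            ["docker", "run", "--rm", remote],
        ],
    }


cmds = registry_commands("my-ml-image:v1", "registry.example.com/team/my-ml-image", "v1")
for step, commands in cmds.items():
    for cmd in commands:
        print(step, " ".join(cmd))
```

On a machine with Docker installed, passing each list to subprocess.run(cmd, check=True) would execute it; private registries typically also require docker login first.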

Subscribe to Polyaxon Blog
