Customizing Engine Images
Cloudera Data Science Workbench site administrators and project administrators can add libraries and other dependencies to the Docker image in which their engines run. Currently, Cloudera Data Science Workbench only supports public Docker images in registries accessible to the Cloudera Data Science Workbench nodes.
Site administrators can whitelist images for use in projects, and project administrators can select which of these whitelisted images their projects use.
The following Dockerfile shows how to add MeCab, a Japanese text tokenizer, to the base Cloudera Data Science Workbench engine.
# Dockerfile
FROM docker.repository.cloudera.com/cdsw/engine:1
RUN apt-get update && \
    apt-get install -y -q mecab \
                          libmecab-dev \
                          mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN cd /tmp && \
    git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git && \
    /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -y -n -p /var/lib/mecab/dic/neologd && \
    rm -rf /tmp/mecab-ipadic-neologd
RUN pip install --upgrade pip
RUN pip install mecab-python==0.996
- Build your image with the Dockerfile.
docker build -t <company-registry>/user/cdsw-mecab:latest . -f Dockerfile
- Push the image to your company's Docker registry.
docker push <company-registry>/user/cdsw-mecab:latest
- Whitelist the image, <company-registry>/user/cdsw-mecab:latest. Only a site administrator can do this.
  - Log in as a site administrator.
  - Click Admin.
  - Go to the Engines tab.
  - Add <company-registry>/user/cdsw-mecab:latest to the list of whitelisted engine images.
- Make the whitelisted image available to your project. Only a project administrator can do this.
  - Go to the project Settings page.
  - Click Engines.
  - Select <company-registry>/user/cdsw-mecab:latest from the dropdown list of available Docker images.
Sessions and jobs you run in your project will now have access to this custom image.
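To confirm that a session is actually running on the custom image, you can try loading MeCab from a Python session in the project. The following is a minimal sketch, not an official procedure. It assumes the bindings installed by the Dockerfile are importable as MeCab from the session's Python interpreter, and that the NEologd dictionary is at the path passed to -p in the Dockerfile. The helper names load_mecab and tokenize are hypothetical.

```python
# Dictionary directory, taken from the -p argument in the Dockerfile.
NEOLOGD_DIR = "/var/lib/mecab/dic/neologd"


def load_mecab():
    """Return the MeCab module, or None if the bindings are not installed."""
    try:
        import MeCab
    except ImportError:
        return None
    return MeCab


def tokenize(text, dicdir=None):
    """Return the surface forms MeCab finds in text.

    dicdir is an optional dictionary directory; None uses MeCab's default.
    """
    mecab = load_mecab()
    if mecab is None:
        raise RuntimeError(
            "MeCab bindings not found; is the custom engine image selected?"
        )
    # "-d <dir>" tells MeCab which dictionary directory to use.
    tagger = mecab.Tagger(("-d " + dicdir) if dicdir else "")
    surfaces = []
    for line in tagger.parse(text).splitlines():
        # Each token line is "surface<TAB>features"; output ends with "EOS".
        if not line or line == "EOS":
            continue
        surfaces.append(line.split("\t")[0])
    return surfaces


if __name__ == "__main__":
    if load_mecab() is None:
        print("MeCab is not available in this engine.")
    else:
        print(tokenize("すもももももももものうち", NEOLOGD_DIR))
```

If the script reports that MeCab is not available, check that the project's engine image is set to the custom image and start a new session.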