Project Environment Variables

Sometimes your code needs to use secrets, such as passwords and authentication tokens, in order to access external resources.

In general, Cloudera recommends that you not paste secrets into your code. Anyone with read access to your project would be able to view the secrets. Even if you did not give anyone read access, you would have to remember to carefully check any code that you copy and paste into another project, or add to a Git repository.

A better place to store secrets is in your project's environment variables. To manage them, go to the project's Overview page and, in the left sidebar, click Settings > Engine.

These environment variables are set in every engine that runs in your project. The code samples that follow show how to access the environment variable DATABASE_PASSWORD from your code.

R

database.password <- Sys.getenv("DATABASE_PASSWORD")

Python

import os
database_password = os.environ["DATABASE_PASSWORD"]

Scala

val databasePassword = System.getenv("DATABASE_PASSWORD")
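
The Python sample above raises a KeyError if DATABASE_PASSWORD has not been defined for the project. The following is a minimal sketch of one way to fail with a clearer message instead. The helper name get_secret and its env parameter are hypothetical and are not part of Cloudera Data Science Workbench.

```python
import os


def get_secret(name, env=None):
    """Return the value of environment variable `name`, or raise a clear error.

    `env` is a hypothetical hook for passing a plain dict in tests;
    by default the real process environment is used.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # Mention only the variable's name, never its value, in the message.
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "define it under Settings > Engine for this project."
        )
    return value


if __name__ == "__main__":
    try:
        database_password = get_secret("DATABASE_PASSWORD")
    except RuntimeError as err:
        print(err)
    else:
        # Avoid printing the secret itself.
        print("DATABASE_PASSWORD is set")
```

Keeping the lookup in one helper means the secret's value never has to appear in the code, and a missing variable is reported by name.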

Engine Environment Variables

The following table lists environment variables that can be set in every engine.

Environment Variable Description
CDSW_PROJECT

The project to which this engine belongs.

CDSW_CREATOR

The username of the creator of this engine.

CDSW_ENGINE_ID

The ID of this engine. For sessions, this appears in your browser's URL bar.

CDSW_MASTER_ID

If this engine is a worker, this is the CDSW_ENGINE_ID of its master.

CDSW_MASTER_IP

If this engine is a worker, this is the IP address of its master.

CDSW_PUBLIC_PORT

A port on which you can expose HTTP services in the engine to browsers. HTTP services that bind to CDSW_PUBLIC_PORT will be available in browsers at: http(s)://<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>. By default, CDSW_PUBLIC_PORT is set to 8080.

A direct link to these web services will be available from the grid icon in the upper right corner of the Cloudera Data Science Workbench web application, as long as the job or session is still running. For more details, see Accessing Web User Interfaces from Cloudera Data Science Workbench.

CDSW_DOMAIN

The domain on which Cloudera Data Science Workbench is being served. This can be useful for iframing services, as demonstrated in the Shiny example.

CDSW_CPU_MILLICORES

The number of CPU cores allocated to this engine, expressed in thousandths of a core.

CDSW_MEMORY_MB

The number of megabytes of memory allocated to this engine.

CDSW_IP_ADDRESS

Other engines in the Cloudera Data Science Workbench cluster can contact this engine on this IP address.

IDLE_MAXIMUM_MINUTES

Maximum number of minutes a session can remain idle before it exits.

Default: 60 minutes

Maximum Value: 35,000 minutes

SESSION_MAXIMUM_MINUTES

Maximum number of minutes a session can run before it times out.

Default: 10,080 minutes (60*24*7, or 7 days)

Maximum Value: 35,000 minutes

JOB_MAXIMUM_MINUTES

Maximum number of minutes a job can run before it times out.

Default: 10,080 minutes (60*24*7, or 7 days)

Maximum Value: 35,000 minutes

CONDA_DEFAULT_ENV

Points to the default Conda environment so you can use Conda to install/manage packages in the Workbench. For more details on when to use this variable, see Using Conda with Cloudera Data Science Workbench.
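
Several of the variables in the table above can be combined from code running inside an engine. The following Python sketch converts CDSW_CPU_MILLICORES to whole cores, reads CDSW_MEMORY_MB and CDSW_PUBLIC_PORT, and assembles the browser URL described for CDSW_PUBLIC_PORT. The function names and the env parameter are hypothetical, and the sketch assumes the numeric variables hold plain integer strings.

```python
import os


def engine_resources(env=None):
    """Return (cpu_cores, memory_mb) parsed from the engine's environment."""
    env = os.environ if env is None else env
    # CDSW_CPU_MILLICORES is in thousandths of a core, so 1000 means one core.
    # Assumption: both values are plain integer strings.
    cpu_cores = int(env["CDSW_CPU_MILLICORES"]) / 1000
    memory_mb = int(env["CDSW_MEMORY_MB"])
    return cpu_cores, memory_mb


def public_port(env=None):
    """Return the port an HTTP service should bind to; falls back to 8080."""
    env = os.environ if env is None else env
    return int(env.get("CDSW_PUBLIC_PORT", "8080"))


def public_url(env=None, scheme="http"):
    """Build the URL at which a service on CDSW_PUBLIC_PORT appears in browsers."""
    env = os.environ if env is None else env
    # Follows the pattern http(s)://<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>.
    return f"{scheme}://{env['CDSW_ENGINE_ID']}.{env['CDSW_DOMAIN']}"
```

With made-up sample values (not real engine output), the helpers behave as follows:

```python
sample = {
    "CDSW_CPU_MILLICORES": "2000",
    "CDSW_MEMORY_MB": "4096",
    "CDSW_ENGINE_ID": "abc123",
    "CDSW_DOMAIN": "cdsw.example.com",
}
engine_resources(sample)            # (2.0, 4096)
public_port(sample)                 # 8080
public_url(sample, scheme="https")  # "https://abc123.cdsw.example.com"
```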