Configuring Cloudera Data Science Workbench Deployments Behind a Proxy

If your deployment is behind an HTTP or HTTPS proxy, set the corresponding HTTP_PROXY or HTTPS_PROXY property in /etc/cdsw/config/cdsw.conf to the address (host and port) of the proxy you are using.
HTTP_PROXY="<http://proxy_host>:<proxy_port>"
HTTPS_PROXY="<http://proxy_host>:<proxy_port>"
If you are using an intermediate proxy such as Cntlm to handle NTLM authentication, add the Cntlm proxy address to the HTTP_PROXY or HTTPS_PROXY fields in cdsw.conf.
HTTP_PROXY="http://localhost:3128"
HTTPS_PROXY="http://localhost:3128"
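
After editing the file, it can help to confirm which proxy properties are actually set. The following is a minimal sketch, not part of the product: show_proxy_settings is a hypothetical helper name, and it assumes the properties appear one per line at the start of a line, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not a cdsw command): print the proxy-related
# properties found in a cdsw.conf file.
show_proxy_settings() {
  # Default to the path named in this document; pass another path to override.
  conf="${1:-/etc/cdsw/config/cdsw.conf}"
  grep -E '^(HTTP_PROXY|HTTPS_PROXY|NO_PROXY)=' "$conf"
}

# Only query the default location if the file exists on this host.
if [ -f /etc/cdsw/config/cdsw.conf ]; then
  show_proxy_settings
fi
```

Running the helper on each host makes it easier to spot a node whose proxy properties differ from the others.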

If the proxy server uses TLS encryption to handle connection requests, you must add the proxy's root CA certificate to your host's store of trusted certificates. Proxy servers typically sign their server certificate with their own root certificate, so connection attempts fail until the Cloudera Data Science Workbench host trusts the proxy's root CA certificate. If you do not have access to your proxy's root certificate, contact your network or IT administrator.

To enable trust, perform the following steps on the master node and on each worker node.
  1. Copy the proxy's root certificate to the trusted CA certificate store (ca-trust) on the Cloudera Data Science Workbench host.
    cp /tmp/<proxy-root-certificate>.crt /etc/pki/ca-trust/source/anchors/
  2. Use the following command to rebuild the trusted certificate store.
    update-ca-trust extract
  3. If you will be using custom engine images that will be pulled from a Docker repository, add the proxy's root certificates to a directory under /etc/docker/certs.d. For example, if your Docker repository is at docker.repository.mycompany.com, create the following directory structure:
    /etc/docker/certs.d
    |-- docker.repository.mycompany.com          # Directory named after Docker repository 
        |-- <proxy-root-certificate>.crt         # Docker-related root CA certificates 

    This step is not required with the standard engine images because they are included in the Cloudera Data Science Workbench RPM.

  4. Re-initialize Cloudera Data Science Workbench for this change to take effect.
    cdsw init
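
The steps above can be collected into one script per host. This is a sketch under stated assumptions rather than a supported tool: the certificate path /tmp/proxy-root-ca.crt, the helper names run and trust_proxy_ca, and the DRY_RUN switch are all hypothetical. By default it only prints the commands it would run; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run, and trust_proxy_ca are illustrative names,
# not part of Cloudera Data Science Workbench.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

trust_proxy_ca() {
  cert="$1"         # path to the proxy's root CA certificate
  docker_repo="$2"  # optional: hostname of a private Docker repository

  # Steps 1 and 2: add the certificate and rebuild the trust store.
  run cp "$cert" /etc/pki/ca-trust/source/anchors/
  run update-ca-trust extract

  # Step 3: only needed for custom engine images pulled from a repository.
  if [ -n "$docker_repo" ]; then
    run mkdir -p "/etc/docker/certs.d/$docker_repo"
    run cp "$cert" "/etc/docker/certs.d/$docker_repo/"
  fi

  # Step 4: re-initialize so the change takes effect.
  run cdsw init
}

# Hypothetical certificate path; the repository name is the example from step 3.
trust_proxy_ca /tmp/proxy-root-ca.crt docker.repository.mycompany.com
```

Omit the second argument when you use only the standard engine images, since step 3 does not apply in that case.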

Configure hostnames that bypass the proxy

Use the NO_PROXY field in cdsw.conf to specify a comma-separated list of hostnames that should bypass the proxy. These typically include 127.0.0.1, localhost, the value of MASTER_IP, and any private Docker registries and HTTP services inside the firewall that Cloudera Data Science Workbench users might want to access from the engines. This change must be made on the master node and on all worker nodes.

At a minimum, Cloudera recommends the following NO_PROXY configuration.
NO_PROXY="127.0.0.1,localhost,<MASTER_IP>,100.66.0.1,100.66.0.2,
100.66.0.3,100.66.0.4,100.66.0.5,100.66.0.6,100.66.0.7,100.66.0.8,
100.66.0.9,100.66.0.10,100.66.0.11,100.66.0.12,100.66.0.13,100.66.0.14,
100.66.0.15,100.66.0.16,100.66.0.17,100.66.0.18,100.66.0.19,100.66.0.20,
100.66.0.21,100.66.0.22,100.66.0.23,100.66.0.24,100.66.0.25,100.66.0.26,
100.66.0.27,100.66.0.28,100.66.0.29,100.66.0.30,100.66.0.31,100.66.0.32,
100.66.0.33,100.66.0.34,100.66.0.35,100.66.0.36,100.66.0.37,100.66.0.38,
100.66.0.39,100.66.0.40,100.66.0.41,100.66.0.42,100.66.0.43,100.66.0.44,
100.66.0.45,100.66.0.46,100.66.0.47,100.66.0.48,100.66.0.49,100.66.0.50"
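
Typing fifty addresses by hand is error-prone, so the value can be generated instead. The sketch below assumes the value is meant to be entered as a single line (the listing above appears to be wrapped for display); build_no_proxy is a hypothetical helper name, and <MASTER_IP> remains a placeholder for your own master address.

```shell
#!/bin/sh
# Hypothetical helper: print the minimum recommended NO_PROXY setting
# on one line, covering 100.66.0.1 through 100.66.0.50.
build_no_proxy() {
  master_ip="$1"
  list="127.0.0.1,localhost,${master_ip}"
  i=1
  while [ "$i" -le 50 ]; do
    list="${list},100.66.0.${i}"
    i=$((i + 1))
  done
  printf 'NO_PROXY="%s"\n' "$list"
}

# Replace the placeholder with the value of MASTER_IP from cdsw.conf.
build_no_proxy "<MASTER_IP>"
```

Append any private Docker registries or internal HTTP services to the generated list before copying it into cdsw.conf on every node.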