Enabling TLS/SSL for Cloudera Data Science Workbench

Cloudera Data Science Workbench uses HTTP and WebSockets (WS) to support interactive connections to the Cloudera Data Science Workbench web application. However, these connections are not secure by default. This topic describes how you can use TLS/SSL to enforce secure encrypted connections, using HTTPS and WSS (WebSockets over TLS), to the Cloudera Data Science Workbench web application.

Specifically, Cloudera Data Science Workbench can be configured to use a TLS termination proxy to handle incoming connection requests. The termination proxy server decrypts incoming connection requests and forwards them to the Cloudera Data Science Workbench web application. A TLS termination proxy can be internal or external.

Internal Termination

An internal termination proxy is run by Cloudera Data Science Workbench's built-in load balancer, called the ingress controller, on the master node. The ingress controller is primarily responsible for routing traffic and load balancing across Cloudera Data Science Workbench's web service backend. Once configured, as shown in the instructions that follow, it also terminates HTTPS traffic. The primary advantage of the internal termination approach is simplicity.

External Termination

External TLS termination can be provided through a number of different approaches. Common examples include:
  • Load balancers, such as the AWS Elastic Load Balancer
  • Modern firewalls
  • Reverse web proxies, such as nginx
  • VPN appliances supporting TLS/SSL VPN

Organizations that require external termination will often have standardized on a single approach for TLS. The primary advantage of external termination is that it allows such organizations to integrate with Cloudera Data Science Workbench without violating their IT department's policies for TLS. For example, with an external termination proxy, Cloudera Data Science Workbench does not need access to the TLS private key.

Load balancers and proxies often require a URL they can ping to validate the status of the web service backend. For instance, you can configure a load balancer to send an HTTP GET request to /internal/load-balancer/health-ping. If the response is 200 (OK), that means the backend is healthy. Note that, as with all communication to the web backend from the load balancer when TLS is terminated externally, this request should be sent over HTTP and not HTTPS.
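
As a minimal sketch of such a health check, the shell functions below send the GET request with curl and treat only a 200 response as healthy. The CDSW_BACKEND variable, its default value, and the function names are assumptions made for illustration; they are not part of Cloudera Data Science Workbench.

```shell
# Hypothetical backend address as seen from the load balancer; adjust as needed.
# Plain HTTP is used because TLS is terminated externally.
CDSW_BACKEND="${CDSW_BACKEND:-http://cdsw.company.com}"

# Succeeds only when the given HTTP status code is 200 (OK).
is_healthy_status() {
    [ "$1" = "200" ]
}

# Sends an HTTP GET to the health-ping path and evaluates the status code.
# curl prints 000 when no response is received, which counts as unhealthy.
check_cdsw_health() {
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
        "${CDSW_BACKEND}/internal/load-balancer/health-ping")
    is_healthy_status "$status"
}
```

For example, `check_cdsw_health && echo healthy || echo unhealthy` reports the state of the backend from the machine that runs the load balancer.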

Note that any terminating load balancer must provide the following header fields so that Cloudera Data Science Workbench can detect the IP address and protocol used by the client:

  • X-Forwarded-For (client's IP address)
  • X-Forwarded-Proto (client's requested protocol, i.e. HTTPS)
  • X-Forwarded-Host (the "Host" header of the client's original request)

See Configuring HTTP Headers for Cloudera Data Science Workbench for more details on how to customize HTTP headers required by Cloudera Data Science Workbench.
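
To exercise the backend the way a terminating proxy would, you could send a plain-HTTP request that carries these three header fields. The sketch below is illustrative only: the function name, its arguments, and the CURL override used for dry runs are assumptions, not part of the product.

```shell
# Sends a plain-HTTP request carrying the header fields a terminating proxy must set.
#   $1: backend URL (hypothetical, for example http://cdsw.company.com/)
#   $2: client IP address to report
#   $3: value of the client's original Host header
# Set CURL=echo for a dry run that prints the arguments instead of sending anything.
simulate_proxy_request() {
    "${CURL:-curl}" -s -o /dev/null -w '%{http_code}\n' \
        -H "X-Forwarded-For: $2" \
        -H "X-Forwarded-Proto: https" \
        -H "X-Forwarded-Host: $3" \
        "$1"
}
```

A call such as `simulate_proxy_request http://cdsw.company.com/ 203.0.113.10 cdsw.company.com` prints the HTTP status code returned by the backend.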

Private Key and Certificate Requirements

The TLS certificate issued by your CA must list both the Cloudera Data Science Workbench domain and a wildcard for all first-level subdomains. For example, if the Cloudera Data Science Workbench domain is cdsw.company.com, then the TLS certificate must include both cdsw.company.com and *.cdsw.company.com.

Creating a Certificate Signing Request (CSR) and Key/Certificate Pair

Use the following steps to create a Certificate Signing Request (CSR) to submit to your CA. Then, create a private key/certificate pair that can be used to authenticate incoming communication requests to Cloudera Data Science Workbench.
  1. Create a cdsw.cnf file and populate it with the required configuration parameters including the SAN field values.
    vi cdsw.cnf
  2. Copy and paste the default openssl.cnf from: http://web.mit.edu/crypto/openssl.cnf.
  3. Modify the following sections and save the cdsw.cnf file:
    [ CA_default ]
    default_md = sha256
    
    [ req ]
    default_bits       = 2048
    distinguished_name = req_distinguished_name
    req_extensions     = req_ext
    
    [ req_distinguished_name ]
    countryName                 = Country Name (2 letter code)
    stateOrProvinceName         = State or Province Name (full name)
    localityName                = Locality Name (eg, city)
    organizationName            = Organization Name (eg, company)
    commonName                  = Common Name (e.g. server FQDN or YOUR name)
    
    [ req_ext ]
    subjectAltName = @alt_names
    
    [alt_names]
    DNS.1   = *.cdsw.company.com
    DNS.2   = cdsw.company.com
    Key points to note:
    • The domains set in the DNS.1 and DNS.2 entries above must match the DOMAIN set in cdsw.conf.
    • The default_md parameter must be set to sha256 at a minimum. Older hash functions such as SHA-1 are deprecated and are rejected by modern browsers.
    • The commonName (CN) parameter will be ignored by browsers. You must use Subject Alternative Names.
  4. Run the following command to generate the CSR.
    openssl req -out cert.csr -newkey rsa:2048 -nodes -keyout private.key -config cdsw.cnf
    This command generates the private key and the CSR in one step. The -nodes switch disables encryption of the private key (which is not supported by Cloudera Data Science Workbench at this time).
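  5. Run the following command to sign the CSR generated in the previous step with the CA certificate (ca.crt) and the CA private key (ca.key), which issues the certificate. If you do not have access to the CA's key, submit cert.csr to your CA instead.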
    openssl x509 -req -days 365 -in cert.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out <your_tls_cert>.crt -sha256 -extfile cdsw.cnf -extensions req_ext
  6. Run the following command to verify that the certificate issued by the CA lists both the required domains, cdsw.company.com and *.cdsw.company.com, under X509v3 Subject Alternative Name.
    openssl x509 -in <your_tls_cert>.crt -noout -text
    You should also verify that a valid hash function is being used to create the certificate. For SHA-256, the value under Signature Algorithm will be sha256WithRSAEncryption.
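
The manual check in step 6 can be scripted. The following is a minimal sketch, assuming the Subject Alternative Name entries appear on the line directly after the X509v3 Subject Alternative Name heading in the openssl text output; the function names are hypothetical.

```shell
# Succeeds if a SAN line such as "DNS:*.cdsw.company.com, DNS:cdsw.company.com"
# lists both the domain ($2) and its first-level wildcard.
san_list_covers_domain() {
    # Split the comma-separated entries onto separate lines and trim whitespace.
    entries=$(printf '%s\n' "$1" | tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    # -x matches whole lines only; -F treats "*" and "." literally.
    printf '%s\n' "$entries" | grep -qxF "DNS:$2" &&
        printf '%s\n' "$entries" | grep -qxF "DNS:*.$2"
}

# Extracts the SAN line from a certificate file ($1) and checks it against a domain ($2).
cert_covers_domain() {
    san=$(openssl x509 -in "$1" -noout -text |
        grep -A1 'X509v3 Subject Alternative Name' | tail -n 1)
    san_list_covers_domain "$san" "$2"
}
```

For example, `cert_covers_domain <your_tls_cert>.crt cdsw.company.com` succeeds only if both names are present.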

Configuring Internal Termination

Depending on your deployment (CSD or RPM), use one of the following sets of instructions to configure internal termination.

CSD Deployments

To enable internal termination, configure the following properties in the CDSW service in Cloudera Manager.

  1. Log in to the Cloudera Manager Admin Console.
  2. Navigate to the CDSW service and click Configuration.
  3. Search for the following properties and configure as required.
    • Enable TLS - When enabled, this property enforces HTTPS and WSS connections. The server will now redirect any HTTP request to HTTPS and generate URLs with the appropriate protocol.
    • TLS Key for Internal Termination - Set to the path of the TLS private key.
    • TLS Certificate for Internal Termination - Set to the path of the TLS certificate.

      Certificates and keys must be in PEM format.

  4. Click Save Changes.
  5. Restart the CDSW service.
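
Before pointing the key and certificate properties at your files, it can help to confirm that both files are PEM-encoded and belong together. The sketch below is one way to do that with grep and openssl; the function names are hypothetical, and comparing public keys is a common technique rather than a procedure documented for Cloudera Data Science Workbench.

```shell
# Succeeds if the file contains a PEM header line such as "-----BEGIN CERTIFICATE-----".
is_pem() {
    grep -q -e '^-----BEGIN ' "$1"
}

# Succeeds if the public key embedded in the certificate ($1) equals the
# public key derived from the private key ($2).
key_matches_cert() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}
```

For example, `is_pem private.key && key_matches_cert <your_tls_cert>.crt private.key` succeeds only for a matching PEM pair.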

RPM Deployments

To enable internal termination, configure the following properties in cdsw.conf (on all Cloudera Data Science Workbench nodes).

  • TLS_ENABLE - When set to true, this property enforces HTTPS and WSS connections. The server will now redirect any HTTP request to HTTPS and generate URLs with the appropriate protocol.
  • TLS_KEY - Set to the path of the TLS private key.
  • TLS_CERT - Set to the path of the TLS certificate.

    Certificates and keys must be in PEM format.

You can configure these properties either as part of the installation process or after the fact. If you make any changes to cdsw.conf after installation is complete, make sure to restart the master and worker nodes as needed.
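
As an illustration, the relevant lines of cdsw.conf for internal termination might look like the excerpt below. The file paths are hypothetical placeholders, and the quoted KEY="value" style is an assumption about the file format.

```shell
# Excerpt of cdsw.conf for internal termination (paths are hypothetical examples).
DOMAIN="cdsw.company.com"
TLS_ENABLE="true"
TLS_KEY="/path/to/private.key"
TLS_CERT="/path/to/cdsw.company.com.crt"
```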

Configuring External Termination

Depending on your deployment (CSD or RPM), use one of the following sets of instructions to configure external termination.

CSD Deployments

To enable external termination, configure the following property in the CDSW service in Cloudera Manager.

  1. Log in to the Cloudera Manager Admin Console.
  2. Navigate to the CDSW service and click Configuration.
  3. Search for the following property and configure as required.
    • Enable TLS - When enabled, this property enforces HTTPS and WSS connections. The server will now redirect any HTTP request to HTTPS and generate URLs with the appropriate protocol.

      The TLS Key for Internal Termination and TLS Certificate for Internal Termination properties must be left blank.

  4. Click Save Changes.
  5. Restart the CDSW service.

RPM Deployments

To enable external termination, configure the following property in cdsw.conf (on all Cloudera Data Science Workbench nodes).

  • TLS_ENABLE - When set to true, this property enforces HTTPS and WSS connections. The server will now redirect any HTTP request to HTTPS and generate URLs with the appropriate protocol.

    The TLS_KEY and TLS_CERT properties must be left blank.

You can configure this property either as part of the installation process or after the fact. If you make any changes to cdsw.conf after installation is complete, make sure to restart the master and worker nodes as needed.
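
For external termination, the corresponding excerpt of cdsw.conf might look like the sketch below, with the key and certificate properties left empty. The quoted KEY="value" style is an assumption about the file format.

```shell
# Excerpt of cdsw.conf for external termination.
# TLS_KEY and TLS_CERT stay blank because the external proxy holds the key and certificate.
DOMAIN="cdsw.company.com"
TLS_ENABLE="true"
TLS_KEY=""
TLS_CERT=""
```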

Known Issues and Limitations

  • Communication within the Cloudera Data Science Workbench cluster is not encrypted.

  • Cloudera Data Science Workbench does not support encrypted private keys with internal TLS termination. If you require an encrypted private key, use external TLS termination with a terminating proxy that does support encrypted private keys.

  • Troubleshooting can be difficult because browsers do not typically display helpful security errors with WebSockets. Often they will just silently fail to connect.

  • Self-signed certificates

    In general, browsers do not support self-signed certificates for WSS. Your certificate must be signed by a Certificate Authority (CA) that your users’ browsers will trust. Cloudera Data Science Workbench will not function properly if browsers silently abort WebSockets connections.

    If your TLS certificate is self-signed (that is, it was signed with its own key rather than by a CA in the trust store), the browser will display a dialog asking if you want to trust the certificate provided by Cloudera Data Science Workbench. Self-signed certificates are not supported and will not work. In this case WSS connections will likely be aborted silently, regardless of your response (Ignore/Accept) to the dialog.

    As long as you have a TLS certificate signed by a CA certificate in the trust store, it will be supported and will work with Cloudera Data Science Workbench. For example, if you need to use a certificate signed by your organization's internal CA, make sure that all your users import your root CA certificate into their machine’s trust store. This can be done using the Keychain Access application on Macs or the Microsoft Management Console on Windows.
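
    To spot a self-signed certificate before deploying it, you can compare its subject with its issuer and check it against your CA certificate. The sketch below is a heuristic under stated assumptions: identical subject and issuer fields indicate a self-issued certificate, the function names are hypothetical, and ca.crt stands for whichever CA certificate your users' browsers trust.

```shell
# Succeeds if the subject and issuer lines (as printed by openssl) name the same entity.
same_subject_and_issuer() {
    s=$(printf '%s\n' "$1" | sed -e 's/^subject= *//')
    i=$(printf '%s\n' "$2" | sed -e 's/^issuer= *//')
    [ -n "$s" ] && [ "$s" = "$i" ]
}

# Heuristic: treats a certificate ($1) as self-signed when subject and issuer match.
# Returns 2 if openssl cannot read the file.
looks_self_signed() {
    subject=$(openssl x509 -in "$1" -noout -subject) || return 2
    issuer=$(openssl x509 -in "$1" -noout -issuer) || return 2
    same_subject_and_issuer "$subject" "$issuer"
}

# Succeeds if the certificate ($2) verifies against the CA certificate ($1).
signed_by_ca() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}
```

    For example, `signed_by_ca ca.crt <your_tls_cert>.crt` should succeed and `looks_self_signed <your_tls_cert>.crt` should fail for a certificate issued by your CA.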