Using Search through a Proxy for High Availability

Using a proxy server to relay requests to and from the Solr service can help meet availability requirements in production clusters serving many users.

A proxy server works with a set of servers that is organized into a server group. A proxy server does not necessarily work with all servers in a deployment.

Overview of Proxy Usage and Load Balancing for Search

Configuring a proxy server to relay requests to and from the Solr service has the following advantages:

  • Applications connect to a single well-known host and port, rather than keeping track of the hosts where the Solr service is running. This is especially useful for non-Java Solr clients such as web browsers or command-line tools such as curl.
  • If any host running the Solr service becomes unavailable, application connection requests still succeed because you always connect to the proxy server rather than a specific host running the Solr server.
  • Users can configure an SSL-terminating proxy for Solr to secure the data exchanged with external clients without requiring SSL configuration for the Solr cluster itself. This is relevant only if the Solr cluster is deployed on a trusted network and needs to communicate with clients that may not be on the same network. Many of the advantages of SSL offloading are described in SSL Offloading, Encryption, and Certificates with NGINX.
  • The "coordinator host" for each Search query potentially requires more memory and CPU cycles than the other hosts that process the query. The proxy server can issue queries using round-robin scheduling, so that each connection uses a different coordinator host. This load-balancing technique lets the hosts running the Solr service share this additional work, rather than concentrating it on a single machine.
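
The round-robin scheduling mentioned in the last item can be illustrated with a minimal Python sketch. It is not how any particular proxy implements the technique; the host names are hypothetical, and 8983 is assumed as the Solr port on every host.

```python
from itertools import cycle

# Hypothetical Solr hosts; replace with the hosts in your own cluster.
SOLR_HOSTS = [
    "solr1.example.com:8983",
    "solr2.example.com:8983",
    "solr3.example.com:8983",
]


def round_robin(hosts):
    """Return an iterator that yields hosts in order and wraps around forever."""
    if not hosts:
        raise ValueError("at least one Solr host is required")
    return cycle(hosts)


picker = round_robin(SOLR_HOSTS)
# Each new connection takes the next host, so coordinator work is spread evenly.
first_four = [next(picker) for _ in range(4)]
```

After three connections the sequence wraps back to the first host, so no single machine acts as coordinator for every query.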

The following setup steps are a general outline that applies to any load-balancing proxy software.

  1. Download the load-balancing proxy software. It should only need to be installed and configured on a single host.
  2. Configure the software, typically by editing a configuration file. Set up a port on which the load balancer listens to relay Search requests back and forth.
  3. Specify the host and port settings for each Solr service host. These are the hosts that the load balancer chooses from when relaying each query. In most cases, use 8983, the default query and update port.
  4. Run the load-balancing proxy server, pointing it at the configuration file that you set up.
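
As one possible illustration of steps 2 and 3, the Python sketch below renders a small configuration file, assuming HAProxy is the chosen proxy (this document does not prescribe a product). The host names, the section names solr_front and solr_back, the timeout values, and the listening port 1518 are all assumptions made for the example.

```python
def render_haproxy_config(listen_port, solr_hosts, solr_port=8983):
    """Build HAProxy-style configuration text for a round-robin Solr proxy.

    listen_port: port on which the proxy accepts client requests (assumed value).
    solr_hosts:  host names running the Solr service (hypothetical names).
    solr_port:   port of the Solr service on each host.
    """
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 50s",
        "    timeout server 50s",
        "",
        "frontend solr_front",
        f"    bind *:{listen_port}",
        "    default_backend solr_back",
        "",
        "backend solr_back",
        "    balance roundrobin",
    ]
    # One "server" line per Solr host; "check" enables health checks on it.
    for index, host in enumerate(solr_hosts, start=1):
        lines.append(f"    server solr{index} {host}:{solr_port} check")
    return "\n".join(lines) + "\n"


config_text = render_haproxy_config(
    1518, ["solr1.example.com", "solr2.example.com"]
)
```

If the text is saved to a file such as haproxy.cfg (a hypothetical name), step 4 would typically correspond to starting the proxy with haproxy -f haproxy.cfg. Review timeouts and health-check settings against your own requirements before using anything like this in production.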

Special Proxy Considerations for Clusters Using Kerberos

In a cluster using Kerberos, applications check host credentials to verify that the host they are connecting to is the same one that is actually processing the request, to prevent man-in-the-middle attacks. To clarify that the load-balancing proxy server is legitimate, perform these extra Kerberos setup steps:

  1. This section assumes you are starting with a Kerberos-enabled cluster. See Search Authentication for instructions for setting up Search with Kerberos. See the CDH Security Guide for general steps to set up Kerberos: CDH 5 instructions.
  2. Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should already have an entry solr/proxy_host@realm in its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host running solr, as described in Search Authentication.
  3. Copy the keytab file from the proxy host to all other hosts in the cluster that run the solr daemon. (For optimal performance, solr should be running on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these other hosts.
  4. For each solr node, merge the existing keytab with the proxy’s keytab using ktutil, producing a new keytab file. For example:
    $ ktutil
    ktutil: read_kt proxy.keytab
    ktutil: read_kt solr.keytab
    ktutil: write_kt proxy_Search.keytab
    ktutil: quit
  5. Make sure that the Search user has permission to read this merged keytab file.
  6. On every host running the Solr daemon, edit the SOLR_AUTHENTICATION_KERBEROS_PRINCIPAL property in the /etc/default/solr file and set its value to *. The value should appear as:
    SOLR_AUTHENTICATION_KERBEROS_PRINCIPAL=*
  7. Restart the Search service to make the changes take effect.
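
Before restarting, it can be useful to confirm that the merged keytab from step 4 contains both the proxy principal and the host's own principal. The Python sketch below checks the text printed by klist -k proxy_Search.keytab. It assumes each entry line starts with a numeric KVNO and ends with the principal name. The sample output, host names, and realm are hypothetical.

```python
def principals_in_klist_output(text):
    """Collect principal names from `klist -k`-style output."""
    found = set()
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: "<KVNO> <principal>"; header lines are skipped.
        if len(parts) >= 2 and parts[0].isdigit() and "@" in parts[-1]:
            found.add(parts[-1])
    return found


def missing_principals(text, required):
    """Return the required principals that do not appear in the output."""
    return sorted(set(required) - principals_in_klist_output(text))


# Hypothetical output for a merged keytab on host solr1.example.com.
sample_output = """\
Keytab name: FILE:proxy_Search.keytab
KVNO Principal
---- ----------------------------------------
   2 solr/proxy.example.com@EXAMPLE.COM
   2 solr/solr1.example.com@EXAMPLE.COM
"""

missing = missing_principals(
    sample_output,
    ["solr/proxy.example.com@EXAMPLE.COM", "solr/solr1.example.com@EXAMPLE.COM"],
)
```

An empty result means both principals are present. Any names returned point to a keytab that was not merged as intended.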

Configuring Dependent Services

Other services that use Search must also be configured to use the load balancer. For example, Hue may need reconfiguration. To reconfigure a dependent service, ensure that it uses a URL built from the load balancer hostname and port number whenever it refers to the Solr service. For Hue, update the hue.ini file to set the solr_url parameter to a URL that points to the load balancer. Such URLs are typically of the form http://<load-balancer-host>:<port>/solr. For example, the value might appear as:
solr_url=http://load-balancer.example.com:1518/solr
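
When several dependent services need the same value, the URL can be assembled in one place. The following Python sketch builds it from the load balancer host and port. The function name is hypothetical, and the host and port are the example values used above.

```python
def solr_url_for(load_balancer_host, port):
    """Build the Solr URL that dependent services should use."""
    if not load_balancer_host:
        raise ValueError("load balancer host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"http://{load_balancer_host}:{port}/solr"


# Line as it would appear in hue.ini for the example load balancer.
hue_setting = "solr_url=" + solr_url_for("load-balancer.example.com", 1518)
```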