Cloudera Search Incompatible Changes and Limitations

General Limitations of Cloudera Search

  • Cloudera Search supports one instance of the Apache Solr service on each host in a cluster. Using multiple Solr instances on a host is not supported.
  • If multiple Solr instances are configured to depend on the same Sentry service, it is not possible to create unique Solr Sentry privileges per Solr deployment. Since privileges are enforced in all Solr instances simultaneously, you cannot add distinct privileges that apply to one Solr cluster, but not to another.

  • Converting existing file-based Sentry authorization policy files to permissions in the Sentry service does not support preserving case-sensitive role or group names

    The file-based model allows case-sensitive role names. During conversion, all roles and groups are converted to lower case.
    • If a policy-file conversion will change the case of roles or groups, a warning is presented. Policy conversion can proceed, but if you have enabled document-level security and use role names as your tokens, you must re-index using the new lower case role names after conversion is complete.
    • If a policy-file conversion will change the case of roles or groups, creating a name collision, an error occurs and conversion cannot occur. In such a case, you must eliminate the collisions before proceeding. For example, you could rename or delete the roles or groups that cause a collision.
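
Before converting, it can help to check whether lower-casing your role or group names would create collisions. The following is a minimal sketch, not a Sentry or Cloudera Search utility. It assumes you have already extracted the names, one per line, from your policy file. The function name find_case_collisions and the sample names are hypothetical, and the lower-casing here handles ASCII letters only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sentry): reads role or group names on
# stdin, one per line, and prints every lower-cased name that two or more
# distinct input names would map to.
find_case_collisions() {
    # 1. Drop exact duplicates so a name repeated verbatim is not reported.
    # 2. Lower-case the names to mimic the conversion (ASCII only).
    # 3. Print only the names that now occur more than once.
    LC_ALL=C sort -u \
        | LC_ALL=C tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C sort \
        | uniq -d
}

# Example with made-up names: "Admin" and "admin" collide, "Analyst" does not.
printf '%s\n' Admin admin Analyst | find_case_collisions
```

Any name the sketch prints identifies a set of roles or groups that must be renamed or deleted before conversion can proceed. Empty output suggests no collisions, although names whose case changes still trigger the warning described above.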

Incompatible Changes Between Cloudera Search for CDH 5.8 and Previous Versions of Cloudera Search

  • Converting existing file-based Sentry authorization policy files to permissions in the Sentry service does not support preserving case-sensitive role or group names

    The file-based model allows case-sensitive role names. During conversion, all roles and groups are converted to lower case.
    • If a policy-file conversion will change the case of roles or groups, a warning is presented. Policy conversion can proceed, but if you have enabled document-level security and use role names as your tokens, you must re-index using the new lower case role names after conversion is complete.
    • If a policy-file conversion will change the case of roles or groups, creating a name collision, an error occurs and conversion cannot occur. In such a case, you must eliminate the collisions before proceeding. For example, you could rename or delete the roles or groups that cause a collision.

Incompatible Changes Between Cloudera Search for CDH 5.5 and Previous Versions of Cloudera Search

  • Using MapReduceIndexerTool with configurations that require an updateRequestProcessorChain may fail unless an alternate configuration is specified

    With Search for CDH 5.5, the MapReduceIndexerTool uses a default solrconfig.xml that is appropriate for most collection configurations. With this configuration, the MapReduceIndexerTool can index data even if Sentry is enabled. This default configuration does not include any updateRequestProcessorChains; if your configuration requires an updateRequestProcessorChain, you can configure the MapReduceIndexerTool to use the configuration from ZooKeeper by specifying --use-zk-solrconfig.xml or from local disk by specifying --solr-home-dir.

Incompatible Changes Between Cloudera Search for CDH 5.4 and Previous Versions of Cloudera Search

  • HDFS locality metrics are disabled by default

    HDFS locality metrics are disabled by default in CDH 5.4.8 and higher because they can generate numerous HDFS calls, negatively affecting performance. Performance degradation is more common in production-scale environments that include rapidly changing indexes. To re-enable these metrics, add the following to the directoryFactory configuration in solrconfig.xml:
      <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:org.apache.solr.core.HdfsDirectoryFactory}">
        <bool name="solr.hdfs.locality.metrics.enabled">true</bool>
      </directoryFactory>
  • CloudSolrServer and LBHttpSolrServer no longer declare MalformedURLException as thrown from their constructors

    As a result of this change, compilation against the 4.10.3 Solr libraries may fail. To avoid this issue, make the relevant source code changes, such as removing catch clauses related to MalformedURLException, and then recompile the application.

    Related JIRA: SOLR-5555.

  • The SolrJ client JavaBinCodec serializes unknown objects differently

    Starting with Search for CDH 5.4.0, Search moves from Solr 4.4 to Solr 4.10.0. With Solr 4.4, JavaBinCodec serialized unknown Java objects as obj.toString(). In Solr 4.10.0, JavaBinCodec serializes unknown Java objects as obj.getClass().getName() + ':' + obj.toString().

    As a result, the same objects may produce different results when serialized with CDH 5.4 and higher compared with objects serialized with CDH 5.3 and lower.

  • Parsing using schema.xml creates an init error when <dynamicField/> declarations include default or required attributes

    In previous releases, these attributes were ignored. If init errors occur when upgrading with an existing schema.xml, remove the default or required attributes. After removing these attributes, Search functions as it did before upgrading.

    Related JIRA: SOLR-5227.

  • Indexing documents with terms that exceed Lucene's MAX_TERM_LENGTH registers errors

    In previous releases, terms that exceeded the length limit were silently ignored. To make Search function as it did in previous releases (silently ignoring longer terms), use solr.LengthFilterFactory in all of your Analyzers.

    Related JIRA: LUCENE-5472.

  • The fieldType configuration docValuesFormat="Disk" is no longer supported

    If your schema.xml contains fieldTypes using docValuesFormat="Disk", modify the file to remove the docValuesFormat attribute and optimize your index to rewrite to the default codec. Make these changes before upgrading to CDH 5.4.

    Related JIRA: LUCENE-5761.

  • UpdateRequestExt has been removed

    Use UpdateRequest instead.

    Related JIRA: SOLR-4816.

  • Parsing schema.xml registers errors when multiple values exist where only a single value is permitted

    With previous releases, when multiple values existed where only a single value was permitted, one value was silently chosen. In CDH 5.4, if multiple values exist where only a single value is supported, configuration parsing fails. The extra values must be removed.

    Related JIRAs: SOLR-4953, SOLR-5108.

Incompatible Changes Between Cloudera Search for CDH 5.2 and Cloudera Search for CDH 5.3

Packaging changes in this release affect CrunchIndexerTool startup scripts. Such scripts might include the following line:

export myDriverJar=$(find $myDriverJarDir -maxdepth 1 -name \
'*.jar' ! -name '*-job.jar' ! -name '*-sources.jar')

Change that line as follows:

export myDriverJar=$(find $myDriverJarDir -maxdepth 1 -name \
'search-crunch-*.jar' ! -name '*-job.jar' ! -name '*-sources.jar')
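
To see how the two patterns differ, the following sketch runs both against a throwaway directory. The JAR file names are made up for illustration, and the source does not state exactly which files the packaging change added, so treat the directory contents as an assumption.

```shell
#!/bin/sh
# Build a temporary directory containing made-up JAR names.
myDriverJarDir=$(mktemp -d)
touch "$myDriverJarDir/search-crunch-1.0.jar" \
      "$myDriverJarDir/search-crunch-1.0-job.jar" \
      "$myDriverJarDir/search-crunch-1.0-sources.jar" \
      "$myDriverJarDir/some-other-lib.jar"

# Old pattern: every JAR except the -job and -sources variants.
old_match=$(find "$myDriverJarDir" -maxdepth 1 -name '*.jar' \
    ! -name '*-job.jar' ! -name '*-sources.jar' | sort)

# New pattern: additionally requires the search-crunch- prefix.
new_match=$(find "$myDriverJarDir" -maxdepth 1 -name 'search-crunch-*.jar' \
    ! -name '*-job.jar' ! -name '*-sources.jar' | sort)

# Print only the file names, without the temporary directory prefix.
echo "old pattern matched:"
printf '%s\n' "$old_match" | sed 's|.*/||'
echo "new pattern matched:"
printf '%s\n' "$new_match" | sed 's|.*/||'

rm -rf "$myDriverJarDir"
```

When the old pattern matches more than one file, myDriverJar ends up holding several paths instead of a single driver JAR. The narrower pattern avoids that in a directory like this one.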

Incompatible Changes Between Cloudera Search for CDH 5 Beta 2 and Older Versions of Cloudera Search

The following incompatible changes occurred between Cloudera Search for CDH 5 beta 2 and older versions of Cloudera Search, including both lower versions of Cloudera Search for CDH 5 and Cloudera Search 1.x:

  • Supported values for the --reducers option of the MapReduceIndexerTool changed with the release of Search for CDH 5 beta 2. To use one reducer per output shard, Search 1.x and Search for CDH 5 beta 1 use 0, whereas Search for CDH 5 beta 2 uses -2. Because of this change, commands using --reducers 0 that were written for previous Search releases no longer work in the same way after upgrading to Search for CDH 5 beta 2. After the upgrade, using --reducers 0 results in an exception stating that zero is an illegal value.
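
If you keep indexing commands in wrapper scripts, a small filter can update the old value. This is a minimal sketch under stated assumptions: the function name migrate_reducers_flag and the sample command line are hypothetical, and the filter only handles the option and its value separated by whitespace on the same line.

```shell
#!/bin/sh
# Hypothetical helper: reads script text on stdin and rewrites
# "--reducers 0" to "--reducers -2". Other values (for example 10 or -1)
# are left untouched because the 0 must not be followed by another digit.
migrate_reducers_flag() {
    sed -E 's/(--reducers[[:space:]]+)0([^0-9]|$)/\1-2\2/g'
}

# Example with a made-up wrapper command line.
echo 'indexer-wrapper.sh --reducers 0 --output-dir /tmp/out' | migrate_reducers_flag
```

Review the rewritten commands before using them, because the sketch does not check that a matching line actually invokes the MapReduceIndexerTool.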