Cloudera Search Known Issues

The current release includes the following known limitations:

Default Solr core names cannot be changed (limitation)

Although it is technically possible to specify user-defined Solr core names during core creation, avoid doing so in the context of Cloudera Search. Cloudera Manager expects core names in the default "collection_shardX_replicaY" format. Altering core names prevents Cloudera Manager from fetching Solr metrics for the given core, which may eventually corrupt data collection for co-located cores, or even for shard-level and server-level charts.

Affected versions: All.

The `solrconfig.xml.secure` Template Does Not Enforce Apache Sentry Authorization

The solrconfig.xml.secure file generated by the solrctl instancedir --generate command does not enable Sentry authorization correctly. If you copied this file or renamed it to solrconfig.xml and used it for any collections, Sentry authorization is not being enforced on those collections.

Workaround: Modify the solrconfig.xml configuration file for each Sentry-protected collection as follows:

Download the collection configuration from ZooKeeper:

solrctl instancedir --get <config_name> /tmp/<config_name>

Edit the /tmp/<config_name>/conf/solrconfig.xml file as follows:

Change this line:

<updateRequestProcessorChain name="updateIndexAuthorization">

to this:

<updateRequestProcessorChain name="updateIndexAuthorization" default="true">

Upload the modified configuration to ZooKeeper:

solrctl instancedir --update <config_name> /tmp/<config_name>
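
The edit in the second step can be scripted. The following is a minimal sketch, not a Cloudera-provided tool: the helper name `enable_sentry_chain` and the use of `sed` are assumptions made for illustration. The `solrctl` calls appear only as comments so that the snippet can be sourced on a host without Solr.

```shell
#!/bin/sh
# Sketch only: enable_sentry_chain is a hypothetical helper, not part of solrctl.
# It adds default="true" to the updateIndexAuthorization chain in one file.
enable_sentry_chain() {
  file=$1
  tmp="${file}.tmp.$$"
  # Match only the exact opening tag without the attribute. A tag that already
  # carries default="true" does not match, so running this twice is harmless.
  sed 's|<updateRequestProcessorChain name="updateIndexAuthorization">|<updateRequestProcessorChain name="updateIndexAuthorization" default="true">|' \
    "$file" > "$tmp" && mv "$tmp" "$file"
}

# Typical use, with <config_name> replaced by your configuration name:
#   solrctl instancedir --get <config_name> /tmp/<config_name>
#   enable_sentry_chain /tmp/<config_name>/conf/solrconfig.xml
#   solrctl instancedir --update <config_name> /tmp/<config_name>
```

Review the edited file before uploading it, because the sketch changes nothing if the opening tag is formatted differently from the line shown above.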

Cloudera Bug: CDH-54101

Affected Versions: All CDH 5 versions except CDH 5.12.1 and higher

Fixed Versions: CDH 5.9.3, CDH 5.10.2, CDH 5.11.2, CDH 5.12.1

Collection Creation No Longer Supports Automatically Selecting A Configuration If Only One Exists

Before CDH 5.5.0, a collection could be created without specifying a configuration. If no -c value was specified, then:

If there was only one configuration, that configuration was chosen.
If the collection name matched a configuration name, that configuration was chosen.

Search for CDH 5.5.0 includes multiple built-in configurations. As a result, there is no longer a case in which only one configuration can be chosen by default.

To avoid this issue, explicitly specify the collection configuration to use by passing -c configName to solrctl collection --create.
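
As an illustration, a small wrapper can refuse to build a create command unless a configuration name is supplied. This is a sketch under assumptions: `build_create_cmd` is a hypothetical helper, the `-s` option is assumed to set the number of shards, and the command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: build_create_cmd is a hypothetical helper. It prints the
# solrctl command instead of running it, and fails if any argument is missing.
build_create_cmd() {
  collection=$1
  shards=$2
  config=$3
  if [ -z "$collection" ] || [ -z "$shards" ] || [ -z "$config" ]; then
    echo "usage: build_create_cmd <collection> <numShards> <configName>" >&2
    return 1
  fi
  # -c names the configuration explicitly, as recommended above.
  echo "solrctl collection --create $collection -s $shards -c $config"
}

# Example:
#   build_create_cmd myCollection 2 myConfig
```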

Cloudera Bug: CDH-34050

Affected Versions: CDH 5.5.0 and higher

Creating an Instance Directory Fails If Matching Name Exists in `/solr/configs` Znode

The solrctl instancedir --create command checks for existing configurations under the /solr/configs ZooKeeper znode. If there is a znode with a name that matches the name of the config you are trying to create, the command fails.

Workaround: Use a different name for the instance directory, or make sure that the conflicting data in ZooKeeper is not needed and use solrctl instancedir --update instead.
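
To see whether a name is already taken before creating an instance directory, you could check it against a list of existing configuration names. The sketch below assumes that such a list is available one name per line (for example, from `solrctl instancedir --list`); `config_name_taken` is a hypothetical helper. It only reports the conflict. Deciding whether the existing data can be overwritten remains a manual step.

```shell
#!/bin/sh
# Sketch only: config_name_taken is a hypothetical helper.
# It reads configuration names on stdin, one per line, and succeeds
# if the given name matches one of them exactly.
config_name_taken() {
  grep -Fxq -- "$1"
}

# Example (assumes the list command prints one name per line):
#   if solrctl instancedir --list | config_name_taken myConfig; then
#     echo "myConfig already exists; pick another name or verify and use --update"
#   fi
```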

Cloudera Bug: CDH-57036

Affected Versions: All CDH 5 versions except CDH 5.13.0 and higher

Fixed Versions: CDH 5.9.4, CDH 5.10.3, CDH 5.11.2, CDH 5.12.1, CDH 5.13.0

Configuration Templates Are Not Automatically Created on Existing SolrCloud Deployments

Configuration templates are only created when Solr is initialized. As a result, templates are not automatically available on existing SolrCloud deployments, even after upgrading to CDH 5.5.0.

If you do not need to retain information in your SolrCloud cluster, you can reinitialize Solr using Cloudera Manager or solrctl init. If you need to retain information in your cluster, there is no automated way to create the templates. However, you can access the templates on a host with Solr installed, at /usr/lib/solr/templateName for packages and /opt/cloudera/parcels/CDH/lib/solr/templateName for parcels. You can upload the templates to ZooKeeper using instancedir commands such as solrctl instancedir --create templateName /path/to/templateName, although templates uploaded this way are not protected by ZooKeeper ACLs.

For more information on enabling configuration templates in CDH 5.5.0, see Enabling Solr as a Client for the Sentry Service Using the Command Line.

Cloudera Bug: CDH-34052

Affected Versions: CDH 5.5.0 and higher

Solr ZooKeeper ACLs Are Not Automatically Applied to Existing ZNodes

As of CDH 5.4, in Kerberos-enabled environments, ZooKeeper ACLs restrict access to Solr metadata stored in ZooKeeper to the solr user. This metadata cannot be modified by other users. These ACLs that limit access to the solr user are only applied automatically to new znodes.

This protection is not automatically applied to existing deployments.

To enable Solr ZooKeeper ACLs without retaining the existing cluster's Solr state, remove the Solr znodes and reinitialize Solr.

To remove Solr znodes and reinitialize Solr:

Using the zookeeper-client, enter the command rmr /solr.
Reinitialize Solr:
- Select Initialize Solr in Cloudera Manager OR
- Use solrctl init

To set ACLs using a script (CDH 5.7 and higher):

Create a jaas.conf file containing the following:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  principal="solr@EXAMPLE.COM";
};

Replace EXAMPLE.COM with your Kerberos realm name.

Set the LOG4J_PROPS environment variable to a log4j.properties file:
```
export LOG4J_PROPS=/etc/zookeeper/conf/log4j.properties
```

Set the ZKCLI_JVM_FLAGS environment variable:

export ZKCLI_JVM_FLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf \
-DzkACLProvider=org.apache.solr.common.cloud.ConfigAwareSaslZkACLProvider \
-Droot.logger=INFO,console"

Authenticate as the solr user:
```
kinit solr@EXAMPLE.COM
```
Replace EXAMPLE.COM with your Kerberos realm name.

Run the zkcli.sh script as follows:

Cloudera Manager:

/opt/cloudera/parcels/CDH/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr

Unmanaged:

/usr/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr

Replace zk01.example.com with the hostname of a ZooKeeper server.

To enable Solr ZooKeeper ACLs while retaining the existing cluster's Solr state in versions lower than CDH 5.7, manually modify the existing znode's ACL information. For example, using zookeeper-client, run the command setAcl [path] sasl:solr:cdrwa,world:anyone:r. This grants the solr user ownership of the specified path. Run this command for /solr and every znode under /solr except for the configuration znodes under and including /solr/configs.
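
Because the command must be repeated for many znodes, it can help to generate the commands from a list of paths. The following sketch assumes that you have already collected the znode paths under /solr, one per line; `emit_setacl_cmds` is a hypothetical helper that prints commands for review and does not contact ZooKeeper.

```shell
#!/bin/sh
# Sketch only: emit_setacl_cmds is a hypothetical helper.
# It reads znode paths on stdin and prints one setAcl command per path,
# skipping /solr/configs and everything beneath it, as described above.
emit_setacl_cmds() {
  while IFS= read -r znode; do
    case $znode in
      ''|/solr/configs|/solr/configs/*) continue ;;
      /solr|/solr/*) echo "setAcl $znode sasl:solr:cdrwa,world:anyone:r" ;;
    esac
  done
}

# Example:
#   printf '%s\n' /solr /solr/live_nodes /solr/configs | emit_setacl_cmds
```

The printed lines can then be entered in zookeeper-client one at a time after you have checked them.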

Cloudera Bug: CDH-26416

Affected Versions: CDH 5.4.0 and higher

HBase Indexer ACLs Are Not Automatically Applied to Existing ZNodes

As of CDH 5.4, in Kerberos-enabled environments, ZooKeeper ACLs restrict access to Lily HBase Indexer metadata stored in ZooKeeper to the hbase user. This metadata cannot be modified by other users. These ACLs that limit access to the hbase user are only applied automatically to new znodes.

This protection is not automatically applied to existing deployments.

To enable Lily HBase Indexer ACLs without retaining the existing cluster's Lily HBase Indexer state, turn off the Lily HBase Indexer, remove the hbase-indexer znodes, and then restart the Lily HBase Indexer.

To remove hbase-indexer znodes and reinitialize Lily HBase Indexer:

In Cloudera Manager, click to the right of the Lily HBase Indexer service and select Stop.
Using the zookeeper-client, enter the command rmr /ngdata.
In Cloudera Manager, click to the right of the Lily HBase Indexer service and select Start.
The Lily HBase Indexer automatically creates all required znodes when it is started.

To enable Lily HBase Indexer ACLs while retaining the existing Lily HBase Indexer state, manually modify the ACL information of the existing znodes. For example, using zookeeper-client, run the command setAcl [path] sasl:hbase:cdrwa,world:anyone:r. This grants the hbase user ownership of the specified path. Run this command for /ngdata and every znode under /ngdata.

Cloudera Bug: CDH-26417

Affected Versions: CDH 5.4.0 and higher

CrunchIndexerTool, which includes the Spark indexer, requires specific input file format specifications

If the --input-file-format option is specified with CrunchIndexerTool, then its argument must be text, avro, or avroParquet, rather than a fully qualified class name.
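
A wrapper script could reject unsupported values before launching the tool. This is a minimal sketch; `valid_input_file_format` is a hypothetical helper, and the three accepted values are the ones listed above.

```shell
#!/bin/sh
# Sketch only: valid_input_file_format is a hypothetical helper.
# It succeeds only for the three values accepted by --input-file-format.
valid_input_file_format() {
  case $1 in
    text|avro|avroParquet) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   valid_input_file_format "$fmt" || echo "use text, avro, or avroParquet" >&2
```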

Cloudera Bug: CDH-22190

Previously deleted empty shards may reappear after restarting the leader host

It is possible to be in the process of deleting a collection when hosts are shut down. In such a case, when hosts are restarted, some shards from the deleted collection may still exist, but be empty.

Workaround: To delete these empty shards, manually delete the folders that match them. On each host where such shards exist, remove the folders under /var/lib/solr/ that match the collection and shard. For example, if you had an empty shard 1 and an empty shard 2 in a collection called MyCollection, you might delete all folders matching /var/lib/solr/MyCollection_shard{1,2}_replica*/.
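
Before deleting anything, it is worth listing the folders that a pattern would match. The sketch below only lists directories and never removes them. `list_replica_dirs` is a hypothetical helper, and the example prefix assumes that the folders follow the default collection_shardX_replicaY core naming; check the actual folder names on your hosts.

```shell
#!/bin/sh
# Sketch only: list_replica_dirs is a hypothetical helper.
# It prints directories named <prefix>_replica* under a base directory.
list_replica_dirs() {
  base=$1
  prefix=$2
  for dir in "$base"/"$prefix"_replica*/; do
    # An unmatched pattern stays literal, so keep only real directories.
    [ -d "$dir" ] && echo "${dir%/}"
  done
  return 0
}

# Example (prefix is an assumption based on the default core naming):
#   list_replica_dirs /var/lib/solr MyCollection_shard1
```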

Cloudera Bug: CDH-20256

The `quickstart.sh` file does not validate ZooKeeper and the NameNode on some operating systems

The quickstart.sh file uses the timeout function to determine whether ZooKeeper and the NameNode are available. To ensure this check can be completed as intended, quickstart.sh first determines whether the operating system on which it is running supports timeout. If the script detects that the operating system does not support timeout, it continues without checking whether the NameNode and ZooKeeper are available. If your environment is configured properly, or if you are using an operating system that supports timeout, this issue does not apply.

Workaround: This issue only occurs in some operating systems. If timeout is not available, a warning is displayed, but the quickstart continues and final validation is always done by the MapReduce jobs and Solr commands that are run by the quickstart.
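
The behavior described above can be pictured with a short sketch. This is not the code from quickstart.sh; `check_with_timeout` is a hypothetical helper, and detecting the command with `command -v` is an assumption about how such a check might be written.

```shell
#!/bin/sh
# Sketch only: check_with_timeout is a hypothetical helper, not quickstart.sh code.
# It runs a check under timeout when the command exists; otherwise it prints
# a warning and continues without running the check.
check_with_timeout() {
  secs=$1
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  else
    echo "warning: timeout is not available; skipping check: $*" >&2
    return 0
  fi
}

# Example:
#   check_with_timeout 10 some_availability_check
```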

Cloudera Bug: CDH-19923

Field value class guessing and automatic schema field addition are not supported with the MapReduceIndexerTool or the HBaseMapReduceIndexerTool

The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest.

Workaround: Define the schema before running the MapReduceIndexerTool or HBaseMapReduceIndexerTool. In non-schemaless mode, define the schema using the schema.xml file. In schemaless mode, either define the schema using the Solr Schema API or index sample documents using NRT indexing before invoking the tools. In either case, Cloudera recommends that you verify that the schema is what you expect using the List Fields API command.

Cloudera Bug: CDH-26856

The “Browse” and “Spell” Request Handlers are not enabled in schemaless mode

The “Browse” and “Spell” Request Handlers require certain fields be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the “Browse” and “Spell” Request Handlers are not enabled by default.

Workaround: If you require the “Browse” and “Spell” Request Handlers, add them to the solrconfig.xml configuration file. Generate a non-schemaless configuration to see the usual settings and modify the required fields to fit your schema.

Cloudera Bug: CDH-19407

Enabling blockcache writing may result in unusable indexes

It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading these indexes may irrecoverably corrupt them. Blockcache writing is disabled by default.

Workaround: Do not enable blockcache writing.

Cloudera Bug: CDH-17978

Solr fails to start when Trusted Realms are added for Solr into Cloudera Manager

Cloudera Manager generates name rules that contain spaces from entries in the Trusted Realms field. These rules do not work with Solr, and as a result Solr fails to start.

Workaround: Do not use the Trusted Realm field for Solr in Cloudera Manager. To write your own name rule mapping, add an environment variable SOLR_AUTHENTICATION_KERBEROS_NAME_RULES with the mapping. See Cloudera Security for more information.

Cloudera Bug: CDH-17289

Lily HBase batch indexer jobs fail to launch

A symptom of this issue is an exception similar to the following:

Exception in thread "main" java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100)
    at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124)
    at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

This is because of an optimization introduced in HBASE-9867 that inadvertently introduced a classloader dependency. In order to satisfy the new classloader requirements, hbase-protocol.jar must be included in Hadoop's classpath. This can be resolved on a per-job launch basis by including it in the HADOOP_CLASSPATH environment variable when you submit the job.

Workaround: Run the following command before issuing Lily HBase MapReduce jobs. Replace the .jar file names and filepaths as appropriate.

$ export HADOOP_CLASSPATH=</path/to/hbase-protocol>.jar; hadoop jar <MyJob>.jar <MyJobMainClass>

Cloudera Bug: CDH-16539

Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI

Users who are not authorized to use the Solr Admin UI are not given a page explaining that access is denied. Instead, they receive a web page that never finishes loading.

Workaround: None

Cloudera Bug: CDH-58276

Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.

Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents. This issue does not apply to the HBaseMapReduceIndexerTool unless it is run with more than zero reducers.

Workaround: To avoid this issue, use HBaseMapReduceIndexerTool with zero reducers. This must be done without Kerberos.

Cloudera Bug: CDH-15441

Deleting collections may fail if hosts are unavailable.

It is possible to delete a collection while hosts that hold part of the collection are unavailable. After such a deletion, if the previously unavailable hosts are brought back online, the deleted collection may be restored.

Workaround: Ensure all hosts are online before deleting collections.

Cloudera Bug: CDH-58694

Saving search results is not supported.

Cloudera Search does not support the ability to save search results.

Workaround: None

Cloudera Bug: CDH-21162

HDFS Federation is not supported.

Cloudera Search does not support HDFS Federation.

Workaround: None

Cloudera Bug: CDH-11357
