Metadata Search Syntax and Properties

In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.

Search Syntax

You construct search strings by specifying the value of a default property or one of four types of key-value pairs, using the indicated syntax:

  • Technical metadata key-value pairs - key:value
    • key is one of the properties listed in Search Properties.
    • value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
    Technical metadata key-value pairs are read-only and cannot be modified.
  • Custom metadata key-value pairs - up_key:value
    • key is a user-defined property.
    • value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
    Custom metadata key-value pairs can be modified.
  • Hive extended attribute key-value pairs - tp_key:value
    • key is an extended attribute set on a Hive entity. The syntax of the attribute is specific to Hive.
    • value is a single value supported by the entity type.
    Hive extended attribute key-value pairs are read-only and cannot be modified.
  • Managed metadata key-value pairs - namespace.key:value
    • namespace is the namespace containing the property. See Defining Managed Metadata.
    • key is the name of a managed metadata property.
    • value is a single value, a range of values specified as [value1 TO value2], or a set of values separated by spaces. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
    Only the values of managed metadata key-value pairs can be modified.
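For illustration only, the escaping rules above can be wrapped in a small helper. The names escape_value and quote_value are hypothetical, not part of Navigator; they just mechanize the two options described above (backslash-escaping the special characters, or quoting the whole value):

```python
# Hypothetical helpers: handle the characters Navigator treats specially
# in property values (:, -, /, *), per the rules above.
def escape_value(value: str) -> str:
    # Option 1: prefix each special character with a backslash.
    special = set(":-/*")
    return "".join("\\" + ch if ch in special else ch for ch in value)

def quote_value(value: str) -> str:
    # Option 2: enclose the whole property value in quotes instead.
    return '"%s"' % value

print(escape_value("/user/hive"))  # \/user\/hive
print(quote_value("/user/hive"))   # "/user/hive"
```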

Constructing Compound Search Strings

To construct compound search strings, you can join multiple property-value pairs using the Lucene Query Parser Boolean operators:
  • space (juxtaposition), +, -
  • OR, AND, NOT
In both syntaxes, you use () to group multiple clauses into a single field and to form subqueries. When you filter results in the Navigator Metadata UI, the constructed search strings use the space, +, - syntax.
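As a sketch of how the two operator styles correspond, compound strings can be assembled mechanically (the helper names below are illustrative only, not part of any Navigator tooling):

```python
# Illustrative only: build the same compound query in both operator styles.
def compound(clauses, op="AND"):
    # Word style: join clauses with OR or AND.
    return (" %s " % op).join(clauses)

def compound_symbolic(required=(), excluded=()):
    # Symbol style: + marks required clauses, - marks excluded ones;
    # () groups each clause, as described above.
    parts = ["+(%s)" % c for c in required] + ["-(%s)" % c for c in excluded]
    return " ".join(parts)

print(compound(["owner:hdfs", "type:directory"]))
# owner:hdfs AND type:directory
print(compound_symbolic(required=["owner:hdfs"], excluded=["deleted:true"]))
# +(owner:hdfs) -(deleted:true)
```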

Example Search Strings

  • Entities in the path /user/hive that have not been deleted - +("/user/hive") +(-deleted:true)
  • Descriptions that start with the string "Banking" - description:Banking*
  • Entities of type MapReduce or entities of type Hive - sourceType:mapreduce sourceType:hive (equivalently, sourceType:mapreduce OR sourceType:hive)
  • Entities of type HDFS with size equal to or greater than 1024 MiB or entities of type Impala - (+sourceType:hdfs +size:[1073741824 TO *]) sourceType:impala
  • Directories owned by hdfs in the path /user/hdfs/input - +owner:hdfs +type:directory +fileSystemPath:"/user/hdfs/input" (equivalently, owner:hdfs AND type:directory AND fileSystemPath:"/user/hdfs/input")
  • Jobs started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
  • Custom key-value pair with key project and value customer1 - up_project:customer1
  • Technical key-value - In Hive, specify table properties like this:
    ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
    To search for this property, specify tp_key1:value1.
  • Managed key-value with multivalued property - MailAnnotation.emailTo:"dana@example.com" MailAnnotation.emailTo:"lee@example.com"
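Search strings like these can also be submitted programmatically. A minimal sketch of building such a request URL, assuming a Navigator API endpoint of the form /api/v9/entities with a query parameter; the host, port, path, and API version below are assumptions, so check your deployment's API documentation:

```python
from urllib.parse import urlencode

# Assumed endpoint shape -- verify against your Navigator version's API docs.
base = "http://navigator-host:7187/api/v9/entities"
query = '+owner:hdfs +type:directory +fileSystemPath:"/user/hdfs/input"'

# urlencode percent-escapes the +, :, ", and / characters in the query.
url = base + "?" + urlencode({"query": query})
print(url)
```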

Search Properties

Default Properties

The following properties can be searched by specifying a property value: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.

Common Properties

Name Type Description
description text Description of the entity.
group caseInsensitiveText The group to which the owner of the entity belongs.
name ngramedText The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
operationType ngramedText The type of an operation:
  • Pig - SCRIPT
  • Sqoop - Table Export, Query Import
originalName ngramedText The name of the entity when it was extracted.
originalDescription text The description of the entity when it was extracted.
owner caseInsensitiveText The owner of the entity.
principal caseInsensitiveText For entities with type OPERATION_EXECUTION, the initiator of the entity.
properties string A set of key-value pairs that describe the entity.
tags ngramedText A set of tags that describe the entity.
type tokenizedCaseInsensitiveText The type of the entity. The available types depend on the entity's source type:
  • hdfs - DIRECTORY, FILE, DATASET, FIELD
  • hive - DATABASE, TABLE, FIELD, OPERATION, OPERATION_EXECUTION, SUB_OPERATION, PARTITION, RESOURCE, VIEW
  • impala - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
  • mapreduce - OPERATION, OPERATION_EXECUTION
  • oozie - OPERATION, OPERATION_EXECUTION
  • pig - OPERATION, OPERATION_EXECUTION
  • spark - OPERATION, OPERATION_EXECUTION
  • sqoop - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
  • yarn - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
userEntity Boolean Indicates whether an entity was added using the Cloudera Navigator SDK.
Query
queryText string The text of a Hive, Impala, or Sqoop query.
Source
clusterName string The name of the cluster in which the source is managed.
sourceId string The ID of the source type.
sourceType caseInsensitiveText The source type of the entity: hdfs, hive, impala, mapreduce, oozie, pig, spark, sqoop, or yarn.
sourceUrl string The URL of the web application for a resource.
Timestamps
The available timestamp fields vary by the source type:
  • hdfs - created, lastAccessed, lastModified
  • hive - created, lastModified
  • impala, mapreduce, pig, spark, sqoop, and yarn - started, ended
date Timestamps in the Solr Date Format. For example:
  • lastAccessed:[* TO NOW]
  • created:[1976-03-06T23:59:59.999Z TO *]
  • started:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
  • ended:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
  • created:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
  • lastAccessed:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]
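The timestamp values above use ISO 8601 UTC with millisecond precision and a trailing Z. A small sketch producing timestamps in that form from a Python datetime (solr_date is a hypothetical helper name):

```python
from datetime import datetime, timezone

def solr_date(dt: datetime) -> str:
    # Convert to UTC, then format with millisecond precision and a trailing Z.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

start = solr_date(datetime(2013, 10, 21, 20, 0, tzinfo=timezone.utc))
end = solr_date(datetime(2013, 10, 21, 21, 0, tzinfo=timezone.utc))
print("started:[%s TO %s]" % (start, end))
# started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
```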

Dataset Properties

Name Type Description
compressionType tokenizedCaseInsensitiveText The type of compression of a dataset file.
dataType string The data type: record.
datasetType tokenizedCaseInsensitiveText The type of the dataset: Kite.
fileFormat tokenizedCaseInsensitiveText The format of a dataset file: Avro or Parquet.
fullDataType string The full data type: record.
partitionType string The type of the partition.
schemaName string The name of the dataset schema.
schemaNameSpace string The namespace of the dataset schema.

HDFS Properties

Name Type Description
blockSize long The block size of an HDFS file.
deleted Boolean Indicates whether the entity has been moved to the Trash folder.
deleteTime date The time the entity was moved to the Trash folder.
fileSystemPath path The path to the entity.
mimeType ngramedText The MIME type of an HDFS file.
parentPath string The path to the parent entity of a child entity. For example: parentPath:"/default/sample_07" for the table sample_07 in the Hive database default.
permissions string The UNIX access permissions of the entity.
replication int The number of copies of HDFS file blocks.
size long The exact size of the entity in bytes or a range of sizes. Range examples: size:[1000 TO *], size:[* TO 2000], and size:[* TO *] to find all entities with a size value.
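Because size is expressed in bytes, human-readable units must be converted before building a range query. A quick sketch (mib is an illustrative helper, and 1024 MiB is 1,073,741,824 bytes):

```python
def mib(n: int) -> int:
    # Mebibytes to bytes: 1 MiB = 1024 * 1024 bytes.
    return n * 1024 * 1024

# HDFS entities of at least 1024 MiB, as in the earlier example search string.
query = "+sourceType:hdfs +size:[%d TO *]" % mib(1024)
print(query)
# +sourceType:hdfs +size:[1073741824 TO *]
```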

Hive Properties

Name Type Description
Field
dataType ngramedText The type of data stored in a field (column).
Table
compressed Boolean Indicates whether a table is compressed.
serDeLibName string The name of the library containing the SerDe class.
serDeName string The fully qualified name of the SerDe class.
Partition
partitionColNames string The table columns that define the partition.
partitionColValues string The table column values that define the partition.
technical_properties string Hive extended attributes.
clusteredByColNames string The column names that identify how table content is divided into buckets.
sortByColNames string The column names that identify how table content is sorted within a bucket.

MapReduce and YARN Properties

Name Type Description
inputRecursive Boolean Indicates whether files are searched recursively under the input directories, or only files directly under the input directories are considered.
jobId ngramedText The ID of the job. For a job spawned by Oozie, the workflow ID.
mapper string The fully qualified name of the mapper class.
outputKey string The fully qualified name of the class of the output key.
outputValue string The fully qualified name of the class of the output value.
reducer string The fully qualified name of the reducer class.

Operation Properties

Name Type Description
Operation
inputFormat string The fully qualified name of the class of the input format.
outputFormat string The fully qualified name of the class of the output format.
Operation Execution
inputs string The name of the entity input to an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table.
outputs string The name of the entity output from an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table.
engineType string The type of the engine used for an operation: MR or Spark.

Oozie Properties

Name Type Description
status string The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.

Pig Properties

Name Type Description
scriptId string The ID of the Pig script.

Sqoop Properties

Name Type Description
dbURL string The URL of the database from or to which the data was imported or exported.
dbTable string The table from or to which the data was imported or exported.
dbUser string The database user.
dbWhere string A where clause that identifies which rows were imported.
dbColumnExpression string An expression that identifies which columns were imported.