Metadata Search Syntax and Properties

In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in the LuceneQParserPlugin documentation.

Search Syntax

You construct search strings by specifying the value of a default property, or one of the following three types of key-value pairs, using the syntax shown:

  • Technical metadata key-value pairs - key:value, where
    • key is one of the properties listed in Search Properties.
    • value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape the special characters :, -, /, and * with the backslash character \, or enclose the property value in quotes. For example, fileSystemPath:\/tmp\/hbase\-staging.
    These key-value pairs are read-only and cannot be modified.
  • Custom metadata key-value pairs - up_key:value, where
    • key is a user-defined property defined on an entity after extraction.
    • value is a single value or range of values specified as [value1 TO value2]. Values follow the same wildcard and escaping rules as technical metadata values. For example, up_project:customer1.
    Custom metadata key-value pairs can be modified.
  • Hive extended attribute key-value pairs - tp_key:value, where
    • key is an extended attribute defined on a Hive entity before extraction. The syntax of the attribute is specific to Hive.
    • value is a single value supported by the entity type.
    These key-value pairs are read-only and cannot be modified.

To construct complex strings, join multiple property-value pairs using the operators and and or, as shown below.
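
For example, assuming parentheses group alternatives as they do in standard Lucene syntax, the following string matches Hive or Impala entities owned by hdfs:

    (sourceType:hive or sourceType:impala) and owner:hdfs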

Example Search Strings

  • Filesystem path /user/admin - fileSystemPath:\/user\/admin
  • Descriptions that start with the string "Banking" - description:Banking*
  • Sources of type MapReduce or Hive - sourceType:mapreduce or sourceType:hive
  • Directories owned by hdfs in the path /user/hdfs/input - owner:hdfs and type:directory and fileSystemPath:"/user/hdfs/input"
  • Jobs started between 20:00 and 21:00 UTC on October 21, 2013 - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
  • User-defined key project with the value customer1 - up_project:customer1
  • Hive extended attribute key-value - In Hive, you can set table properties like this:
    ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
    To query for this property, specify tp_key1:value1.
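
The three types of key-value pairs can be combined in one search string. As a sketch, reusing the up_project and tp_key1 examples above, the following finds Hive entities that carry both properties:

    sourceType:hive and up_project:customer1 and tp_key1:value1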

Search Properties

Default Properties

The following properties can be searched by specifying a property value: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.
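
A bare value with no key is matched against these default properties. For example, assuming a table named sample_07 exists, the following string should find it by name without naming a property:

    sample_07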

Common Properties

Name Type Description
description text Description of the entity.
group caseInsensitiveText The group to which the owner of the entity belongs.
name ngramedText The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
operationType ngramedText The type of an operation:
  • Pig - SCRIPT
  • Sqoop - Table Export, Query Import
originalName ngramedText The name of the entity when it was extracted.
originalDescription text The description of the entity when it was extracted.
owner caseInsensitiveText The owner of the entity.
principal caseInsensitiveText For entities with type OPERATION_EXECUTION, the initiator of the entity.
properties string A set of key-value pairs that describe the entity.
tags ngramedText A set of tags that describe the entity.
type tokenizedCaseInsensitiveText The type of the entity. The available types depend on the entity's source type:
  • hdfs - DIRECTORY, FILE, DATASET, FIELD
  • hive - DATABASE, TABLE, FIELD, OPERATION, OPERATION_EXECUTION, SUB_OPERATION, PARTITION, RESOURCE, VIEW
  • impala - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
  • mapreduce - OPERATION, OPERATION_EXECUTION
  • oozie - OPERATION, OPERATION_EXECUTION
  • pig - OPERATION, OPERATION_EXECUTION
  • spark - OPERATION, OPERATION_EXECUTION
  • sqoop - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
  • yarn - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
userEntity Boolean Indicates whether an entity was added using the Cloudera Navigator SDK.
Query
queryText string The text of a Hive, Impala, or Sqoop query.
Source
clusterName string The name of the cluster in which the source is managed.
sourceId string The ID of the source type.
sourceType caseInsensitiveText The source type of the entity: hdfs, hive, impala, mapreduce, oozie, pig, spark, sqoop, or yarn.
sourceUrl string The URL of the web application for a resource.
Timestamps
The available timestamp fields vary by the source type:
  • hdfs - created, lastAccessed, lastModified
  • hive - created, lastModified
  • impala, mapreduce, pig, spark, sqoop, and yarn - started, ended
All timestamp fields are of type date and take timestamps in the Solr Date Format. For example:
  • lastAccessed:[* TO NOW]
  • created:[1976-03-06T23:59:59.999Z TO *]
  • started:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
  • ended:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
  • created:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
  • lastAccessed:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]
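
Timestamp ranges combine with other properties like any key-value pair. As a sketch, assuming standard Solr date math, the following finds HDFS files owned by hdfs that were modified in the last seven days:

    owner:hdfs and type:file and lastModified:[NOW-7DAYS TO NOW]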

HDFS Properties

Name Type Description
blockSize long The block size of an HDFS file.
deleted Boolean Indicates whether the entity has been moved to the Trash folder.
deleteTime date The time the entity was moved to the Trash folder.
fileSystemPath path The path to the entity.
mimeType ngramedText The MIME type of an HDFS file.
parentPath string The path to the parent entity of a child entity. For example: parentPath:/default/sample_07 for the table sample_07 in the Hive database default.
permissions string The UNIX access permissions of the entity.
replication int The number of copies of HDFS file blocks.
size long The exact size of the entity in bytes, or a range of sizes. Range examples: size:[1000 TO *], size:[* TO 2000], and size:[* TO *] to find all entities that have a size value.
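
For example, a sketch that finds large files with a low replication factor (the 1 GB threshold, 1073741824 bytes, is illustrative):

    type:file and size:[1073741824 TO *] and replication:[* TO 2]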

Dataset Properties

Name Type Description
compressionType tokenizedCaseInsensitiveText The type of compression of a dataset file.
dataType string The data type: record.
datasetType tokenizedCaseInsensitiveText The type of the dataset: Kite.
fileFormat tokenizedCaseInsensitiveText The format of a dataset file: Avro or Parquet.
fullDataType string The full data type: record.
partitionType string The type of the partition.
schemaName string The name of the dataset schema.
schemaNameSpace string The namespace of the dataset schema.
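
For example, a sketch matching Avro-format Kite datasets (lowercase values are assumed to match, since both properties are tokenizedCaseInsensitiveText):

    datasetType:kite and fileFormat:avro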

MapReduce and YARN Properties

Name Type Description
inputRecursive Boolean Indicates whether files under the input directories are searched recursively, or only files directly under the input directories are considered.
jobId ngramedText The ID of the job. For a job spawned by Oozie, the workflow ID.
mapper string The fully-qualified name of the mapper class.
outputKey string The fully-qualified name of the class of the output key.
outputValue string The fully-qualified name of the class of the output value.
reducer string The fully-qualified name of the reducer class.
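
For example, a sketch that finds MapReduce operations using a particular mapper class (com.example.WordCountMapper is a hypothetical name):

    sourceType:mapreduce and mapper:com.example.WordCountMapper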

Operation Properties

Name Type Description
Operation
inputFormat string The fully-qualified name of the class of the input format.
outputFormat string The fully-qualified name of the class of the output format.
Operation Execution
inputs string The name of the entity input to an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table.
outputs string The name of the entity output from an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table.
engineType string The type of the engine used for an operation: MR or Spark.
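
For example, a sketch that finds operation executions reading from one directory and writing to another (the paths are illustrative):

    type:operation_execution and inputs:"/user/hdfs/input" and outputs:"/user/hdfs/output"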

Hive Properties

Name Type Description
Field
dataType ngramedText The type of data stored in a field (column).
Table
compressed Boolean Indicates whether a table is compressed.
serDeLibName string The name of the library containing the SerDe class.
serDeName string The fully-qualified name of the SerDe class.
Partition
partitionColNames string The table columns that define the partition.
partitionColValues string The table column values that define the partition.
technical_properties string Hive extended attributes.
clusteredByColNames string The column names that identify how table content is divided into buckets.
sortByColNames string The column names that identify how table content is sorted within a bucket.
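
For example, a sketch matching compressed Hive tables (assuming Boolean properties accept true and false as values):

    sourceType:hive and type:table and compressed:true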

Oozie Properties

Name Type Description
status string The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.
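
For example, a sketch listing failed Oozie workflows:

    sourceType:oozie and status:FAILED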

Pig Properties

Name Type Description
scriptId string The ID of the Pig script.
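
For example, a sketch that finds Pig script operations (operationType SCRIPT is listed under Common Properties above):

    sourceType:pig and operationType:SCRIPT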

Sqoop Properties

Name Type Description
dbURL string The URL of the database from or to which the data was imported or exported.
dbTable string The table from or to which the data was imported or exported.
dbUser string The database user.
dbWhere string A where clause that identifies which rows were imported.
dbColumnExpression string An expression that identifies which columns were imported.
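
For example, a sketch that finds Sqoop operations involving a particular table (the table name customers is hypothetical):

    sourceType:sqoop and dbTable:customers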