Schema

Required Role:

A table schema contains information about the names and types of the columns of a table.

A Kite dataset ingested into HDFS contains information about the names and types of the fields in an HDFS Avro or Parquet file used to create the dataset.

Displaying Hive, Impala, and Sqoop Table Schema

  1. Perform a metadata search for entities of source type Hive and type Table.
  2. In the list of results, click a result entry. The table schema displays in the Details tab.

Displaying Pig Table Schema

  1. Perform a metadata search for entities of source type Pig.
  2. In the list of results, click a result entry of type Table. The table schema displays in the Details tab.

Displaying HDFS Dataset Schema

If you ingest a Kite dataset into HDFS you can view the schema of the dataset. The schema is represented as an entity of type Dataset and is implemented as an HDFS directory.

For Avro datasets, primitive types such as null, string, int, and so on, are not separate entities. For example, if you have a record type with a field A that's a record type and a field B that's a string, the subfields of A become entities themselves, but B has no children. Another example would be if you had a union of null, string, map, array, and record types; the union has 3 children - the map, array, and record subtypes.

To display an HDFS dataset schema:

  1. Perform a metadata search for entities of type Dataset.
  2. Click a result entry. The dataset schema displays in the Details tab.

Stocks Schema

  1. Use the Stocks Avro schema file:
    {
      "type" : "record",
      "name" : "Stocks",
      "namespace" : "com.example.stocks",
      "doc" : "Schema generated by Kite",
      "fields" : [ {
        "name" : "Symbol",
        "type" : [ "null", "string" ],
        "doc" : "Type inferred from 'AAIT'"
      }, {
        "name" : "Date",
        "type" : [ "null", "string" ],
        "doc" : "Type inferred from '28-Oct-2014'"
      }, {
        "name" : "Open",
        "type" : [ "null", "double" ],
        "doc" : "Type inferred from '33.1'"
      }, {
        "name" : "High",
        "type" : [ "null", "double" ],
        "doc" : "Type inferred from '33.13'"
      }, {
        "name" : "Low",
        "type" : [ "null", "double" ],
        "doc" : "Type inferred from '33.1'"
      }, {
        "name" : "Close",
        "type" : [ "null", "double" ],
        "doc" : "Type inferred from '33.13'"
      }, {
        "name" : "Volume",
        "type" : [ "null", "long" ],
        "doc" : "Type inferred from '400'"
      } ]
    }
    and the kite-dataset command to create a Stocks dataset:
    kite-dataset create dataset:hdfs:/user/hdfs/Stocks -s Stocks.avsc
    The following directory is created in HDFS:

  2. In search results, the Stocks dataset appears as follows:

  3. Click the Stocks link. The schema displays at the right of the Details tab.

    Each subfield of the Stocks record is an entity of type Field.

  4. Then use the kite-dataset command to import structured data:
    kite-dataset csv-import ./Stocks.csv dataset:hdfs:/user/hdfs/Stocks --no-header
    where Stocks.csv is:
    AAPL,20150206,120.02,120.25,118.45,118.93,43372000
    AAPL,20150205,120.02,120.23,119.25,119.94,42246200
    GOOG,20150304,571.87,577.11,568.01,573.37,1713800
    GOOG,20150303,570.45,575.39,566.52,573.64,1694300
    GOOG,20150302,560.53,572.15,558.75,571.34,2118400
    GOOG,20150209,528,532,526.02,527.83,1264300
    GOOG,20150206,527.64,537.2,526.41,531,1744600
    GOOG,20150205,523.79,528.5,522.09,527.58,1844700
    FB,20150304,79.3,81.15,78.85,80.9,28014500
    FB,20150303,79.61,79.7,78.52,79.6,18567300
    FB,20150302,79,79.86,78.52,79.75,21604400
    FB,20150227,80.68,81.23,78.62,78.97,30635700
    FB,20150226,79.88,81.37,79.72,80.41,31111900
    TWTR,20150211,46.27,47.78,46.11,47.5,24747000
    TWTR,20150210,47.35,47.39,45.57,46.26,32287800
    TWTR,20150209,46.73,47.69,46.5,47.32,36177900
    TWTR,20150206,46.12,48.5,45.8,48.01,102669800
    TWTR,20150205,42.04,42.47,40.91,41.26,61997300
    MSFT,20150304,43.01,43.21,42.88,43.06,25705800
    MSFT,20150303,43.56,43.83,43.09,43.28,31748600
    MSFT,20150302,43.67,44.19,43.55,43.88,31924000
    MSFT,20150227,44.13,44.2,43.66,43.85,33807700
    MSFT,20150226,43.99,44.23,43.89,44.06,28957300
    ORCL,20150304,43.2,43.66,42.82,43.61,14663900
    ORCL,20150303,43.83,43.88,43.17,43.38,10058700
    ORCL,20150302,43.81,44.04,43.48,44.03,11091000
    ORCL,20150227,43.77,44.11,43.68,43.82,9549500
    ORCL,20150226,43.8,44.15,43.71,43.89,8519300
    ORCL,20150225,43.83,44.09,43.38,43.73,11785400