Beeswax

The Beeswax application enables you to perform queries on Apache Hive, a data warehousing system designed to work with Hadoop. For information about Hive, see Hive Documentation. With Beeswax you can create Hive tables and load data into them; create, run, and manage queries; and download query results as a Microsoft Office Excel worksheet file or a comma-separated values (CSV) file.

Beeswax and Hive Installation and Configuration

Beeswax is installed and configured as part of Hue. For information about installing and configuring Hue, see Hue Installation.

Beeswax assumes an existing Hive installation. The Hue installation instructions include the configuration necessary for Beeswax to access Hive. You can view the current Hive configuration from the Settings tab in the Beeswax application.

By default, a Beeswax user can see the saved queries of all users: both their own queries and those of other Beeswax users. To restrict viewing of saved queries to the query owner and Hue administrators, set the share_saved_queries property in the [beeswax] section of the Hue configuration file to false.
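
To check which behavior a given configuration produces, the following Python sketch reads a hypothetical excerpt of the Hue configuration file with the standard-library configparser module and applies the documented default (saved queries are shared) when the property is absent. It is an illustration only: it assumes a flat [beeswax] section and is not how Hue itself parses its configuration.

```python
import configparser

# Hypothetical excerpt of the Hue configuration file. Only the [beeswax]
# section name and the share_saved_queries property come from this guide.
HUE_CONFIG_EXCERPT = """
[beeswax]
share_saved_queries=false
"""


def saved_queries_shared(config_text: str) -> bool:
    """Return True if saved queries are visible to all Beeswax users.

    When the section or property is missing, fall back to True, matching
    the documented default of sharing saved queries.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("beeswax", "share_saved_queries", fallback=True)


print(saved_queries_shared(HUE_CONFIG_EXCERPT))  # False
print(saved_queries_shared("[beeswax]\n"))       # True
```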

Starting Beeswax

To start the Beeswax application, click the Beeswax icon in the navigation bar at the top of the Hue browser page.

Installing the Sample Tables

You can install two sample tables to use as examples.

  1. In the Beeswax window, click Tables.
  2. In the ACTIONS pane, click Install samples.

Once you have installed the sample data, you will no longer see the Install samples link.

Importing Your Own Data

If you want to import your own data instead of installing the sample tables, follow the procedure in Creating Tables.

Working with Queries

The Query Editor view lets you create queries in the Hive Query Language (HQL), which is similar to Structured Query Language (SQL). You can name and save your queries to use later. When you submit a query, the Beeswax Server uses Hive to run it. You can either wait for the query to complete, or return later to find it in the History view. You can also request an email message when the query completes.

Creating and Running Queries

  Note:

To run a query, you must be logged in to Hue as a user that also has a Unix user account on the remote server.

To create and run a query:

  1. In the Query Editor window, type the query. For example, to select all data from the sample_08 table, you would type:
    SELECT * FROM sample_08
  2. In the pane to the left of the Query field, you can override the default Hive and Hadoop settings, specify file resources and user-defined functions, enable users to enter parameters at run time, and request email notification when the job is complete. See Advanced Query Settings for details on using these settings.
  3. To save your query and advanced settings to use again later, click Save As, enter a name and description, and then click OK. To save changes to an existing query, click Save.
  4. If you want to view the execution plan for the query, click Explain. For more information, see http://wiki.apache.org/hadoop/Hive/LanguageManual/Explain.
  5. To run the query, click Execute. The Query Results window displays the results of the query.
  6. Do any of the following to download or save the query results:
    • Click Download as CSV to download the results in a comma-separated values file suitable for use in other applications.
    • Click Download as XLS to download the results in a Microsoft Office Excel worksheet file.
    • Click Save to save the results in a table or HDFS file.
      • To save the results in a new table, select In a new table, enter a table name, and then click Save.
      • To save the results in an HDFS file, select In an HDFS directory, enter a path and then click Save. You can then download the file with FileBrowser.
        Important:
        • You can only save results to a file when the results were generated by a MapReduce job.
        • Saving to an HDFS directory is the preferred method when the result set is large (for example, more than 1 million rows).
  7. In the Query Results window, you can also do the following:
    • Under MR Jobs, you can view any MapReduce jobs that the query started.
    • To view a log of the query execution, click Log at the top of the results display. You can use the information in this tab to debug your query.
    • To view the query that generated these results, click Query at the top of the results display.
    • To view the columns in the query results, click Columns.
    • To return to the query in the Query Editor, click Unsaved Query.
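
Once you have downloaded results with Download as CSV, you can load them in other tools. A minimal Python sketch using the standard csv module is shown below. It assumes the first row of the file holds the column names; the sample contents are placeholders, not real query output.

```python
import csv
import io

# Placeholder contents standing in for a file saved with "Download as CSV".
# The column names and values are hypothetical.
SAMPLE_DOWNLOAD = "name,count\nalpha,1\nbeta,2\n"


def load_results(stream):
    """Read CSV query results into a list of dicts keyed by column name.

    Assumes the first row of the file contains the column names.
    """
    return list(csv.DictReader(stream))


rows = load_results(io.StringIO(SAMPLE_DOWNLOAD))
total = sum(int(row["count"]) for row in rows)
print(total)  # 3
```

To read a real download instead, open the file with newline="" (for example, open("results.csv", newline=""), where results.csv is a hypothetical file name) and pass the file object to load_results.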

Advanced Query Settings

The pane to the left of the Query Editor lets you specify the following options:

DATABASE

The database containing the table definitions.

SETTINGS

Override the Hive and Hadoop default settings. Click Add to configure a new setting.
  • For Key, enter a Hive or Hadoop configuration variable name.
  • For Value, enter the value you want to use for the variable.
For example, to override the directory where structured Hive query logs are created, you would enter hive.querylog.location for Key and a path for Value. To view the default settings, click the Settings tab at the top of the page. For information about Hive configuration variables, see: http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration. For information about Hadoop configuration variables, see: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

FILE RESOURCES

Make locally accessible files available at query execution time on the entire Hadoop cluster. Hive uses Hadoop's Distributed Cache to distribute the added files to all machines in the cluster at query execution time. Click Add to configure a new setting. From the Type drop-down menu, choose one of the following:
  • jar: Adds the resources to the Java classpath. This is required in order to reference objects such as user-defined functions.
  • archive: Automatically unarchives resources when distributing them.
  • file: Adds resources to the distributed cache. Typically, this might be a transform script (or similar) to be executed.
For Path, enter the path to the file or click Choose a File to browse and select the file.

  Note: It is not necessary to specify files used in a transform script if the files are available in the same path on all machines in the Hadoop cluster.

USER-DEFINED FUNCTIONS

Specify user-defined functions to use in a query. Click Add to configure a new function, then enter the function name for Name and the class name for Class name. You must also specify a JAR file containing the user-defined functions in File Resources. To include a user-defined function in a query, add a $ (dollar sign) before the function name in the query. For example, if MyTable is a user-defined function name in the query, you would type: SELECT * $MyTable

PARAMETERIZATION

When enabled, a dialog box prompts you for parameter values when you execute a query that contains the string $<parametername>. This option is enabled by default.

EMAIL NOTIFICATION

Indicate that an email message should be sent after a query completes. The email is sent to the email address specified in the logged-in user's profile.
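
Most of these options have a counterpart in plain HiveQL: USE selects a database, SET overrides a configuration variable, ADD JAR, ADD ARCHIVE, and ADD FILE register file resources, and CREATE TEMPORARY FUNCTION registers a user-defined function. The Python sketch below renders a set of option values as those statements, followed by the query with its $name parameters filled in. It only illustrates the correspondence and is not how Beeswax submits queries; the table, path, class, and parameter names in the example are hypothetical.

```python
from string import Template

VALID_RESOURCE_TYPES = ("jar", "archive", "file")


def build_hql(query, database=None, settings=None, resources=None,
              functions=None, params=None):
    """Render advanced query options as equivalent HiveQL statements.

    database:  rendered as USE database;
    settings:  {key: value}, rendered as SET key=value;
    resources: [(type, path)] with type in VALID_RESOURCE_TYPES,
               rendered as ADD JAR|ARCHIVE|FILE path;
    functions: {name: class_name}, rendered as CREATE TEMPORARY FUNCTION
    params:    {name: value}, substituted for $name in the query text
    """
    statements = []
    if database:
        statements.append(f"USE {database};")
    for key, value in (settings or {}).items():
        statements.append(f"SET {key}={value};")
    for kind, path in resources or []:
        if kind not in VALID_RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {kind!r}")
        statements.append(f"ADD {kind.upper()} {path};")
    for name, class_name in (functions or {}).items():
        statements.append(f"CREATE TEMPORARY FUNCTION {name} AS '{class_name}';")
    # safe_substitute replaces $name for every name in params and leaves any
    # other $-prefixed token untouched instead of raising an error.
    body = Template(query).safe_substitute(params or {})
    statements.append(body.rstrip().rstrip(";") + ";")
    return "\n".join(statements)


print(build_hql(
    "SELECT * FROM my_table WHERE amount > $min_amount",  # hypothetical query
    database="default",
    settings={"hive.querylog.location": "/tmp/hive-logs"},
    resources=[("jar", "/tmp/my-udfs.jar")],
    functions={"my_upper": "com.example.hive.MyUpper"},
    params={"min_amount": 50000},
))
```

Running it prints a USE statement, a SET statement, an ADD JAR statement, a CREATE TEMPORARY FUNCTION statement, and finally the query with $min_amount replaced by 50000.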

Viewing Query History

Beeswax enables you to view the history of queries that you have previously run. Results for these queries are available for one week or until Hue is restarted.

To view query history:

  1. In the Beeswax window, click History. Beeswax displays a list of your saved and unsaved queries in the Query History window.
  2. To display the queries for all users, click Show everyone's queries. To display your queries only, click Show my queries.
  3. To display the automatically generated actions that Beeswax performed on a user's behalf, click Show auto actions. To display user queries again, click Show user queries.

Viewing, Editing, or Deleting My Queries

You can view a list of saved queries of all users by clicking Saved Queries in the Beeswax window. You can copy any user's query, but you can only edit, delete, and view the history of your own queries.

To edit a saved query:

  1. In the Beeswax window, click Saved Queries. The Queries window displays.
  2. Click the Options button next to the query and choose Edit from the context menu. The query displays in the Query Editor window.
  3. Change the query and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

To delete a saved query:

  1. In the Beeswax window, click Saved Queries. The Queries window displays.
  2. Click the Options button next to the query and choose Delete from the context menu.
  3. Click Yes to confirm the deletion.

To copy a saved query:

  1. In the Beeswax window, click Saved Queries. The Queries window displays.
  2. Click the Options button next to the query and choose Clone from the context menu. Beeswax displays the query in the Query Editor window.
  3. Change the query as necessary and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

To copy a query in the Beeswax Query History window:

  1. In the Beeswax window, click History. The Query History window displays.
  2. To display the queries for all users, click Show everyone's queries. The queries for all users display in the Query History window.
  3. Click the Clone link next to the query you want to copy. A copy of the query displays in the Query Editor window.
  4. Change the query, if necessary, and then click Save As, enter a new name, and click OK to save the query.

Working with Tables

Selecting the Database

  1. In the pane on the left, select the database from the DATABASE drop-down list.

Creating Tables

Although you can create tables by executing the appropriate HQL DDL query commands, it is easier to create a table using the Beeswax table creation wizard.

There are two ways to create a table: from a file or manually.

If you create a table from a file, the format of the data in the file will determine some of the properties of the table, such as the record and file formats. The data from the file you specify is imported automatically upon table creation.

When you create a table manually, you specify all the properties of the table, and then execute the resulting query to create the table. You then import data into the table as an additional step.

To create a table from a file:

  1. In the Beeswax window, click Tables.
  2. In the ACTIONS pane, click Create a new table from a file. The table creation wizard starts.
  3. Follow the instructions in the wizard to create the table. The basic steps are:
    • Choose your input file. The input file you specify must exist. You can also choose to have Beeswax create only the table definition from the file you select, without importing the file's data.
    • Specify the column delimiter.
    • Define your columns, providing a name and selecting the type.
  4. Click Create Table to create the table. The new table's metadata displays on the right side of the Table Metadata window. At this point, you can view the metadata or a sample of the data in the table. From the ACTIONS pane you can import new data into the table, browse the table, drop it, or go to the File Browser to see the location of the data.
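
The column delimiter you choose in the wizard determines how each line of the input file is split into columns. The following Python sketch shows that split with the standard csv module. It is not the wizard's own code: it assumes the first line holds column names unless has_header is False, in which case it generates hypothetical names col_1, col_2, and so on.

```python
import csv
import io


def preview_columns(stream, delimiter=",", has_header=True, sample_rows=3):
    """Split a delimited file into column names and a few sample rows.

    Returns (column_names, sample). Column names come from the first row
    when has_header is True; otherwise hypothetical names col_1, col_2, ...
    are generated and the first row is kept as data.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    first = next(reader, [])
    if has_header:
        names, sample = first, []
    else:
        names = [f"col_{i}" for i in range(1, len(first) + 1)]
        sample = [first] if first else []
    for row in reader:
        if len(sample) >= sample_rows:
            break
        sample.append(row)
    return names, sample


names, sample = preview_columns(io.StringIO("id\tname\n1\talpha\n"), delimiter="\t")
print(names)   # ['id', 'name']
print(sample)  # [['1', 'alpha']]
```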

To create a table manually:

  1. In the Beeswax window, click Tables.
  2. In the ACTIONS pane, click Create a new table manually. The table creation wizard starts.
  3. Follow the instructions in the wizard to create the table. The basic steps are:
    • Name your table.
    • Choose the record format.
    • Configure record serialization by specifying delimiters for columns, collections, and map keys.
    • Choose the file format.
    • Specify the location for your table's data.
    • Define your columns, providing a name and selecting the type.
    • Add partitions, if appropriate.
  4. Click Create table. The Table Metadata window displays.
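
The wizard's choices correspond to clauses of a Hive CREATE TABLE statement: PARTITIONED BY for partitions, ROW FORMAT DELIMITED with FIELDS, COLLECTION ITEMS, and MAP KEYS TERMINATED BY for the delimiters, STORED AS for the file format, and LOCATION for the data location. The Python sketch below assembles such a statement, which you could run from the Query Editor instead of using the wizard. It assumes the delimited record format and omits other options; the table name, columns, and path in the example are hypothetical.

```python
def create_table_ddl(name, columns, field_delim=None, collection_delim=None,
                     map_key_delim=None, file_format="TEXTFILE",
                     location=None, partitions=None):
    """Build a Hive CREATE TABLE statement from wizard-style choices.

    columns and partitions are lists of (column_name, hive_type) pairs.
    Delimiters are inserted verbatim inside single quotes, so pass Hive
    escape sequences as text (for example "\\t" for a tab).
    """
    def cols(pairs):
        return ", ".join(f"{col} {typ}" for col, typ in pairs)

    parts = [f"CREATE TABLE {name} ({cols(columns)})"]
    if partitions:
        parts.append(f"PARTITIONED BY ({cols(partitions)})")
    if field_delim or collection_delim or map_key_delim:
        parts.append("ROW FORMAT DELIMITED")
        if field_delim:
            parts.append(f"FIELDS TERMINATED BY '{field_delim}'")
        if collection_delim:
            parts.append(f"COLLECTION ITEMS TERMINATED BY '{collection_delim}'")
        if map_key_delim:
            parts.append(f"MAP KEYS TERMINATED BY '{map_key_delim}'")
    parts.append(f"STORED AS {file_format}")
    if location:
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts) + ";"


print(create_table_ddl(
    "web_logs",  # hypothetical table name
    [("ip", "STRING"), ("tags", "ARRAY<STRING>"), ("attrs", "MAP<STRING,STRING>")],
    field_delim=",", collection_delim="|", map_key_delim=":",
    location="/data/web_logs",  # hypothetical HDFS path
    partitions=[("dt", "STRING")],
))
```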

Browsing Tables

To browse the data in a table:

  1. In the Table List window, click the Browse Data button next to the table you want to browse. The table's data displays in the Query Results window.

To browse the metadata of a table:

  1. In the Table List window, click the table name. The table's metadata displays, open to the Columns tab. To view the data in the table, select the Sample tab.

Importing Data into Tables

When you import data from a file, you can either append it to the table's existing data or overwrite the existing data.

To import data into a table:

  1. In the Table List window, click the table name. The Table Metadata window displays.
  2. In the ACTIONS pane, click Import Data.
  3. For Path, enter the path to the file that contains the data you want to import.
  4. Check Overwrite existing data to replace the data in the selected table with the imported data. Leave this unchecked to append to the table.
  5. Click Submit.
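
Importing a file this way corresponds to Hive's LOAD DATA statement, with the Overwrite existing data check box mapping to the OVERWRITE keyword. A minimal Python sketch that builds the statement is shown below; it does not claim to reproduce exactly what Beeswax runs, and the path and table name in the example are hypothetical. Keep in mind that in Hive, LOAD DATA INPATH moves (rather than copies) the file from its HDFS location into the table's directory.

```python
def load_data_hql(path, table, overwrite=False):
    """Return a HiveQL LOAD DATA statement for an Import Data request.

    overwrite=True mirrors checking "Overwrite existing data";
    overwrite=False appends the file's data to the table.
    """
    keyword = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{path}' {keyword}INTO TABLE {table};"


# Hypothetical HDFS path and table name.
print(load_data_hql("/user/alice/new_rows.csv", "my_table", overwrite=True))
```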

Dropping Tables

To drop a table:

  1. In the Table List window, click the table name. The Table Metadata window displays.
  2. In the ACTIONS pane, click Drop Table.
  3. Click Yes to confirm the deletion.

Viewing a Table's Location

To view a table's location:

  1. In the Table List window, click the table name. The Table Metadata window displays.
  2. Click View File Location. The File Browser window opens to the directory that contains the selected table's data.