Apache Phoenix

Apache Phoenix is an open source, massively parallel relational database engine that supports OLTP for Hadoop, using Apache HBase as its backing store. It enables developers to access large datasets in real time through a familiar SQL interface.

  • Standard SQL and JDBC APIs with full ACID transaction capabilities
  • Support for late-bound schema-on-read over existing data in HBase
  • Access to data stored and produced in other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce

What Phoenix Does

Apache HBase provides random, real-time access to data in Hadoop and is widely adopted in the Hadoop ecosystem. Apache Phoenix abstracts away the underlying data store by enabling you to query the data with standard SQL through a JDBC driver. It also provides features such as secondary indexes, which help speed up queries without relying on a specific row key design.
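
As a minimal sketch of what this SQL access can look like, the Python snippet below holds a few statements in Phoenix's SQL dialect and runs the query through a DB-API 2.0 connection. The table `web_stat`, its columns, and the index name are hypothetical. The text above describes access through the JDBC driver, so using a Python connection here (for example, one obtained from the `phoenixdb` package against a Phoenix Query Server) is an assumption made purely for illustration.

```python
# Hypothetical schema: table, column, and index names are illustrative only.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS web_stat ("
    " host VARCHAR NOT NULL,"
    " ts DATE NOT NULL,"
    " clicks BIGINT"
    " CONSTRAINT pk PRIMARY KEY (host, ts))"
)

# Phoenix writes rows with UPSERT rather than INSERT.
UPSERT_ROW = "UPSERT INTO web_stat (host, ts, clicks) VALUES (?, ?, ?)"

# Secondary index on a non-key column, so the filter below does not
# depend on how the row key (host, ts) was designed.
CREATE_INDEX = "CREATE INDEX IF NOT EXISTS web_stat_clicks_idx ON web_stat (clicks)"

QUERY = "SELECT host, ts FROM web_stat WHERE clicks > ?"


def hosts_with_clicks_above(conn, threshold):
    """Run the filter query on a DB-API 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(QUERY, (threshold,))
        return cursor.fetchall()
    finally:
        cursor.close()
```

The function only relies on the standard `cursor()`, `execute()`, and `fetchall()` calls with `?` placeholders, so it is not tied to one particular driver.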

Apache Phoenix is also massively parallel: aggregation queries are executed on the nodes where the data is stored, which greatly reduces the need to send data over the network.

Feature            Description
Familiar           Query data with a SQL-based language
Fast               Real-time queries
Reliable           Built on top of proven data store HBase
Platform agnostic  Hortonworks’ Phoenix provides ODBC connector drivers, allowing you to connect to your dataset using familiar BI tools.

How Phoenix Works

Phoenix provides fast access to large amounts of data. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium-sized cluster). This time comes down to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add secondary indexes on those columns. An index makes a copy of the table with the indexed column(s) as part of the key, which gives performance equivalent to filtering on a key column.
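
The secondary index idea can be pictured with a toy in-memory model: the primary table is sorted by row key, and the index is a second sorted copy whose key starts with the indexed column. This is a conceptual sketch in Python, not Phoenix's implementation; the rows, column names, and helper functions are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy "table": rows sorted by row key (user_id). Columns: (user_id, city).
TABLE = sorted([(1, "Oslo"), (2, "Lima"), (3, "Oslo"), (4, "Cairo"), (5, "Lima")])

# Toy "secondary index": a copy keyed by (city, user_id), so city leads the key.
INDEX = sorted((city, user_id) for user_id, city in TABLE)


def find_by_city_full_scan(city):
    """Filter on a non-key column without an index: every row is examined."""
    return [user_id for user_id, row_city in TABLE if row_city == city]


def find_by_city_indexed(city):
    """Filter via the index: binary-search the contiguous key range for city."""
    lo = bisect_left(INDEX, (city,))
    hi = bisect_right(INDEX, (city, float("inf")))
    return [user_id for _, user_id in INDEX[lo:hi]]
```

Both functions return the same rows. The difference is that the indexed lookup touches only the contiguous slice of keys that begin with the requested city, which is the same access pattern as a filter on a leading key column.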

Why Phoenix is fast even when doing a full scan:

  1. Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client, using a configurable number of threads.
  2. The aggregation is done in a coprocessor on the server side, which collapses the amount of data returned to the client rather than returning it all.
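
The two points above can be sketched as a small simulation: one task per region, run on a thread pool, with each task reducing its region to a partial aggregate before anything is merged on the client. This is a toy model under stated assumptions, not Phoenix code; the region contents, `region_partial_sum`, and `parallel_count_and_sum` are hypothetical names, and a local function stands in for the server-side coprocessor.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "regions": each holds a contiguous slice of (row_key, value) rows.
REGIONS = [
    [(1, 10), (2, 20), (3, 30)],
    [(4, 40), (5, 50)],
    [(6, 60), (7, 70), (8, 80), (9, 90)],
]


def region_partial_sum(region):
    """Stand-in for server-side work: reduce a region to one (count, sum) pair."""
    return len(region), sum(value for _, value in region)


def parallel_count_and_sum(regions, max_threads=4):
    """Client side: one scan per region, run in parallel, then merge partials."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        partials = list(pool.map(region_partial_sum, regions))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum
```

In this model only three small `(count, sum)` pairs reach the client instead of all nine rows, which is the effect the second point describes.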
