Apache Phoenix
Apache Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface. Apache Phoenix implements best-practice optimizations to enable software engineers to develop next-generation data-driven applications based on HBase. Using Phoenix, you can create and interact with tables through typical DDL/DML statements over the standard JDBC API.
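As a minimal sketch of that JDBC workflow, the following assumes the Phoenix client JAR is on the classpath and a ZooKeeper quorum at `zkhost:2181`; the `WEB_STAT` table and its columns are hypothetical examples, not part of any shipped schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        // Phoenix JDBC URL: jdbc:phoenix:<ZooKeeper quorum>
        // "zkhost:2181" is a placeholder for your cluster's ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")) {
            // DDL: create a table (hypothetical schema)
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS WEB_STAT (" +
                "  HOST VARCHAR NOT NULL, " +
                "  VISIT_DATE DATE NOT NULL, " +
                "  VISITS BIGINT " +
                "  CONSTRAINT PK PRIMARY KEY (HOST, VISIT_DATE))");

            // DML: Phoenix uses UPSERT rather than INSERT/UPDATE
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO WEB_STAT (HOST, VISIT_DATE, VISITS) VALUES (?, CURRENT_DATE(), ?)")) {
                ps.setString(1, "example.com");
                ps.setLong(2, 42L);
                ps.executeUpdate();
            }
            conn.commit(); // Phoenix connections are not auto-commit by default

            // Query through the standard JDBC ResultSet API
            try (ResultSet rs = conn.createStatement().executeQuery(
                    "SELECT HOST, VISITS FROM WEB_STAT WHERE HOST = 'example.com'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("HOST") + " -> " + rs.getLong("VISITS"));
                }
            }
        }
    }
}
```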
Phoenix provides:
- SQL and JDBC API support
- Support for late-bound, schema-on-read
- Access to data stored and produced in other components such as Apache Spark and Apache Hive
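For the Spark integration mentioned above, the phoenix-spark connector exposes Phoenix tables as DataFrames. The sketch below follows the connector's documented usage, but the data source name and option keys ("table", "zkUrl") vary across Phoenix versions, and the table name and ZooKeeper URL are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixSparkReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("phoenix-spark-read")
                .getOrCreate();

        // Load a Phoenix table as a DataFrame. The phoenix-spark connector JAR
        // must be on the Spark classpath; option names may differ by version.
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "WEB_STAT")        // placeholder table name
                .option("zkUrl", "zkhost:2181")     // placeholder ZooKeeper quorum
                .load();

        // Filters and aggregations can then be expressed with the DataFrame API.
        df.filter("HOST = 'example.com'").show();

        spark.stop();
    }
}
```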

What Phoenix does
Apache HBase is an OLTP database for applications that want to leverage big data or that need high availability and seamless scalability. Many customers use this data store to deploy machine learning-based applications, high-concurrency applications such as web-scale and mobile apps, customer-facing dashboards, fraud analysis, and more.
Apache Phoenix abstracts the underlying data store and allows you to query data using standard SQL. Features such as secondary indexes help you boost query speed without relying on specific row-key designs, and they make it practical to use star schemas.
Apache Phoenix is massively parallel: aggregation queries are executed on the nodes where the data is stored, which reduces the amount of data sent over the network.
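To make that concrete, an aggregate query such as the one below (against the hypothetical WEB_STAT table from the earlier sketch) is split across region servers, each computing partial results in a coprocessor; only one row per group travels back to the client.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class PhoenixAggregationSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181");
             // The GROUP BY is evaluated server-side; the client receives only
             // one row per HOST rather than every underlying row.
             ResultSet rs = conn.createStatement().executeQuery(
                     "SELECT HOST, COUNT(*) AS ROW_COUNT " +
                     "FROM WEB_STAT GROUP BY HOST")) {
            while (rs.next()) {
                System.out.println(rs.getString("HOST") + " -> " + rs.getLong("ROW_COUNT"));
            }
        }
    }
}
```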
| Feature | Description |
|---|---|
| Familiar | Query data using an SQL-based language |
| Fast | Real-time queries |
| Scalable & reliable | Leverages HBase’s proven scalability and availability to scale seamlessly to petabytes of data and thousands of clients |
| Platform agnostic | Phoenix provides ODBC connector drivers, allowing you to connect to your dataset using familiar business intelligence (BI) tools |
How Phoenix works
Phoenix provides fast access to large amounts of data, with millisecond reads, writes, and updates as well as fast table scans. For example, a narrow table on a medium-sized cluster can be scanned at roughly 100 million rows in 20 seconds, and query time drops to a few milliseconds when the query contains filters on key columns. For filters on non-key columns or non-leading key columns, you can add secondary indexes on those columns; a secondary index maintains a copy of the table with the indexed column(s) as part of the row key, giving performance equivalent to filtering on key columns.
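As a sketch (reusing the hypothetical WEB_STAT table from above), a secondary index on a non-key column lets a filter on that column avoid a full scan; Phoenix maintains the index automatically on every UPSERT to the base table.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class PhoenixSecondaryIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")) {
            // VISITS is not part of the row key, so filtering on it would
            // otherwise require a full table scan.
            conn.createStatement().execute(
                "CREATE INDEX IF NOT EXISTS WEB_STAT_VISITS_IDX ON WEB_STAT (VISITS)");

            // With the index in place, this filter can be served from the index
            // table, whose row key leads with VISITS.
            try (ResultSet rs = conn.createStatement().executeQuery(
                    "SELECT HOST, VISITS FROM WEB_STAT WHERE VISITS > 1000")) {
                while (rs.next()) {
                    System.out.println(rs.getString("HOST") + " -> " + rs.getLong("VISITS"));
                }
            }
        }
    }
}
```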
Why is Phoenix fast even when doing a full scan?
- Phoenix chunks up your query using guideposts, which allows multiple threads to work on a single region.
- Phoenix runs these chunks in parallel on the client using a configurable number of threads.
- Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
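Both the client-side parallelism and the guidepost granularity are configurable. The sketch below passes phoenix.query.threadPoolSize as a connection property and notes phoenix.stats.guidepost.width, which is normally set in hbase-site.xml; whether a given property can be overridden per connection depends on the Phoenix version, so treat this as an assumption to verify against your release.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;

public class PhoenixTuningSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side thread pool used to run query chunks in parallel.
        props.setProperty("phoenix.query.threadPoolSize", "256");

        // Guidepost width (bytes between guideposts) is normally configured in
        // hbase-site.xml rather than per connection, e.g.:
        //   <property>
        //     <name>phoenix.stats.guidepost.width</name>
        //     <value>52428800</value> <!-- 50 MB: smaller width => more chunks -->
        //   </property>

        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181", props);
             ResultSet rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM WEB_STAT")) {
            if (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}
```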