What Phoenix Does
Apache HBase provides random, real-time access to data in Hadoop and is widely adopted in the Hadoop ecosystem. Apache Phoenix abstracts away the underlying data store by letting you query the data with standard SQL through a JDBC driver. Phoenix also provides features such as secondary indexes, which help speed up queries without relying on a specific row key design.
Apache Phoenix is also massively parallel: aggregation queries are executed on the nodes where the data is stored, which greatly reduces the need to send data over the network.
|Familiar|Query data with a SQL-based language|
|Reliable|Built on top of HBase, a proven data store|
|Platform agnostic|Hortonworks’ Phoenix provides ODBC connector drivers, allowing you to connect to your dataset using familiar BI tools|
How Phoenix works
Phoenix provides fast access to large amounts of data. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium-sized cluster). This drops to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add secondary indexes on those columns. A secondary index makes a copy of the table with the indexed column(s) as part of the key, which gives performance equivalent to filtering on a key column.
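The idea behind a secondary index can be illustrated with a small in-memory sketch. This is not Phoenix code: the table contents, the column names (`user_id`, `city`), and the helper functions are all hypothetical, and a sorted Python list stands in for a key-ordered store. It only shows why a copy of the table keyed by the indexed column turns a full scan into a narrow range scan.

```python
from bisect import bisect_left, bisect_right

# Toy "table": rows sorted by row key (user_id), as in a key-ordered store.
# The data and column names are made up for illustration.
rows = [
    (1, "berlin"),
    (2, "tokyo"),
    (3, "berlin"),
    (4, "lima"),
    (5, "tokyo"),
]

def full_scan_filter(table, city):
    """Filter on a non-key column: every row has to be examined."""
    return [user_id for user_id, c in table if c == city]

def build_index(table):
    """Copy the table with the indexed column leading the key: (city, user_id)."""
    return sorted((c, user_id) for user_id, c in table)

def index_lookup(index, city):
    """Range scan over the index copy: binary-search the key range for `city`."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    return [user_id for _, user_id in index[lo:hi]]

index = build_index(rows)
berlin_via_scan = full_scan_filter(rows, "berlin")
berlin_via_index = index_lookup(index, "berlin")
```

Both lookups return the same rows, but the indexed version touches only the matching key range instead of the whole table. The cost is the extra copy that has to be stored and kept in sync with the base table.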
Why Phoenix is fast even when doing a full scan:
- Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client using a configurable number of threads.
- Aggregation is done in a coprocessor on the server side, which collapses the amount of data returned to the client instead of returning all of it.
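The two points above can be sketched as a small simulation. This is a simplified model rather than Phoenix's implementation: the function names, the `max_threads` parameter, and the region boundaries are assumptions made for illustration, and a plain function stands in for the server-side coprocessor. Each chunk is processed in its own thread and returns a single partial count, so the client only merges a handful of numbers.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def split_by_regions(sorted_keys, boundaries):
    """Split a sorted key list into chunks at ascending region boundaries."""
    chunks, start = [], 0
    for boundary in boundaries:
        end = bisect_left(sorted_keys, boundary)
        chunks.append(sorted_keys[start:end])
        start = end
    chunks.append(sorted_keys[start:])
    return chunks

def region_partial_count(chunk):
    """Stand-in for a server-side coprocessor: returns one number, not the rows."""
    return len(chunk)

def parallel_count(sorted_keys, boundaries, max_threads=4):
    """Run one partial aggregation per chunk in parallel, then merge on the client."""
    chunks = split_by_regions(sorted_keys, boundaries)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        partials = list(pool.map(region_partial_count, chunks))
    return sum(partials), partials

keys = list(range(100))  # hypothetical row keys 0..99
total, partials = parallel_count(keys, boundaries=[25, 50, 75])
```

Here the client receives four partial counts instead of 100 rows. The same pattern applies to other aggregates that can be merged from partial results, such as sums, minimums, and maximums.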