YCSB joins Ibis, Apache Hive-on-Apache Spark, Apache Kafka (graduated), and Apache Phoenix Projects
What: Cloudera announces that the Yahoo! Cloud Serving Benchmark (YCSB), an open source framework for evaluating and comparing the performance of multiple types of data-serving systems (including NoSQL stores such as Apache HBase, Apache Cassandra, Redis, MongoDB, and Voldemort) is now available to CDH users.
Why: YCSB has long been the de facto open standard for comparative performance evaluation of NoSQL data stores. Many factors go into deciding which data store to use for production applications, including basic features, data model, and performance characteristics on a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that sound architectural decisions can be made.
From the perspective of a generic, database-neutral, performance evaluation utility, YCSB is currently the de-facto comparative benchmark for NoSQL stores. It includes support for a wide range of database bindings and is commonly used to compare their performance for a set of desired workloads. Being open source and extensible, support for additional databases is regularly added.
Who: Cloudera sees great value in the YCSB project for the HBase community, and recently Cloudera engineers have been working with Brian Cooper, the original author of YCSB, to reinvigorate the project within the developer community. A number of enhancements have already been added, and a regular release cycle has been established. Some of the recent improvements to YCSB include:
- Latency capture via HDRHistogram
- Measuring transaction latency against a fixed schedule
- Support for an additional JSON format
- Better reporting and status output
- New database bindings
When: Available now, Cloudera CDH users can now easily install and use YCSB to evaluate the performance of their HBase deployments by taking advantage of new packages in Cloudera Labs. (As with all Cloudera Labs projects, although these packages are not currently supported we do strongly encourage you to experiment with them.)
How YCSB Works:
YCSB was developed at Yahoo! Labs to provide a framework and common set of workloads for evaluating the performance of different key-value stores. It has two parts:
- The YCSB Client, an extensible workload generator
- The core workloads, a set of workload scenarios to be executed by the generator
The core workloads provide a well rounded picture of a system’s performance, and the client is extensible so that you can define additional workloads to examine system aspects or application scenarios not covered by the core workload; the client can also be extended to benchmark different databases. YCSB ships with bindings for a long list of databases including HBase, Cassandra, Apache Accumulo, MongoDB, and Voldemort, and support for a different data store can be added by writing an interface layer.
To benchmark multiple data stores and compare them, you can install multiple data stores on multiple instances of an identical hardware configuration and run the same workloads against each instance. Next, plot the performance of each system, to see their relative performance profiles. One example of a good visualization to try is latency versus throughput curves.
Installing YCSB with CDH
YCSB packages and parcels for CDH, including basic documentation, can be downloaded from here. (YCSB 0.3.0 is the version packaged in Cloudera Labs.) To install this version, within Cloudera Manager, click on the “Parcels” icon in the top bar, then click on the “Edit Settings” button. Add the following link to the list of URLs enumerated in the “Remote Parcel Repository URLs” setting:http://archive.cloudera.com/cloudera-labs/ycsb/parcels/latest. Then, install the parcel and activate it.
Users will see value in YCSB for benchmarking HBase deployments. Share feedback or questions on the Cloudera Labs area at community.cloudera.com.
Useful additional reading:
- Official YCSB documentation
- The original paper on YCSB published by Brian Cooper et al
- Blog: YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs
Cloudera is revolutionizing enterprise data management by offering the first unified Platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver greater time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.
Connect With Cloudera
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera
Join the Cloudera Community: http://cloudera.com/community
Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Editionand CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.