Have you ever wondered how massive business and consumer apps handle so many concurrent users at scale? To deploy high-performance applications at scale, a rugged operational database is essential. Cloudera Operational Database (COD) is a high-performance and highly scalable operational database designed for powering the biggest data applications on the planet at any scale. Powered by Apache HBase and Apache Phoenix, COD ships out of the box with Cloudera Data Platform (CDP) in the public cloud. It’s also multi-cloud ready to meet your business where it is today, whether on AWS, Microsoft Azure, or GCP.
Support for cloud storage is an important capability of COD that, in addition to the pre-existing support for HDFS on local storage, gives customers a choice of price-performance characteristics.
To understand how COD delivers the most cost-efficient performance for your applications, let’s dive into benchmarking results comparing COD using cloud storage with COD using HDFS.
Test Environment:
The comparison was done to measure the performance differences between COD using storage on the Hadoop Distributed File System (HDFS) and COD using cloud storage. We tested two cloud storage services: Amazon S3 and Azure ABFS. These measurements were done on the COD 7.2.15 runtime version.
The performance benchmark was done to measure the following aspects:
The following configuration was used to set up a sidecar cluster:
The cluster running HBase on cloud storage was configured with a combined bucket cache size of 32TB across the cluster, with the L2 bucket cache configured to use file-based cache storage on ephemeral storage volumes of 1.6TB capacity each. We ensured that this bucket cache was warmed up almost completely, i.e., all the regions on all the region servers were read into the bucket cache. This happens automatically whenever the region servers are started.
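To make the cache setup above more concrete, here is a minimal sketch of the kind of region-server settings involved. The property names are standard HBase bucket cache settings; the mount path and per-server size are assumptions for illustration, and in practice these values live in hbase-site.xml (or the Cloudera Manager safety valve) rather than in client code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative only: shows the properties that enable a file-backed L2 bucket cache.
public class BucketCacheConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // L2 bucket cache backed by a file on the ephemeral volume
    // (the mount path is an assumption, not the path used in the benchmark).
    conf.set("hbase.bucketcache.ioengine", "file:/mnt/ephemeral/bucketcache");

    // Bucket cache size in MB; sized here to roughly fill a 1.6TB ephemeral
    // volume per region server (an assumption based on the volumes described above).
    conf.set("hbase.bucketcache.size", String.valueOf(1_500L * 1024));

    System.out.println("ioengine = " + conf.get("hbase.bucketcache.ioengine"));
    System.out.println("size (MB) = " + conf.get("hbase.bucketcache.size"));
  }
}
```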
All the tests were run using the YCSB benchmarking tool on COD with the following configurations:
Here is some important information regarding the test methodology:
The following parameters were used to run the workloads using YCSB:
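As a rough illustration of what such a workload translates to at the HBase client level, the sketch below mimics a YCSB workload A style loop (roughly 50% reads and 50% updates). The table name follows YCSB's default ("usertable"), but the column family, key format, keyspace, and operation count are assumptions for illustration; the actual benchmark was driven by the YCSB tool itself, not hand-written code.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class WorkloadASketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("usertable"))) {

      byte[] cf = "cf".getBytes(StandardCharsets.UTF_8);
      long keyspace = 1_000_000L; // assumption: real runs use a much larger keyspace

      for (int i = 0; i < 10_000; i++) {
        byte[] key = ("user" + ThreadLocalRandom.current().nextLong(keyspace))
            .getBytes(StandardCharsets.UTF_8);

        if (ThreadLocalRandom.current().nextBoolean()) {
          // ~50% reads: a point get, served from the bucket cache once it is warm
          Result result = table.get(new Get(key));
          if (!result.isEmpty()) {
            result.getRow(); // consume the row
          }
        } else {
          // ~50% updates: overwrite a single field of the row
          Put put = new Put(key);
          put.addColumn(cf, "field0".getBytes(StandardCharsets.UTF_8),
              "updated-value".getBytes(StandardCharsets.UTF_8));
          table.put(put);
        }
      }
    }
  }
}
```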
The test started by loading 20TB of data into a COD cluster running HBase on HDFS. This load was carried out using the 10-node sidecar cluster against the 20-node COD cluster running HBase on HDFS. Subsequently, a snapshot of the loaded data was taken and restored to the other COD clusters running HBase on Amazon S3 and Microsoft Azure ABFS. The following observations were made during this activity:
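As an aside, the snapshot-and-restore step described above can be sketched with the HBase Admin API as follows. The snapshot name is an assumption, and in practice the snapshot is taken on the source (HDFS) cluster, copied with the ExportSnapshot tool, and then cloned on each target (S3/ABFS) cluster, rather than all in one program as shown here for brevity.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotRestoreSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      TableName table = TableName.valueOf("usertable");

      // 1. Take a snapshot of the loaded table on the source cluster.
      admin.snapshot("usertable_20tb_snap", table);

      // 2. Copy the snapshot files to the target cluster's storage, typically
      //    from the command line, e.g.:
      //    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      //        -snapshot usertable_20tb_snap -copy-to <target HBase root dir>
      //    (the target URI depends on the destination cluster)

      // 3. On the target cluster, create the table from the copied snapshot.
      admin.cloneSnapshot("usertable_20tb_snap", table);
    }
  }
}
```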
Based on the data shown above, we made the following observations:
As mentioned above, the cache was warmed to its full capacity in the case of the S3-based cluster. This cache warming took around 130 minutes at an average throughput of 2.62 GB/s, which is roughly consistent with reading the entire 20TB dataset into the cache.
The following chart shows the cache warming throughput with S3:
The following charts show the throughput and latencies observed in different run configurations:
The following charts compare various metrics for HBase running on HDFS and HBase running on S3.
AWS-HBase-Throughput (ops/sec)
The following chart shows the throughput observed while running the workloads on HDFS and AWS S3. Overall, AWS S3 shows better throughput than HDFS.
AWS-HBase-Read Latency
The chart below shows the read latency observed while running the read workloads. Overall, read latency improves with S3 backed by the ephemeral-storage cache when compared to HDFS.
AWS-HBase-Write Latency
The chart below shows the write latency observed while running workloads A and F. S3 shows an overall improvement in write latency during the write-heavy workloads.
The tests were also run to compare the performance of Phoenix with HBase running on HDFS versus HBase running on S3. A brief sketch of what a Phoenix workload looks like is shown below, followed by charts comparing a few key indicators.
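For context on what these Phoenix runs exercise, Phoenix sits on top of the same HBase tables and is accessed through SQL over JDBC. The sketch below shows the shape of a simple read and write through Phoenix; the JDBC URL, table, and columns are assumptions for illustration and are not the benchmark's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper quorum; COD exposes the real JDBC connection details.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
      conn.setAutoCommit(true);

      // Write path: Phoenix uses UPSERT rather than separate INSERT/UPDATE statements.
      try (PreparedStatement upsert =
               conn.prepareStatement("UPSERT INTO USERTABLE (ID, FIELD0) VALUES (?, ?)")) {
        upsert.setString(1, "user1000");
        upsert.setString(2, "some-value");
        upsert.executeUpdate();
      }

      // Read path: a point lookup on the primary key, which maps to an HBase row key.
      try (PreparedStatement select =
               conn.prepareStatement("SELECT FIELD0 FROM USERTABLE WHERE ID = ?")) {
        select.setString(1, "user1000");
        try (ResultSet rs = select.executeQuery()) {
          while (rs.next()) {
            System.out.println(rs.getString("FIELD0"));
          }
        }
      }
    }
  }
}
```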
AWS-Phoenix-Throughput (ops/sec)
The chart below shows the average throughput when the workloads were run with Phoenix against HDFS and S3. The overall read throughput is found to be better than the write throughput during the tests.
AWS-Phoenix-Read Latency
The overall read latency for the read-heavy workloads shows improvement when using S3. The chart below shows that the read latency observed with S3 is several times better than the latency observed while running the workloads on HDFS.
AWS-Phoenix-Write Latency
The write-heavy workload shows a substantial improvement in performance because of the reduced write latency on S3 compared to HDFS.
Azure
Performance measurements were also conducted on HBase running on Azure ABFS storage, and the results were compared with HBase running on HDFS. The following charts show the comparison of key performance metrics for HBase running on HDFS vs. HBase running on ABFS.
Azure-HBase-Throughput (ops/sec)
The workloads running on HBase on ABFS show an almost 2x improvement compared to HBase running on HDFS, as depicted in the chart below.
Azure-HBase-Read Latency
The chart below shows the read latency observed while running the read-heavy workloads on HBase running on HDFS vs. HBase running on ABFS. Overall, the read latency of HBase running on ABFS is more than 2x better than that of HBase running on HDFS.
Azure-HBase-Write Latency
The write-heavy workload results in the chart below show an improvement of almost 1.5x in write latency for HBase running on ABFS as compared to HBase running on HDFS.
The duration of this warming-up phase typically falls within the range of three to five hours for a cluster configured with 1.5TB of cache per worker. This initial phase ensures optimized performance once the cache is fully populated and the cluster is running at its peak efficiency.
The latency inherent to such storage solutions is expected to slow data retrieval from cloud storage, and each access also incurs a cost. Another significant factor affecting performance and resilience is the cloud storage's built-in throttling mechanism, which limits the number of allowed calls per second per prefix. Exceeding this limit results in requests going unserved, with the potential consequence of halting cluster operations.
In this scenario, cache warming plays a pivotal role in avoiding such situations. By proactively populating the cache with data, the cluster can avoid relying on frequent, and potentially throttled, storage requests.
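One HBase mechanism that supports this kind of proactive warming is prefetch-on-open, which tells region servers to load a column family's data blocks into the cache as soon as its regions open. Whether COD uses exactly this mechanism is not stated above, so treat the sketch below (and its table and family names) as illustrative only; the same behavior can also be enabled cluster-wide via the hbase.rs.prefetchblocksonopen property.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefetchOnOpenSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      // Ask region servers to pull this family's blocks into the block/bucket cache
      // when its regions are opened, instead of waiting for the first reads.
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
          .newBuilder(Bytes.toBytes("cf"))
          .setPrefetchBlocksOnOpen(true)
          .build();

      admin.modifyColumnFamily(TableName.valueOf("usertable"), cf);
    }
  }
}
```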
The table below shows the total cost of ownership (TCO) for a cluster running COD on S3 without ephemeral cache (DR scenario) and with ephemeral cache (production scenario) as compared with a cluster running COD on HDFS.
We observed that the overall throughput of HBase on cloud storage with a bucket cache is better than that of HBase running on HDFS with HDDs. Here are some highlights:
DR Cluster: This cluster is dedicated to disaster recovery efforts and typically handles write operations from less critical applications. By leveraging cloud storage without local storage to support the cache, users can expect to achieve approximately 25% cost savings compared to an HDFS-based cluster.
Prod Cluster: Serving as the primary cluster, this environment functions as the definitive source of truth for all read and write activities generated by applications. By utilizing a cloud storage solution with local storage to support the cache, users can realize a substantial 40% reduction in costs compared to an HDFS-based cluster.
Visit the product page to learn more about Cloudera Operational Database or reach out to your account team.