Are you struggling to manage the ever-increasing volume and variety of data in modern data architectures? Data now spans structured, semi-structured, and unstructured types, so data professionals need to be proficient with file and table formats such as ORC, Parquet, Avro, CSV, and Apache Iceberg, and with datasets as varied as images, videos, sensor data, and other media content. Navigating this variety can be challenging, which is why Apache Ozone has become a popular cloud-native storage solution: it covers a wide range of data use cases with the performance that today’s data architectures need.
Apache Ozone, a highly scalable, high-performance distributed object store, addresses this requirement with its bucket layout flexibility and multi-protocol support. Ozone is compatible with the Amazon S3 and Hadoop FileSystem protocols and provides bucket layouts optimized for both object store and file system semantics. With these features, Ozone can be used as a pure object store, a Hadoop Compatible File System (HCFS), or both. Users can store different types of data in a single store and access the same data through multiple protocols, combining the scale of an object store with the flexibility of the Hadoop file system.
A previous blog post describes the different bucket layouts available in Ozone. This blog post is intended to provide guidance to Ozone administrators and application developers on the optimal usage of the bucket layouts for different applications.
To start with, Ozone’s namespace includes the following conceptual entities:
File System Optimized (FSO) and Object Store (OBS) are the two new bucket layouts in Ozone for unified and optimized storage as well as access to files, directories, and objects. Bucket layouts provide a single Ozone cluster with the capabilities of both a Hadoop Compatible File System (HCFS) and Object Store (like Amazon S3). One of these two layouts should be used for all new storage needs.
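As a rough illustration of that choice, the following Python sketch picks a layout from a single workload trait and builds the matching bucket-creation command. The helper names (`choose_layout`, `bucket_create_command`) and the one-question decision rule are assumptions made for this example, not part of Ozone. The `ozone sh bucket create ... --layout` invocation reflects the Ozone shell as we understand it, so check it against your Ozone version before relying on it.

```python
from enum import Enum
from typing import List


class BucketLayout(Enum):
    """The two bucket layouts recommended for new storage needs."""
    FILE_SYSTEM_OPTIMIZED = "FILE_SYSTEM_OPTIMIZED"  # FSO
    OBJECT_STORE = "OBJECT_STORE"                    # OBS


def choose_layout(needs_file_system_semantics: bool) -> BucketLayout:
    """Deliberately simplified rule (an assumption for this sketch):
    workloads that rely on files and directories get FSO, and
    pure object workloads get OBS."""
    if needs_file_system_semantics:
        return BucketLayout.FILE_SYSTEM_OPTIMIZED
    return BucketLayout.OBJECT_STORE


def bucket_create_command(volume: str, bucket: str,
                          layout: BucketLayout) -> List[str]:
    """Build the argument list for creating a bucket with a layout.
    The command is only constructed here, never executed."""
    if not volume or not bucket:
        raise ValueError("volume and bucket must be non-empty")
    return ["ozone", "sh", "bucket", "create",
            f"/{volume}/{bucket}", "--layout", layout.value]
```

For example, `bucket_create_command("vol1", "analytics", choose_layout(True))` yields an argument list that could be passed to `subprocess.run` on a host with the Ozone client installed. The volume and bucket names here are placeholders.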
The bucket layouts and their features are described below.
Users can store their data in Apache Ozone and can access the data with multiple protocols.
Protocols provided by Ozone:
1- Ingesting data through the S3 interface into FSO buckets for low-latency analytics using the ofs protocol.
2- Storing data on-premises for security and compliance while keeping it accessible through a cloud-compatible API.
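The first scenario can be sketched in Python as a mapping from an S3 bucket and key to the ofs path that analytics jobs would read. This is a minimal sketch that assumes the bucket is exposed through the default S3 volume, `s3v`, and uses a made-up Ozone Manager service ID (`ozone1`). The function name `s3_object_to_ofs_uri` is hypothetical.

```python
def s3_object_to_ofs_uri(om_service_id: str, bucket: str, key: str,
                         volume: str = "s3v") -> str:
    """Return the ofs URI for an object written through the S3 interface.

    Assumptions for this sketch: the S3 bucket lives in the `s3v` volume
    unless another volume is passed in, and the key uses '/' as its
    separator, which an FSO bucket treats as a directory hierarchy.
    """
    if not om_service_id or not bucket or not key:
        raise ValueError("om_service_id, bucket and key must be non-empty")
    # Drop leading slashes so the key does not create an empty path segment.
    relative_key = key.lstrip("/")
    return f"ofs://{om_service_id}/{volume}/{bucket}/{relative_key}"


if __name__ == "__main__":
    # An object uploaded as s3://landing/events/2024/part-0000.parquet
    # would be read by an analytics engine at this ofs location.
    print(s3_object_to_ofs_uri("ozone1", "landing",
                               "events/2024/part-0000.parquet"))
```

The point of the mapping is that no copy is involved: the same data is addressed once as an S3 object and once as a file path.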
Bucket layouts are a powerful feature that allows Apache Ozone to be used as both an Object Store and a Hadoop Compatible File System. In this article, we have covered the benefits of each bucket layout and how to choose the best bucket layout for each workload.
If you are interested in learning more about how to use Apache Ozone to power data science, this is a great article. If you want to know more about Cloudera on private cloud, see here.
Our Professional Services, Support, and Engineering teams are available to help you choose the right bucket layouts for your data and workload needs and to optimize your data architecture. Please reach out to your Cloudera account team or get in touch with us here.
References:
[1] https://blog.cloudera.com/apache-ozone-a-high-performance-object-store-for-cdp-private-cloud/
[2] https://blog.cloudera.com/a-flexible-and-efficient-storage-system-for-diverse-workloads/