In the digital age, organizations put a lot of effort into generating, capturing, and organizing data. After all, any meaningful outcome we expect from automation, artificial intelligence (AI), virtual reality (VR), or any other advanced technology depends on how well we manage data. Whether the task is simply moving data to where it can be used or performing complex analytics, decision making starts with data.
But proper data management requires a solid understanding of the data behind the data: the metadata. "Meta" comes from a Greek word meaning "after" or "beyond," and in modern usage it signals something self-referential. Metadata is therefore data about data, the information that describes the details and characteristics of the data itself.
Tracking and recording metadata is critical; it simplifies the process of finding files and documents when you need them, much like the card catalog did for locating library books before the digital age. Metadata helps track details such as author, file size, and dates of data creation and modification.
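Some of this metadata is recorded automatically. As a minimal sketch using only the Python standard library, the snippet below reads a file's name, size, and last-modification time from the file system; the `basic_file_metadata` helper and the sample file are hypothetical. Details such as the author, and on many systems the creation date, are not portably exposed by `stat`, which is one reason organizations capture them explicitly.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def basic_file_metadata(path: Path) -> dict:
    """Hypothetical helper: collect metadata the file system already tracks."""
    info = path.stat()
    return {
        "name": path.name,
        "size_bytes": info.st_size,
        # st_mtime is the last modification time, in seconds since the epoch.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway sample file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.txt"
    sample.write_bytes(b"quarterly summary\n")
    print(basic_file_metadata(sample))
```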
Without a well-defined metadata management strategy, you can spend hours trying to find a file. Anyone who has hunted for a misplaced file knows how frustrating that gets. As with finding something lost in the garage, the first step is organizing what's there! And if systems across an organization use different metadata standards and procedures, every search only gets harder.
A metadata strategy boosts efficiency and productivity by answering these basic, but critical, questions about all of an organization’s data:
What is it?
Where does it come from?
Who uses it?
What is it used for?
Metadata comes in different types, and organizations need to figure out which types are relevant for their purposes. Then they can develop a strategy accordingly. The most common types of metadata include:
Descriptive — details such as author and dates of creation and modification, which make it possible to identify a resource
Structural — information on the relationship of a resource’s components and how they are put together
Administrative — technical information about the resource, including permissions and how it was created
Reference — information about content quality and the methods and standards used in data creation
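To make the four types concrete, here is a minimal Python sketch of a single metadata record that groups fields by type. The `MetadataRecord` class, its field names, and the sample values are hypothetical illustrations, not a Cloudera or industry-standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MetadataRecord:
    """Hypothetical record grouping one resource's metadata by type."""

    resource_id: str
    # Descriptive: identifies the resource (author, creation/modification dates).
    descriptive: dict = field(default_factory=dict)
    # Structural: how the resource's components fit together.
    structural: dict = field(default_factory=dict)
    # Administrative: technical details such as permissions and how it was created.
    administrative: dict = field(default_factory=dict)
    # Reference: content quality and the methods and standards used to create it.
    reference: dict = field(default_factory=dict)

    def missing_types(self) -> list:
        """Return the names of metadata types that have no entries yet."""
        types = ("descriptive", "structural", "administrative", "reference")
        return [t for t in types if not getattr(self, t)]


record = MetadataRecord(
    resource_id="sales_2023_q4.csv",
    descriptive={
        "author": "finance-team",
        "created": datetime(2024, 1, 5),
        "modified": datetime(2024, 1, 9),
    },
    structural={"columns": ["region", "month", "revenue_usd"], "delimiter": ","},
    administrative={"read_access": ["analysts"], "created_by": "nightly export job"},
)

print(record.missing_types())  # ['reference']
```

A check like `missing_types()` is one simple way to flag resources whose metadata is incomplete before they are made searchable across the organization.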
Critical to a metadata strategy is deciding how metadata is used within an organization and by whom. Data engineers and analysts, as well as people in charge of software development and database management, are among those who should be involved in making metadata-related decisions.
But don’t forget the rank-and-file system users. If search terms don’t make sense, users may end up fumbling and wasting time when trying to access what they need. So it’s important to get input from system users to ensure data is organized in intuitive ways and easily accessible.
Metadata is a crucial piece of an effective enterprise data governance strategy. Without the right approach to governance, organizations are susceptible to many risks.
Another key element of a metadata strategy is identifying data sources and attributes. As organizations have come to rely more and more on data to run operations, the number of sources such as databases, applications, websites, and external files keeps growing.
The quality of the sources can vary widely, complicating the task of organizing the data. Identifying the authoritative source is key. Data is often structured in different ways, depending on its origin and how it enters a company’s environment. It takes some effort to consolidate metadata from all the different sources and then get it to conform to a structure that makes sense for use across the organization. This requires establishing standards and procedures for how users interact with systems and files to find and store data.
Using a consistent naming standard for files is important. Tags and identifiers should be brief and consistent from one file to another, drawing on a standard vocabulary that is familiar and relevant to stakeholders. Skip this simple step, and users are back to hunting for files.
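A naming standard is easiest to keep when it can be checked automatically. The Python sketch below validates file names against one hypothetical convention, `<domain>_<dataset>_<YYYYMMDD>.<ext>`, with the domain drawn from a small controlled vocabulary. The pattern, the `APPROVED_DOMAINS` set, and the `check_file_name` helper are illustrative assumptions, not a prescribed standard.

```python
import re
from datetime import datetime

# Hypothetical controlled vocabulary agreed on with stakeholders.
APPROVED_DOMAINS = {"finance", "hr", "logistics", "sales"}

# Hypothetical convention: <domain>_<dataset>_<YYYYMMDD>.<ext>, all lowercase,
# with hyphens (not underscores) separating words inside the dataset name.
NAME_PATTERN = re.compile(
    r"^(?P<domain>[a-z]+)_(?P<dataset>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"_(?P<date>\d{8})\.(?P<ext>csv|parquet|json)$"
)


def check_file_name(name: str) -> list:
    """Return a list of problems with a file name; an empty list means it conforms."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return ["does not match <domain>_<dataset>_<YYYYMMDD>.<ext>"]
    problems = []
    if match["domain"] not in APPROVED_DOMAINS:
        problems.append(f"unknown domain '{match['domain']}'")
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        problems.append(f"invalid date '{match['date']}'")
    return problems


for name in [
    "sales_monthly-revenue_20240131.csv",
    "Sales Revenue FINAL v2.xlsx",
    "mktg_leads_20240131.json",
]:
    print(name, "->", check_file_name(name) or "ok")
```

Returning a list of problems rather than a single pass/fail lets the same check tell users why a name was rejected, which helps keep the shared vocabulary familiar to everyone who has to search for files.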
Metadata management is an ongoing effort. Data engineers and analysts have to identify and assess data sources continually so they can track, classify, and locate data, make it easy to use, and ensure their processes and procedures comply with any applicable regulations. Because decisions depend on this metadata, it must be actively managed and maintained so that decision making stays reliable and timely in support of business or government mission outcomes.
Data governance is crucial for enterprises, and Cloudera solutions such as Cloudera Data Platform (CDP) help users manage metadata (and data) more quickly and easily, wherever it resides.
As environments become more complex and data sources grow, having an effective metadata strategy is essential to an organization’s day-to-day operations and future planning.
Rob Carey is currently the President of Cloudera Government Solutions, Inc., and SVP of Public Sector at Cloudera. He is dedicated to building strong teams to accomplish organizational goals in support of Government missions. He brings to his role more than 32 years of federal government leadership experience, as well as 25 years of service in the United States Navy Reserve, with expertise in cybersecurity, cloud computing, program management, systems/process analysis, enterprise architecture, and operational planning.