Data Catalogs: Extreme Makeover Edition
Here’s the cold, hard truth: data catalogs don’t get most people excited. In fact, it’s kind of like going to the dentist; almost nobody likes doing it, but the alternative is worse. In other words, while most people recognize the importance of a data catalog, it’s often treated as a necessary evil.
It’s definitely not a fun process, but it is an important one. What’s more, data cataloging has far more potential than most organizations currently take advantage of. A data catalog is instrumental in enabling your long-term data quality initiatives, and there is a better way to build one so that the payoff is greater.
Those Darn Data Catalogs
Everyone wants a data catalog, but nobody really wants to do the work. Imagine your boss telling you that you’ll now be expected to document everything you do, all those things that live in your head. You would know that it’s an admirable goal that will also be a ton of work. It’s a massive data entry undertaking.
Even worse, a data catalog is only useful once it holds a critical mass of data. You can’t just turn on a data catalog in one day and expect it to contain everything that anyone might look for. Everyone wants a catalog, and everyone understands its benefits, but building one is hard and tedious work.
However, when you tackle a data migration project, you’ll find that a key part of the project is creating all the assets that will eventually become the data catalog. Maybe an organization brings in a team of people to document what the new system is going to look like in terms of the business rules, the interface and so on. All this documentation is typically written in a tool like Microsoft Word and then stored in a system such as SharePoint, but there’s often zero motivation to go back and update it after the fact. Documents that were accurate six months ago no longer are, and the process of going back to update them can seem daunting.
Collecting Data, Driving Migration
We’ve established that while data catalogs can end up driving a lot of interesting insights, the cost of cataloging everything is high. And you don’t necessarily miss having a catalog until you have a question it could answer. What if there were a different way to go about this process, one that is more meaningful and ultimately more fruitful?
Instead of capturing all this information in third-party, disconnected tools (such as Excel, SharePoint and Word docs), what if you could put it in a tool that’s integrated into the system? The information you collect could then drive your data migration process as you gather it.
In other words, if you must collect and capture all of this information anyway, it’s better to do so in a way that lets you use it repeatedly. Taking a more strategic approach to data capture brings many potential benefits, because the captured information can ultimately be leveraged for far more in the future. The trick is to make this a part of the migration journey.
The Future of Data Catalogs
Making the data catalog a key part of the migration process gives organizations more reason to update the catalog when something changes. There’s greater motivation to do this because the catalog is what’s going to drive the migration. It’s easier to maintain a data catalog if it’s driving a business process. If there’s no penalty for letting it go stale, there’s no incentive to keep it updated.
Once you’ve got the data catalog, you can also drive change. You’ve captured all those rules, and they become part of your data quality solution. You don’t have to start a separate data quality project; you just need to reconfigure your business rules.
You can also start leveraging those rules for data governance. You can’t build an automated governance process if you haven’t created the data catalog; you’re collecting this data to create a better future. A good metaphor is Google Maps: Google’s team first had to map the world, and only then could they overlay driving directions. They had to take that fundamental step of doing the mapping before they could create a meaningful output. Data catalogs are that fundamental step when it comes to bigger and more impactful data projects.
Changes and KPIs Become Easier
Here’s a scenario to consider: Imagine getting a call asking what the impact would be across the system if someone changes a material number. Without a data catalog, trying to figure this out could wind up being an arduous process. But with a data catalog, you have a system that defines this and its relationship to all other things, which enables you to easily do the necessary analysis.
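To make the material number scenario concrete, here is a minimal sketch of impact analysis over catalog relationships. It assumes the catalog can be read as a simple mapping from each asset to the assets that depend on it; the asset names and the `impacted_assets` function are hypothetical illustrations, not part of any particular catalog product.

```python
from collections import deque

# Hypothetical catalog extract: each asset maps to the assets that depend on it.
DEPENDENTS = {
    "material.material_number": [
        "bom.component",
        "sales_order_item.material",
        "inventory.material",
    ],
    "bom.component": ["report.production_plan"],
    "sales_order_item.material": ["report.monthly_revenue"],
    "inventory.material": [],
    "report.production_plan": [],
    "report.monthly_revenue": [],
}


def impacted_assets(catalog, changed):
    """Return every asset downstream of `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(catalog.get(changed, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited via another path
        seen.add(asset)
        order.append(asset)
        queue.extend(catalog.get(asset, []))
    return order


if __name__ == "__main__":
    for asset in impacted_assets(DEPENDENTS, "material.material_number"):
        print(asset)
```

Because the relationships live in the catalog rather than in someone’s head, answering the question is a lookup and a traversal instead of a round of interviews.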
Once you have all this metadata in the system, you can start building key performance indicator (KPI) reports on top of it. Because the KPIs are defined against the metadata, you don’t have to track down the queries underlying the KPI report that you send out every month, even if you change systems.
All these things ultimately become easier. What has often been thought of as a necessary evil opens up a whole new world of possibility.
Other functions that become much easier once your data catalog has been established include:
- Automation of tasks – this could include everything from automating the creation of error reports that check for invalid or missing data, to automated creation of governance reports, and much more;
- Browse access – now that you have the data, everyone can access it;
- Data quality measurements and scoring.
And that’s just the tip of the iceberg.
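As a rough illustration of the automated error reports and data quality scoring mentioned above, the sketch below assumes the business rules captured in the catalog can be expressed as per-field validation functions. The rule set, field names, sample records and both functions are hypothetical.

```python
# Hypothetical catalog rules: field name -> (description, validation function).
RULES = {
    "material_number": ("must be present", lambda v: bool(v)),
    "unit": ("must be one of EA, KG, L", lambda v: v in {"EA", "KG", "L"}),
}

# Hypothetical sample records to be checked against the rules.
RECORDS = [
    {"material_number": "M-100", "unit": "EA"},
    {"material_number": "", "unit": "KG"},
    {"material_number": "M-102", "unit": "box"},
]


def error_report(records, rules):
    """List (record index, field, rule description) for every failed check."""
    errors = []
    for index, record in enumerate(records):
        for field, (description, is_valid) in rules.items():
            if not is_valid(record.get(field)):
                errors.append((index, field, description))
    return errors


def quality_score(records, rules):
    """Fraction of rule checks that pass; 1.0 when there is nothing to check."""
    total = len(records) * len(rules)
    if total == 0:
        return 1.0
    return 1 - len(error_report(records, rules)) / total


if __name__ == "__main__":
    for index, field, description in error_report(RECORDS, RULES):
        print(f"record {index}: {field} {description}")
    print(f"quality score: {quality_score(RECORDS, RULES):.2f}")
```

The same rule definitions feed both the error report and the score, which is the point of capturing them once in the catalog.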
Data Catalogs as Catalysts
While data catalogs are often seen as boring and arduous, their importance and potential shouldn’t be underestimated. Although cataloging data can be a tedious and time-consuming process, it serves as a vital component for long-term data quality initiatives. Rather than treating data catalogs as disconnected tools, organizations can integrate them into their systems, allowing for ongoing updates and leveraging the captured information for future benefits.
By making data catalogs an integral part of the migration process, organizations have a greater incentive to maintain and update them. With a comprehensive data catalog in place, organizations can easily analyze the impact of changes, create KPI reports and explore new possibilities. What was once considered a necessary evil can indeed become a catalyst for a more efficient and transformative data-driven future.
About the author: John Munkberg is the senior vice president of migration products at Syniti, a provider of data management solutions.