Lightup Seeks Data Quality Automation
One of the challenges in data observability is the effort required to develop all the data quality indicators (DQIs) enterprises need to monitor production databases. All too often, companies are left to write their own SQL code to check for bad data. But data observability startup Lightup says it has figured out a way to automate much of that busywork, which the CEO says will pave the way to wider adoption.
It took a couple of years before Lightup realized the initial direction it was taking to build data observability software was not going to pan out in the long run, says CEO Manu Bansal, who co-founded the company in 2019 with Rajiv Ramanathan and Vivek Joshi.
“There have been attempts at solving this problem in a generic sense, but they don’t work very well,” Bansal says. “We started to realize that you have to build very specialized features just to solve the data quality problem, if you’re going to create something easy enough to use that it would go enterprise-wide. So that for us was kind of the ‘aha’ moment two years back.”
Lightup’s initial data observability product focused heavily on anomaly detection, along with some incident management on top. But when it came time to build the DQIs themselves, Lightup simply required the customer to write custom SQL. It turned out that didn’t appeal much to enterprise customers.
“If we’re just giving them a shell to write SQL for data quality, that’s not tremendously useful,” Bansal tells Datanami. “People don’t want to sit and write code. They don’t have time to do that.
“But if you can give them pre-built data quality checks, then it really starts to make a difference,” he continues. “So that was a big turning point for us, where we said, well anomaly detection is great, but you need to go one step before that in their journey and give them prebuilt data quality metrics or what we call data quality indicators.”
The big challenge is that everybody’s database is a little bit different. Customers may all be using the same analytic databases, with Snowflake, Databricks, Google Cloud BigQuery, Amazon Redshift, and Teradata among the popular ones. But every customer’s data model is unique, and therein lies the sticking point that Lightup had to overcome.
“The data models are unique to the customer, but we want to give them checks that are reusable. How do you do that?” Bansal says. “And it turns out that the hard part here is figuring out just the right amount of configurability, of flexibility in those pre-built templates, where you can reuse the majority of the hard work that the user needs to do, but still give them enough flexibility so that they can attach it to their own unique data model.”
For example, one of Lightup’s pre-built DQIs checks for null values in a database field. According to Bansal, Lightup gives the customer a way to define what a null value means for them (such as null, zero, or some other placeholder). Lightup has already written the 30 to 50 lines of SQL that constitute the DQI check, but gives the customer the ability to customize that check to their specific needs.
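To make the idea of a configurable null check concrete, here is a minimal sketch in Python. It is not Lightup’s SQL, which the article does not show. The function names (`build_null_rate_sql`, `null_rate`), the `orders` table, and the use of SQLite as a stand-in database are all assumptions made for illustration. The query logic is fixed, and the only thing the user supplies is the list of values that should count as null.

```python
import sqlite3


def build_null_rate_sql(table, column, null_equivalents=()):
    """Return (sql, params) for the fraction of rows treated as null.

    Hypothetical template: the SQL shape is fixed, and the caller only
    configures which extra values (0, 'N/A', ...) count as null.
    """
    # Identifiers cannot be bound as parameters, so validate them instead.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    conditions = [f"{column} IS NULL"]
    if null_equivalents:
        placeholders = ", ".join("?" for _ in null_equivalents)
        conditions.append(f"{column} IN ({placeholders})")
    predicate = " OR ".join(conditions)

    sql = (
        f"SELECT AVG(CASE WHEN {predicate} THEN 1.0 ELSE 0.0 END) "
        f"FROM {table}"
    )
    return sql, list(null_equivalents)


def null_rate(conn, table, column, null_equivalents=()):
    """Run the templated check and return the null rate as a float."""
    sql, params = build_null_rate_sql(table, column, null_equivalents)
    return conn.execute(sql, params).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?)", [(None,), (0,), (5,), (7,)]
    )
    # Only a true NULL counts: 1 of 4 rows.
    print(null_rate(conn, "orders", "amount"))        # 0.25
    # The customer decides that 0 also means "missing": 2 of 4 rows.
    print(null_rate(conn, "orders", "amount", (0,)))  # 0.5
```

The design choice this illustrates is the boundary Bansal describes: the user configures a small list of values, while the query itself stays out of their hands.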
Lightup, which last week closed a $9 million Series A led by Andreessen Horowitz and Newlands, has created many of these pre-built DQIs for a variety of common data issues that confront enterprise customers. That shifts the burden of developing and maintaining these data quality checks to Lightup, and the company has embraced the challenge.
“There’s no magic to it,” Bansal says. “Everyone wants it the same way. It’s very complex to build it out yourself in SQL, and you don’t have to, because that’s a common denominator. It’s being able to pick out things that need to be configurable and left to the user without creating too much burden on them, versus things that are extremely complex to program but don’t need to be programmed by the user at all, that the system should provide out-of-the-box. There’s no shortcut to getting to that design point, but that’s the hard problem we have been able to solve, which is to know where that boundary is.”
Another related technical challenge that Lightup was forced to deal with was whether to extract the data from the database as part of the data quality monitoring process–as some data quality vendors do–or run the checks in place. Extracting the data is technically the easier solution, but it raises scalability issues.
“Data living in Snowflake, we would process it in Snowflake,” Bansal says. “When it comes to computing the data quality check or data quality indicator, that’s a pushdown query that runs in Snowflake. We don’t want to move data. That’s why it scales beautifully. But now it does complicate our lives because we have to build those DQIs for Snowflake and then for Databricks, then for Teradata, and then for 10 other systems. But that’s what it takes.”
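The difference between extracting data and pushing the check down can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lightup’s implementation: SQLite stands in for a warehouse such as Snowflake, and the function names and the `events` table are hypothetical. Both functions return the same null count, but the first moves every value to the client while the second returns a single number.

```python
import sqlite3


def _check_identifiers(*names):
    # Identifiers are interpolated into SQL, so reject anything unusual.
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")


def null_count_by_extraction(conn, table, column):
    """Pull every value out of the database, then count in the client."""
    _check_identifiers(table, column)
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    return sum(1 for (value,) in rows if value is None)


def null_count_by_pushdown(conn, table, column):
    """Let the database do the counting; only one row comes back."""
    _check_identifiers(table, column)
    # COUNT(*) counts all rows; COUNT(column) skips NULLs.
    sql = f"SELECT COUNT(*) - COUNT({column}) FROM {table}"
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?)", [(None,), (1,), (None,), (3,)]
    )
    print(null_count_by_extraction(conn, "events", "user_id"))  # 2
    print(null_count_by_pushdown(conn, "events", "user_id"))    # 2
```

The cost of the pushdown approach, as Bansal notes, is that the SQL has to be written and maintained separately for each engine, since dialects differ.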
This approach is starting to gain traction for the Mountain View, California-based company. Large enterprises like McDonald’s, Skechers, and Gap have purchased and implemented Lightup’s data observability software, and are getting good results, Bansal says.
McDonald’s has embarked upon a customer loyalty program for the first time in its history, and it’s using Lightup to help ensure its data quality remains high. Matt Sandler, McDonald’s senior director of data and analytics, recently co-presented a session with Bansal on its use of Lightup at the Databricks Data + AI Summit.
“Lightup has been a boon for us, enabling high-quality data monitoring while being performant in a cloud-native ecosystem,” Sandler says.
Lightup plans to use the Series A round to boost integration of the software into the big data ecosystem. For instance, data quality information surfaced by Lightup could be consumed in data catalog tools, Bansal says, and likewise, information from the data catalog tools can be useful for Lightup.
While early-stage funding has dried up in the tech market, Bansal is happy to have completed the deal with a venture partner as prestigious as a16z, which he says speaks volumes about the fundamental way Lightup is attacking the data quality problem.
“That’s why we’re so excited about this funding round, because we have been able to convince investors and show them that look, enterprises are working with us and retaining us and growing with us and putting our product into production,” he says. “And despite all the activity in the space, it turns out there’s not that many companies that can make that claim successfully.”
Editor’s note: This article has been corrected. Newlands invested in Lightup, not Newsland Ventures. Datanami regrets the error.