Over the past month, we’ve entered a period of punctuated equilibrium in the evolution of big data, thanks to the community congregating around open table formats and metadata catalogs for storing data and enabling processing engines to access it. Now attention is shifting to another element of the stack that has been living quietly in the shadows: the semantic layer.
The semantic layer is an abstraction that sits between a company’s data and the business metrics it has chosen as its standard units of measurement. It’s a critical layer for ensuring correctness.
For instance, while various departments in a company may have different opinions about the best way to measure “revenue,” the semantic layer defines the correct way to measure it for that company, thereby eliminating (or at least greatly reducing) the chance of getting bad analytic output.
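To make that concrete, here is a minimal sketch of the idea in Python. The metric schema, table, and column names are illustrative assumptions, not any particular vendor’s format:

```python
# Illustration: ad-hoc SQL definitions of "revenue" drift apart by team.
finance_revenue = "SELECT SUM(amount) FROM orders WHERE status = 'completed'"
sales_revenue = "SELECT SUM(amount) FROM orders"  # silently includes refunds

# A semantic layer centralizes one governed definition that every
# BI tool compiles against. This schema is illustrative only:
REVENUE_METRIC = {
    "name": "revenue",
    "expression": "SUM(amount)",
    "table": "orders",
    "filters": ["status = 'completed'", "is_refunded = FALSE"],
}

def compile_metric(metric: dict) -> str:
    """Compile a governed metric definition into a SQL query string."""
    where = " AND ".join(metric["filters"])
    return f"SELECT {metric['expression']} FROM {metric['table']} WHERE {where}"

print(compile_metric(REVENUE_METRIC))
# SELECT SUM(amount) FROM orders WHERE status = 'completed' AND is_refunded = FALSE
```

Because every tool compiles against the same definition, a change to how revenue is measured happens in one place rather than in every dashboard.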
Traditionally, the semantic layer has traveled with the business intelligence or data analytics tool. If you were a Tableau shop or a Qlik shop or a Microsoft Power BI shop or a ThoughtSpot shop or a Looker shop, you used the semantic layer provided by those vendors to define your business metrics.
This approach works well for smaller companies, but it creates problems for larger enterprises that use two or more BI and analytics tools. Those enterprises face the task of hardwiring multiple semantic layers together, making sure each one pulls data from the correct tables and applies the right transformations so that reports and dashboards keep generating accurate information.
In recent years, the concept of a universal semantic layer has started to bubble up. Instead of defining business metrics in a semantic layer tied directly to the BI or analytics tool, the universal semantic layer lives outside those tools, providing a semantic service that any BI or analytics tool can tap into to ensure accuracy.
As companies’ cloud data estates have grown over the past five years, even smaller companies have started dealing with the increased complexity that comes from using multiple data stacks. That has helped drive interest in the universal semantic layer.
Natural Language AI
More recently, another factor has driven a surge of interest in the semantic layer: generative AI. Large language models (LLMs) like ChatGPT are leading many companies to experiment with using natural language as an interface for a range of applications. LLMs have shown an ability to generate text in any number of languages, including English, Spanish, and SQL.
While the English generated by LLMs generally is quite good, the SQL is usually quite poor. In fact, a recent paper found that LLMs generate accurate SQL on average only about one-third of the time, said Tristan Handy, the CEO of dbt Labs, the company behind the popular dbt tool, and the purveyor of a universal semantic layer.
“A lot of people experimenting in this space are AI engineers or software engineers who don’t actually have knowledge of how BI works,” Handy told Datanami in an interview at Snowflake’s Data Cloud Summit last month. “And so they’re just like, ‘I don’t know, let’s have the model write SQL for me.’ It just doesn’t happen to work that well.”
The good news is that it’s not difficult to introduce a semantic layer into the GenAI call stack. Using a tool like LangChain, one could simply instruct the LLM to use a universal semantic layer to generate the SQL query that will fetch the data from the database, instead of letting the LLM write the SQL itself, Handy said. After all, this is exactly what semantic layers were created for, he pointed out. This approach increases the accuracy of natural language queries using LLMs to about 90%, Handy said.
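The pattern Handy describes can be sketched in a few lines of LangChain. In this illustrative example, the prompt wording, metric names, and the semantic-layer call are assumptions, not dbt Labs’ actual API; the point is that the model never writes SQL, it only maps the question onto governed metrics:

```python
# Sketch: route a natural-language question through a semantic layer
# rather than asking the LLM to write raw SQL. Metric names, prompt
# wording, and the semantic-layer call are illustrative assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# The model's only job is to pick governed metrics and dimensions;
# it never sees raw table schemas and never emits SQL.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Map the user's question onto a metric query. Metrics: {metrics}. "
     "Dimensions: {dimensions}. Answer only in the form "
     "'metric=<name>; group_by=<dimension>'."),
    ("user", "{question}"),
])

chain = prompt | llm
response = chain.invoke({
    "metrics": "revenue, active_users",
    "dimensions": "region, month",
    "question": "How did revenue trend by region last quarter?",
})
print(response.content)  # e.g. "metric=revenue; group_by=region"

# A (hypothetical) semantic-layer client then compiles that governed
# metric into correct SQL and executes it against the warehouse:
#   semantic_layer.query(metric="revenue", group_by=["region"])
```

Because the SQL comes from a single governed definition rather than from the model, errors are confined to metric selection, a far easier task for an LLM than writing correct joins and aggregations from scratch.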
“We are having a lot of conversations about the semantic layer, and a lot of them are driven by the natural language interface question,” he said.
Not Just Semantics
Dbt Labs isn’t the only vendor plying the universal semantic layer waters. Two other vendors, AtScale and Cube, have also planted flags in this space.
AtScale recently announced that its Semantic Layer Platform is now available on the Snowflake Marketplace. This support ensures that Snowflake customers can continue to rely on the data they’re generating, no matter which AI or BI tool they’re using in the Snowflake cloud, the company said.
“The semantic models you define in AtScale represent the metrics, calculated measures, and dimensions your business consumers need to analyze to achieve their business objectives,” AtScale Vice President of Growth Cort Johnson wrote in a recent blog post. “After your semantics are defined in AtScale, they can be consumed by every BI application, AI/ML application, or LLM in your organization.”
Databricks is also getting into the semantic game. At its recent Data + AI Summit, it announced that it has added first-class support for metrics in Unity Catalog, its data catalog and governance tool.
“The idea here is that you can define metrics inside Unity Catalog and manage them together with all the other assets,” Databricks CTO Matei Zaharia said during his keynote address two weeks ago. “We want you to be able to use the metrics in any downstream tool. We’re going to expose them to multiple BI tools, so you can pick the BI tool of your choice. … And you’ll be able to just use them through SQL, through table functions that you can compute on.”
Databricks also announced that it was partnering with dbt, Cube, and AtScale as “external metrics providers,” making it easy to bring in and manage metrics from those vendors’ tools inside Unity Catalog, Zaharia said.
Cube, meanwhile, last week launched a couple of new products, including a new Semantic Catalog, which is designed to give users “a comprehensive, unified view of connected data assets,” wrote David Jayatillake, the VP of AI at Cube, in a recent blog post.
“Whether you are looking for modeled data in Cube Cloud, downstream BI content, or upstream tables, you can now find it all within a single, cohesive interface,” he continued. “This reduces the time spent jumping between different data sources and platforms, offering a more streamlined and efficient data discovery process for both engineers and consumers.”
The other new product announced by Cube, which recently raised $25 million from Databricks and other investors, is an AI Assistant. The new offering is designed to “empower non-technical users to ask questions in natural language and receive trusted answers based on your existing investment into Cube’s universal semantic layer,” Jayatillake wrote.
Opening More Data
GenAI may be the biggest factor driving interest in a universal semantic layer today, but the need for it predates GenAI.
According to dbt Labs’ Handy, a 2022 Datanami Person to Watch, the rise of the universal semantic layer is happening for the same reason that the database is being decomposed into its constituent parts.
Dbt Labs originally got into the universal semantic layer space because the company saw it as “a cross-platform source of truth,” Handy said.
“It should be across your different data tools, it should be across your BI tools,” he said. “In the same way that you govern your data transformation in this independent way, you should be governing your business metrics that way, too.”
The rise of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake, along with open metadata catalogs like Snowflake’s Polaris and Databricks’ Unity Catalog, shows that there’s an appetite for dismantling the traditional monolithic database into a collection of independent components linked through a federated architecture.
At the moment, all of the universal semantic layers are proprietary, unlike the table format and metastore layers, where open standards reign, Handy pointed out. Eventually the market will settle on a standard, but it’s still very early days, he said.
“Semantic layers used to be kind of a niche thing,” he said, “and now it’s becoming a hot topic.”
Related Items:
Cube Secures $25M to Advance Its Semantic Layer Platform
AtScale Announces Major Upgrade To Its Semantic Layer Platform
Semantic Layer Belongs in Middleware, and dbt Wants to Deliver It