Cleanlab Launches New Solution to Detect AI Hallucinations in Language Models
SAN FRANCISCO, April 25, 2024 — Cleanlab today launched the Trustworthy Language Model (TLM), a fundamental advance in generative AI that the company says can detect when large language models (LLMs) are hallucinating. Steven Gawthorpe, PhD, Associate Director and Senior Data Scientist at Berkeley Research Group, called the Trustworthy Language Model “the first viable answer to LLM hallucinations that I’ve seen.”
Generative AI is poised to transform every industry and profession, but it faces a major challenge in “hallucinations,” when LLMs generate incorrect or misleading results. A given LLM response might sound convincing. But is it correct? Is it based in reality? LLMs offer no way to be sure. This makes automating sensitive tasks with generative AI all but impossible.
The lack of trust is the major obstacle to business adoption of LLMs. Billions of dollars of productivity gains are locked up behind this dilemma. Cleanlab is the first to crack it.
Cleanlab’s TLM combines world-class uncertainty estimation, AutoML ensembling and quantum information algorithms repurposed for general computing to add trust to generative AI. Its API wraps around any LLM, producing a reliable trustworthiness score for every response.
In industry-standard benchmarks for LLM reliability, the TLM beats other methods across the board. It delivers performance that’s not just superior, but consistently superior, giving businesses the confidence to rely on generative AI for important jobs.
For example, businesses can use the TLM to automate customer refunds, bringing a human reviewer into the loop whenever an LLM’s response falls below a predetermined level of trustworthiness.
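The refund workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cleanlab's API: `route_refund_request`, `Decision`, `stub_scored_llm` and the 0.8 threshold are all hypothetical names and values, and the stub stands in for any model call that returns a response together with a trustworthiness score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a "scored LLM" is any callable that maps a prompt to
# (response_text, trustworthiness_score), with the score in [0, 1].
ScoredLLM = Callable[[str], Tuple[str, float]]


@dataclass
class Decision:
    response: str
    score: float
    needs_human_review: bool


def route_refund_request(
    prompt: str, scored_llm: ScoredLLM, threshold: float = 0.8
) -> Decision:
    """Flag the response for human review when its score is below the threshold."""
    response, score = scored_llm(prompt)
    return Decision(response, score, needs_human_review=score < threshold)


def stub_scored_llm(prompt: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a real scored model call; the scores are made up.
    if "receipt" in prompt:
        return ("Approve refund.", 0.95)
    return ("Approve refund.", 0.40)
```

In a sketch like this, the threshold is a business decision: a higher value sends more cases to human reviewers, while a lower value automates more of them.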
“Cleanlab’s TLM gives us the power of thousands of data scientists to enrich data and strengthen LLM outputs, providing 10x to 100x ROI for many of our clients. Compared to what Cleanlab is doing, other tools aren’t even on the same playing field,” Gawthorpe said.
“Cleanlab’s TLM is a truly pioneering solution for effectively addressing hallucinations,” added Akshay Pachaar, AI Engineer at Lightning.ai. “The integration of Cleanlab’s trustworthiness scores transforms human-in-the-loop workflows, enabling up to 90% automation. It not only conserves hundreds of manpower hours weekly but augments our efficiency in processing substantial datasets for data enrichment, document and chat-log analysis and other large-scale tasks. It has the potential to revolutionize how we manage and derive value from data.”
In addition to making LLMs more trustworthy, the TLM makes them more accurate. It functions as a sort of super-LLM, checking LLMs’ output to deliver better results than LLMs on their own. In benchmarks comparing the accuracy of GPT-4 with GPT-4 + TLM, the combination outperforms GPT-4 by itself every time. This makes the TLM ideal for scenarios such as:
- RAG (Retrieval-Augmented Generation): Providing LLMs with more reliable context
- Business chatbots: Accurately answering questions from customers and employees
- Data extraction: Extracting complex information from PDFs
- Securities analysis: Scanning stock reviews to find the strongest buy signal
Like other Cleanlab products, the TLM has its roots in the founders’ groundbreaking research on uncertainty in AI datasets. Cleanlab’s CEO, Curtis Northcutt, spent eight years working with the inventor of the quantum computer to understand how to extract reliable computation from arbitrary data. Its Chief Scientist, Jonas Mueller, led the development of AutoGluon, the open-source and industry-standard AutoML platform for AWS. Its CTO, Anish Athalye, is one of the world’s most renowned ML developers, with more than 30,000 GitHub stars for his personal projects.
AWS, Google, JPMorgan Chase, Tesla and Walmart are a few of the Fortune 500 companies using Cleanlab’s technology to improve their data inputs. Now Cleanlab is applying that same expertise to the output of LLMs — with economic implications that are, if anything, even greater.
“This is a pivot point for generative AI in the enterprise,” said Cleanlab CEO Curtis Northcutt. “Adding trust to LLMs will change the calculus around their use. We’ll always have some version of hallucinations. The difference is that now we have a powerful solution to detect and manage them. That means businesses can deploy generative AI for use cases that were previously undreamt of, and unlock a significant new source of productivity—and revenue.”
To learn more about the Cleanlab TLM for businesses, visit https://cleanlab.ai/tlm.
About Cleanlab
Founded in 2021 by three MIT Computer Science PhDs and trusted by hundreds of top organizations including AWS, Chase, Google and Tesla, Cleanlab adds trust to every input and output of data-driven processes by turning unreliable data into reliable models and insights. Cleanlab’s AI data platform, Cleanlab Studio, automatically finds and fixes errors in both structured and unstructured datasets, such as visual, text, and tabular data, and adds over 30 dimensions of quality/trust scores for data points. Its Trustworthy Language Model (TLM) offers the first reliable way to assess the trustworthiness of LLM outputs. Recognized as a Forbes AI 50 company, Cleanlab is based in San Francisco and backed by leading investors including Menlo Ventures, Bain Capital Ventures, Databricks Ventures, TQ Ventures, Samsung Ventures and angels including the CEOs and founders of Yahoo, GitHub, Mosaic and Okta.
Source: Cleanlab