Groq Opens API Access to Real-time Inference, Fueling Consumer AI Applications with Unprecedented Speed
LAS VEGAS, Jan. 9, 2024 — The need for speed is paramount in consumer generative AI applications, and only the Groq LPU Inference Engine generates 300 tokens per second per user on open-source large language models (LLMs) such as Llama 2 70B from Meta AI. At that speed, a text the length of Shakespeare’s Hamlet can be produced in under seven minutes, roughly 75x faster than the average human can type.
With demand surging for real-time inference, the compute process of running data through a trained AI model to deliver instant results for a fluid end-user experience, Groq has raced to bring its technology to market.
- Early access to the Groq API is available starting January 15, 2024, enabling approved users to experiment with Llama 2 70B, Mistral, Falcon, Vicuna, and Jais running on the Groq LPU Inference Engine (a brief illustrative example follows this list).
- The Groq LPU Inference Engine is already being used by leading chat agents, robotics, FinTech, and national labs for research and enterprise applications.
- Groq partner and customer aiXplain uses the API in a multi-faceted program to take advantage of real-time inference across its portfolio of innovative products and services.
- As of December 21, 2023, the general public can try it themselves via GroqChat, an alpha release of Meta AI’s foundational LLM running on the Groq LPU Inference Engine.
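For developers granted early access, a request to a hosted model might look something like the minimal sketch below. The endpoint URL, model identifier, environment-variable name, and response schema shown here are assumptions modeled on a typical OpenAI-style chat-completions HTTP API, not details confirmed by this announcement; the documentation provided with early access is authoritative.

```python
# Minimal sketch of querying a hosted LLM over an OpenAI-style
# chat-completions HTTP API. The endpoint URL, model ID, env-var name,
# and response shape below are ASSUMPTIONS for illustration only.
import os
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["GROQ_API_KEY"]  # key issued with early access (assumed name)

payload = {
    "model": "llama2-70b-4096",  # hypothetical ID for Llama 2 70B
    "messages": [
        {"role": "user", "content": "Summarize Hamlet in three sentences."}
    ],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# Assumes an OpenAI-style response body: choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])
```

At the quoted 300 tokens per second, a response of a few hundred tokens would return in roughly a second, which is the kind of latency the consumer applications described above depend on.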
“Inference is the next big thing in AI,” said aiXplain CEO and Founder Hassan Sawaf. “We were searching for the right solution to bring several production-ready AI ideas to life, but the real-time inference demands of these products and services made that seem like an impossible task. Until we found Groq. Only the Groq LPU Inference Engine delivers the low latency speed necessary to sustain user engagement beyond novelty and make these products successful for the long-term.”
Groq and aiXplain will co-host a cocktail party on January 9 where Groq Founder and CEO Jonathan Ross will demonstrate how real-time inference is changing the trajectory for consumer electronics. Space is limited and registration is required. Please email [email protected] if you would like to attend.
“What aiXplain is doing is nothing short of creating magic for their customers,” said Ross. “At Groq, we aim to create a sense of awe by accelerating generative AI applications to the point that they become immersive experiences. Thanks to the partnership between aiXplain and Groq, truly interactive engagement with AI is here, today.”
Groq API access will be generally available in Q2 2024.
About Groq
Groq is a generative AI solutions company and the creator of the LPU Inference Engine, the fastest language processing accelerator on the market. It is architected from the ground up to achieve low-latency, energy-efficient, and repeatable inference performance at scale. Customers rely on the Groq LPU Inference Engine as an end-to-end solution for running Large Language Models (LLMs) and other generative AI applications at 10x the speed. Jonathan Ross, inventor of the Google Tensor Processing Unit (TPU), founded Groq to enable an AI economy powered by human agency.
About aiXplain
Founded in 2020, aiXplain is the industry’s first end-to-end integrated platform for rapid development and enterprise-grade deployment of AI projects and solutions. aiXplain’s no-code/low-code integrated development environment (IDE) enables users to develop, manage, benchmark, experiment with, and deploy AI assets quickly and efficiently. Users can design their own AI pipelines and benchmark their models against other models, using their own datasets or available ones, to easily create and maintain AI systems.
Source: Groq