June 13, 2024

TensorOpera Unveils Fox-1: Pioneering Small Language Model (SLM) for Cloud and Edge

PALO ALTO, Calif., June 13, 2024 –TensorOpera, the company providing `Your Generative AI Platform at Scale’, is excited to announce the launch of TensorOpera Fox-1. This 1.6-billion parameter small language model (SLM) is designed to advance scalability and ownership in the generative AI landscape. Fox-1 stands out by delivering top-tier performance, surpassing comparable SLMs developed by industry giants such as Apple, Google, and Alibaba. The combination of exceptional performance and cost-efficiency makes Fox-1 an ideal choice for developers and enterprises seeking to build their own generative AI models and applications, and deploy them across diverse infrastructures, all without the need for substantial resources.

Small language models with fewer than 2 billion parameters like Fox-1 are transforming the AI landscape by delivering powerful capabilities with much reduced computational and data requirements suitable for deployment on mobile and edge devices. This shift is crucial as it facilitates the deployment of AI applications across various platforms, from mobile devices to servers, all while maintaining exceptional performance.

“The launch of Fox-1 and its integration into TensorOpera’s AI platform is a major step towards our vision of providing an integrated edge-cloud platform for Generative AI,” remarks Salman Avestimehr, Co-Founder and CEO of TensorOpera and Dean’s Professor of ECE and CS at the University of Southern California. “This would enable seamless training, creation, and deployment of generative AI applications across a wide range of platforms and devices, ranging from powerful GPUs in cloud settings to edge devices like smartphones and AI-equipped PCs, enhancing efficiency, privacy, and personalization. This development highlights our commitment to delivering ‘Your Generative AI Platform at Scale’, which promotes ownership and scalability across both cloud and edge computing environments.”

Fox-1 was trained from scratch with a 3-stage data curriculum on 3 trillion tokens of text and code data in 8K sequence length. With a decoder-only transformer structure, 16 attention heads, and grouped query attention, Fox-1 is notably deeper and more capable than its peers (78% deeper than Google’s Gemma-2B, 33% deeper than Alibaba’s Qwen1.5-1.8B, and 15% deeper than Apple’s OpenELM-1.1B). In various benchmarks, such as MMLU, ARC Challenge, TruthfulQA, and GSM8k, Fox-1 performs better or on par with other SLMs in its class including Gemma-2B, Qwen1.5-1.8B, and OpenELM-1.1B.

“SLMs can effectively incorporate data in special domains in the pre-training phase, which leads to optimized expert models that can be trained and deployed in a decentralized fashion”, said Tong Zhang, Chief AI Scientist at TensorOpera and Professor of CS at UIUC. “The integration of domain-specific SLMs into innovative architectures such as Mixture of Experts (MoE) and model federation systems further enhances their utility, enabling the construction of more powerful systems by integrating expert SLMs to handle complex tasks.”

In terms of operational efficiency, Fox-1 achieves an impressive throughput of over 200 tokens per second on the TensorOpera model serving platform, surpassing the performance of Gemma-2B and equaling that of Qwen1.5-1.8B in identical deployment environments. The high throughput of Fox-1 can largely be attributed to its architectural design, which incorporates Grouped Query Attention (GQA) for more efficient query processing. More specifically, by dividing query heads into groups that share a common key and value, Fox-1 significantly improves inference latency and enhances response times.

The integration of Fox-1 into both TensorOpera AI Platform and TensorOpera FedML Platform further enhances its versatility, enabling its deployment and training across both cloud and edge computing environments. It empowers AI developers to train and build their models and applications on the cloud utilizing the comprehensive capabilities of the TensorOpera AI Platform, and then deploy, monitor, and personalize these solutions directly onto smartphones and AI-enabled PCs via the TensorOpera FedML platform. This approach offers cost efficiency, enhanced privacy, and personalized user experiences, all within a unified ecosystem that facilitates seamless collaboration between cloud and edge environments.

Fox-1 is released under the Apache 2.0 license through the TensorOpera AI Platform and Hugging Face, granting the community freedom for both production and research uses.

To learn more and get started on using Fox-1, dig into TensorOpera’s blogpost.

Source: TensorOpera

Tags: GenAI, LLM, model

TensorOpera Unveils Fox-1: Pioneering Small Language Model (SLM) for Cloud and Edge

December 20, 2024

December 19, 2024

December 18, 2024

Sponsored Partner Content

Introducing AIStor, the most powerful version of MinIO to date

Designing a Copilot for Data Transformation

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

IDC Spotlight: Boosting AI Impact with Data Products

Building a Trusted Data Foundation for AI/ML and Business Intelligence (BI)

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

TensorOpera Unveils Fox-1: Pioneering Small Language Model (SLM) for Cloud and Edge

December 20, 2024

December 19, 2024

December 18, 2024

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Share

Copy short link