Anyscale and Nvidia in LLM Hookup
GenAI developers building atop large language models (LLMs) are the big winners of a new partnership between Anyscale and Nvidia unveiled this week that will see the GPU maker’s AI software integrated into Anyscale’s computing platform.
Anyscale is best known as the company behind Ray, the open source library from UC Berkeley’s RISELab that lets developers scale a Python program written on a laptop into a distributed application running across large clusters. The Anyscale Platform, meanwhile, is the company’s commercial Ray service, launched in 2021.
The partnership with Nvidia has open source and commercial components. On the open source front, the companies will hook several of the GPU manufacturer’s AI frameworks, including TensorRT-LLM, Triton Inference Server, and NeMo, into Ray. On the commercial side, the companies have pledged to get the Nvidia AI Enterprise software suite certified for the Anyscale Platform, as well as integrations for Anyscale Endpoints.
The integration of the TensorRT-LLM library with Ray will let GenAI developers use Nvidia’s LLM inference optimization library from within Ray applications. Nvidia says TensorRT-LLM brings an 8x performance boost when running on its latest H100 Tensor Core GPUs compared to the prior generation.
Developers working with Ray can also now use Nvidia’s Triton Inference Server when deploying AI inference workloads. The Triton Inference Server supports a range of processors and deployment scenarios, including GPU and CPU on cloud, edge, and embedded devices. It also supports TensorFlow, PyTorch, ONNX, OpenVINO, Python, and RAPIDS XGBoost frameworks, thereby increasing deployment flexibility and performance for GenAI developers, the companies say.
Finally, the integration between Ray and Nvidia’s NeMo framework for GenAI applications will enable GenAI developers to combine the benefits of both products. NeMo contains several components, including ML training and inference frameworks, guardrailing toolkits, data curation tools, and pretrained models.
Similarly, the integration between the Anyscale Platform and Nvidia’s AI Enterprise software is designed to put more capabilities and tools at the disposal of enterprise GenAI developers. The companies have worked to ensure that Anyscale Endpoints, a new service unveiled by Anyscale this week, is supported within the Nvidia AI Enterprise environment. Anyscale Endpoints is designed to let developers integrate LLMs into their applications quickly using popular APIs.
“Previously, developers had to assemble machine learning pipelines, train their own models from scratch, then secure, deploy and scale them,” Anyscale said. “This resulted in high costs and slower time-to-market. Anyscale Endpoints lets developers use familiar API calls to seamlessly add ‘LLM superpowers’ to their operational applications without the painstaking process of developing a custom AI platform.”
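The “familiar API calls” pitch refers to an OpenAI-style chat-completions interface. As a hedged, standard-library-only sketch — the endpoint URL and model name below are illustrative assumptions, not taken from the announcement — such a request might be assembled like this:

```python
# Sketch of an OpenAI-style chat-completion request to a hosted LLM endpoint.
# The URL and model name are illustrative assumptions.
import json
import urllib.request

ENDPOINT = "https://api.endpoints.anyscale.com/v1/chat/completions"  # assumed

def build_request(prompt: str, api_key: str,
                  model: str = "meta-llama/Llama-2-70b-chat-hf"):
    """Build (but do not send) an OpenAI-style chat-completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Summarize Ray in one sentence.", "YOUR_API_KEY")
# urllib.request.urlopen(req) would actually send the call; omitted here.
print(req.full_url)
```

Because the request shape mirrors the de facto standard chat API, an application already written against that interface needs little more than a URL and key change — which is the low-friction integration the quote describes.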
Robert Nishihara, CEO and co-founder of Anyscale, says the partnership with Nvidia brings more “performance and efficiency” to the Anyscale portfolio. “Realizing the incredible potential of generative AI requires computing platforms that help developers iterate quickly and save costs when building and tuning LLMs,” Nishihara said.
Anyscale made the announcement at Ray Summit, which is taking place this week in San Francisco.
Related Items:
Anyscale Bolsters Ray, the Super-Scalable Framework Used to Train ChatGPT
Anyscale Branches Beyond ML Training with Ray 2.0 and AI Runtime
Anyscale Nabs $100M, Unleashes Parallel, Serverless Computing in the Cloud