Snowflake Teams Up with Meta to Host and Optimize New Flagship Model Family in Snowflake Cortex AI
Snowflake has announced a partnership with Meta to host the Llama 3.1 collection of open-source large language models (LLMs) in Snowflake Cortex AI. This includes Meta's largest model, Llama 3.1 405B, optimized for inference and fine-tuning with a 128K context window. Snowflake's AI Research Team has developed a Massive LLM Inference and Fine-Tuning System Optimization Stack, offering up to 3x lower latency and 1.4x higher throughput than existing solutions. The system enables fine-tuning on a single GPU node, reducing costs and complexity for developers.
Snowflake is also making Cortex Guard generally available, leveraging Meta's Llama Guard 2 to ensure AI safety. This collaboration aims to provide enterprises with efficient and trusted access to state-of-the-art AI models, supporting various use cases including real-time inference, high-throughput processing, and long context support.
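For developers, hosted access of this kind typically reduces to a single SQL function call. The following is a minimal, illustrative sketch of invoking a Llama 3.1 model through Cortex AI from Snowpark Python; the connection placeholders and the 'llama3.1-405b' model identifier are assumptions based on Cortex naming conventions, not details confirmed in this announcement.

```python
# Minimal sketch: invoking a hosted Llama 3.1 model through Snowflake Cortex AI
# from Snowpark Python. Connection placeholders and the 'llama3.1-405b' model
# identifier are assumptions, not details confirmed in this announcement.
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account_identifier>",  # placeholder credentials
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}
session = Session.builder.configs(connection_parameters).create()

# Cortex exposes LLM inference as a SQL function; the prompt is bound as a
# parameter. The 128K context window permits very long prompts.
prompt = "Summarize our Q2 customer feedback in three bullet points."
row = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3.1-405b', ?) AS response",
    params=[prompt],
).collect()[0]
print(row["RESPONSE"])
```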
- Partnership with Meta to host Llama 3.1 models in Snowflake Cortex AI
- Optimization of Llama 3.1 405B for inference and fine-tuning with 128K context window
- Development of the Massive LLM Inference and Fine-Tuning System Optimization Stack with improved performance
- Ability to fine-tune Llama 3.1 405B on a single GPU node, reducing costs
- Launch of Cortex Guard for enhanced AI safety measures
Insights
Snowflake's collaboration with Meta to integrate the Llama 3.1 LLM within Snowflake Cortex AI represents a significant leap in AI capabilities and accessibility. The technical intricacies of this partnership, particularly the fine-tuning and high-throughput inference capabilities, position Snowflake at the forefront of AI innovation. By enabling real-time inference and supporting a vast 128K context window, Snowflake ensures that enterprises can harness AI at unprecedented scales. This move not only enhances Snowflake's tech stack but also broadens its appeal to a wider range of businesses looking to implement advanced AI solutions without the heavy infrastructure costs traditionally associated with such technologies.
Moreover, the open-sourcing of its Massive LLM Inference and Fine-Tuning System Optimization Stack is a game-changer. This transparency and collaboration with the broader AI community, including DeepSpeed and Hugging Face, ensure that Snowflake remains a pivotal player in AI advancements. The emphasis on memory optimization and parallelism techniques further cements Snowflake's reputation as a leading AI research entity. For tech investors, this collaboration signals a robust future growth trajectory for Snowflake, driven by continuous innovation and strategic partnerships.
This partnership between Snowflake and Meta is poised to have a substantial market impact, particularly in the AI and data cloud sectors. Snowflake's ability to integrate and optimize Meta's Llama 3.1 for enterprise use cases positions the company as a key enabler of advanced AI technologies. The promise of reduced latency and increased throughput is likely to attract large-scale enterprises that require efficient and scalable AI solutions. Additionally, the inclusion of safety and trust measures, such as Cortex Guard, addresses a significant market concern, thereby enhancing Snowflake's value proposition.
From a market perspective, this collaboration is likely to boost Snowflake's customer acquisition and retention rates. Enterprises now have a compelling reason to choose Snowflake over competitors, given the ease of access to advanced AI models and the significant cost savings from reduced infrastructure needs. This could lead to increased revenue streams and market share for Snowflake. Investors should keep an eye on customer adoption rates and any upcoming financial disclosures from Snowflake that highlight the impact of this partnership.
Snowflake's strategic partnership with Meta to host and optimize the Llama 3.1 LLM within Snowflake Cortex AI has the potential to significantly affect the company's financial outlook. The collaboration not only enhances Snowflake's product offerings but also positions it as a leader in the AI and data cloud market. The ability to offer high-throughput, real-time AI solutions with reduced latency and infrastructure costs is a compelling value proposition that could drive substantial revenue growth.
Financially, this partnership could translate into higher subscription and usage rates for Snowflake's AI services. By reducing the complexity and cost associated with deploying large AI models, Snowflake can attract a broader range of customers, from startups to large enterprises. This could lead to increased Annual Recurring Revenue (ARR) and improved profit margins. Investors should monitor upcoming earnings reports for any quantitative insights into the financial benefits stemming from this partnership.
Snowflake’s AI Research Team, in collaboration with the open source community, launches a Massive LLM Inference and Fine-Tuning System Optimization Stack — establishing a new state-of-the-art solution for open source inference and fine-tuning systems for multi-hundred billion parameter models like Llama 3.1 405B
Snowflake Teams Up with Meta to Host and Optimize New Flagship Model Family in Snowflake Cortex AI (Graphic: Business Wire)
By partnering with Meta, Snowflake is providing customers with easy, efficient, and trusted ways to seamlessly access, fine-tune, and deploy Meta’s newest models in the AI Data Cloud, with a comprehensive approach to trust and safety built-in at the foundational level.
“Snowflake’s world-class AI Research Team is blazing a trail for how enterprises and the open source community can harness state-of-the-art open models like Llama 3.1 405B for inference and fine-tuning in a way that maximizes efficiency,” said Vivek Raghunathan, VP of AI Engineering, Snowflake. “We’re not just bringing Meta’s cutting-edge models directly to our customers through Snowflake Cortex AI. We’re arming enterprises and the AI community with new research and open source code that supports 128K context windows, multi-node inference, pipeline parallelism, 8-bit floating point quantization, and more to advance AI for the broader ecosystem.”
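To ground the 8-bit floating point quantization mentioned in the quote, the sketch below illustrates the general principle of 8-bit quantization using simple absmax scaling in NumPy. It is an illustration of the idea only, not Snowflake's FP8 implementation, which runs as optimized GPU kernels.

```python
# Illustrative sketch of the core idea behind 8-bit quantization: rescale a
# higher-precision tensor into an 8-bit range, then dequantize on use. This is
# generic absmax scaling for illustration only, not Snowflake's FP8 kernels.
import numpy as np

def quantize_absmax(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights into int8 with a per-tensor scale factor."""
    scale = float(np.abs(weights).max()) / 127.0  # largest magnitude -> 127
    return np.round(weights / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)

# Memory halves relative to 16-bit weights, at a small accuracy cost.
print(f"max abs error: {np.abs(w - w_hat).max():.5f}")
```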
Snowflake’s Industry-Leading AI Research Team Unlocks the Fastest, Most Memory-Efficient Open Source Inference and Fine-Tuning
Snowflake’s AI Research Team continues to push the boundaries of open source innovations through its regular contributions to the AI community and transparency around how it is building cutting-edge LLM technologies. In tandem with the launch of Llama 3.1 405B, Snowflake’s AI Research Team is now open sourcing its Massive LLM Inference and Fine-Tuning System Optimization Stack in collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community. This breakthrough establishes a new state-of-the-art for open source inference and fine-tuning systems for multi-hundred billion parameter models.
Massive model scale and memory requirements pose significant challenges for users aiming to achieve low-latency inference for real-time use cases, high throughput for cost effectiveness, and long context support for various enterprise-grade generative AI use cases. The memory requirements of storing model and activation states also make fine-tuning extremely challenging, with the large GPU clusters required to fit the model states for training often inaccessible to data scientists.
Snowflake’s Massive LLM Inference and Fine-Tuning System Optimization Stack addresses these challenges. By using advanced parallelism techniques and memory optimizations, Snowflake enables fast and efficient AI processing without complex and expensive infrastructure. For Llama 3.1 405B, Snowflake’s system stack delivers real-time, high-throughput performance on just a single GPU node and supports a massive 128K context window across multi-node setups. This flexibility extends to both next-generation and legacy hardware, making it accessible to a broader range of businesses. Moreover, data scientists can fine-tune Llama 3.1 405B using mixed-precision techniques on fewer GPUs, eliminating the need for large GPU clusters. As a result, organizations can adapt and deploy powerful enterprise-grade generative AI applications easily, efficiently, and safely.
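Some back-of-the-envelope arithmetic shows why weight precision dominates at this scale. The figures below (405B parameters, an 8-GPU node with 80 GB per GPU) are common industry assumptions, not numbers from the announcement:

```python
# Back-of-the-envelope memory math for a 405B-parameter model. GPU sizes and
# bytes-per-parameter are common industry assumptions, not Snowflake's specs.
PARAMS = 405e9
NODE_MEMORY_GB = 8 * 80  # a typical 8-GPU node with 80 GB per GPU

for name, bytes_per_param in [("fp16", 2), ("fp8", 1)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    fits = weights_gb < NODE_MEMORY_GB
    print(f"{name}: ~{weights_gb:.0f} GB of weights; "
          f"fits on a {NODE_MEMORY_GB} GB node: {fits}")

# fp16: ~810 GB of weights -> exceeds a single 8x80 GB node
# fp8:  ~405 GB of weights -> leaves headroom for activations and KV cache
```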
Snowflake’s AI Research Team has also developed optimized infrastructure for fine-tuning that includes model distillation, safety guardrails, retrieval-augmented generation (RAG), and synthetic data generation, so that enterprises can easily get started with these use cases within Cortex AI.
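Within Cortex AI, fine-tuning jobs are driven from SQL. The hedged sketch below reuses the Snowpark `session` from the earlier example and assumes a FINETUNE-style entry point with hypothetical model and table names; the exact interface should be confirmed in the quickstart guide linked under Learn More.

```python
# Hedged sketch: launching a Cortex fine-tuning job, reusing `session` from the
# earlier sketch. The tuned-model name and training table are hypothetical, and
# the FINETUNE argument order is an assumption to verify against the docs.
row = session.sql(
    """
    SELECT SNOWFLAKE.CORTEX.FINETUNE(
        'CREATE',
        'my_feedback_model',         -- hypothetical name for the tuned model
        'llama3.1-405b',             -- assumed base-model identifier
        'SELECT prompt, completion FROM training_examples'  -- hypothetical table
    ) AS job_id
    """
).collect()[0]
print(row["JOB_ID"])  # job identifier to poll for training status
```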
Snowflake Cortex AI Furthers Commitment to Delivering Trustworthy, Responsible AI
AI safety is of the utmost importance to Snowflake and its customers. As a result, Snowflake is making Snowflake Cortex Guard generally available to further safeguard against harmful content for any LLM application or asset built in Cortex AI — either using Meta's latest models, or the LLMs available from other leading providers including AI21 Labs, Google, Mistral AI, Reka, and Snowflake itself. Cortex Guard leverages Meta’s Llama Guard 2, further unlocking trusted AI for enterprises so they can ensure that the models they’re using are safe.
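In practice, Cortex Guard is applied at inference time. The hedged sketch below, again reusing the earlier `session`, assumes guardrails are toggled through an options object on the COMPLETE call; the option name and response shape should be verified against the Cortex documentation.

```python
# Hedged sketch: enabling Cortex Guard on a completion call, reusing `session`
# from the earlier sketch. The {'guardrails': TRUE} option and the JSON
# response shape are assumptions to verify against the Cortex documentation.
import json

row = session.sql(
    """
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-405b',
        [{'role': 'user', 'content': 'Draft a reply to this guest review.'}],
        {'guardrails': TRUE}
    ) AS response
    """
).collect()[0]

payload = json.loads(row["RESPONSE"])     # options form returns JSON text
print(payload["choices"][0]["messages"])  # the screened model output
```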
Comments on the News from Snowflake Customers and Partners
“As a leader in the hospitality industry, we rely on generative AI to deeply understand and quantify key topics within our Voice of the Customer platform. Gaining access to Meta’s industry-leading Llama models within Snowflake Cortex AI empowers us to further talk to our data, and glean the necessary insights we need to move the needle for our business,” said Dave Lindley, Sr. Director of Data Products, E15 Group. “We’re looking forward to fine-tuning and testing Llama to drive real-time action in our operations based on live guest feedback.”
“Safety and trust are a business imperative when it comes to harnessing generative AI, and Snowflake provides us with the assurances we need to innovate and leverage industry-leading large language models at scale,” said Ryan Klapper, an AI leader at Hakkoda. “The powerful combination of Meta’s Llama models within Snowflake Cortex AI unlocks even more opportunities for us to service internal RAG-based applications. These applications empower our stakeholders to interact seamlessly with comprehensive internal knowledge bases, ensuring they have access to accurate and relevant information whenever needed.”
“By harnessing Meta’s Llama models within Snowflake Cortex AI, we’re giving our customers access to the latest open source LLMs,” said Matthew Scullion, Matillion CEO and co-founder. “The upcoming addition of Llama 3.1 gives our team and users even more choice and flexibility to access the large language models that suit use cases best, and stay on the cutting-edge of AI innovation. Llama 3.1 within Snowflake Cortex AI will be immediately available with Matillion on Snowflake’s launch day.”
“As a leader in the customer engagement and customer data platform space, Twilio’s customers need access to the right data to create the right message for the right audience at the right time,” said Kevin Niparko, VP of Product and Technology Strategy at Twilio Segment. “The ability to choose the right model for their use case within Snowflake Cortex AI empowers our joint customers to generate AI-driven, intelligent insights and easily activate them in downstream tools. In an era of rapid evolution, businesses need to iterate quickly on unified data sets to drive the best outcomes.”
Learn More:
- For enterprises interested in distilling Llama 3.1 405B for their domain-specific use cases and getting additional support from Snowflake’s AI Research Team, fill out this form.
- More details on how to get started with Llama 3.1 405B and Snowflake Cortex AI can be found in this quickstart guide.
- Double click into the various ways developers can harness Llama 3.1 405B within Snowflake Cortex AI in this blog post.
- Dive into the technical details of how Snowflake’s AI Research Team is enabling efficient and cost-effective inference, alongside the fine-tuning of massive multi-hundred billion parameter models.
- Learn more about the continued innovation coming out of Snowflake’s AI Research Team, and meet the experts driving the future of AI forward in the AI Research hub.
- Stay on top of the latest news and announcements from Snowflake on LinkedIn and Twitter / X.
Forward Looking Statements
This press release contains express and implied forward-looking statements, including statements regarding (i) Snowflake’s business strategy, (ii) Snowflake’s products, services, and technology offerings, including those that are under development or not generally available, (iii) market growth, trends, and competitive considerations, and (iv) the integration, interoperability, and availability of Snowflake’s products with and on third-party platforms. These forward-looking statements are subject to a number of risks, uncertainties and assumptions, including those described under the heading “Risk Factors” and elsewhere in the Quarterly Reports on Form 10-Q and the Annual Reports on Form 10-K that Snowflake files with the Securities and Exchange Commission. In light of these risks, uncertainties, and assumptions, actual results could differ materially and adversely from those anticipated or implied in the forward-looking statements. As a result, you should not rely on any forward-looking statements as predictions of future events.
© 2024 Snowflake Inc. All rights reserved. Snowflake, the Snowflake logo, and all other Snowflake product, feature and service names mentioned herein are registered trademarks or trademarks of Snowflake Inc. in the United States and other countries.
About Snowflake
Snowflake makes enterprise AI easy, efficient and trusted. Thousands of companies around the globe, including hundreds of the world’s largest, use Snowflake’s AI Data Cloud to share data, build applications, and power their business with AI. The era of enterprise AI is here. Learn more at snowflake.com (NYSE: SNOW).
View source version on businesswire.com: https://www.businesswire.com/news/home/20240723098720/en/
Kaitlyn Hopkins
Senior Product PR Lead, Snowflake
press@snowflake.com
Source: Snowflake Inc.
FAQ
What is Snowflake's (SNOW) new partnership with Meta for AI models?
Snowflake is hosting Meta's Llama 3.1 collection of open-source LLMs, including Llama 3.1 405B, in Snowflake Cortex AI, giving customers a managed way to access, fine-tune, and deploy the models in the AI Data Cloud.
How has Snowflake (SNOW) optimized the Llama 3.1 405B model?
Snowflake's AI Research Team optimized Llama 3.1 405B for inference and fine-tuning with a 128K context window, delivering up to 3x lower latency and 1.4x higher throughput than existing solutions, including real-time inference on a single GPU node.
What is the Massive LLM Inference and Fine-Tuning System Stack developed by Snowflake (SNOW)?
It is an open-source system stack, developed with DeepSpeed, Hugging Face, vLLM, and the broader AI community, that combines advanced parallelism techniques and memory optimizations to serve and fine-tune multi-hundred-billion parameter models such as Llama 3.1 405B.