Amazon S3 Expands Capabilities with Managed Apache Iceberg Tables for Faster Data Lake Analytics and Automatic Metadata Generation to Simplify Data Discovery and Understanding
Amazon Web Services announced new Amazon S3 features making it the first cloud object store with fully-managed Apache Iceberg support. The key innovations include Amazon S3 Tables, delivering up to 3x faster query performance and 10x higher transactions per second for analytics workloads, and Amazon S3 Metadata, which automatically generates queryable metadata in near real-time.
S3 Tables introduces a new bucket type optimized for tabular data as Iceberg tables, while S3 Metadata streamlines data discovery by capturing and storing object metadata in S3 Tables. These features are Apache Iceberg table-compatible, allowing customers to query data using AWS analytics services and open source tools.
Positive
- Delivers up to 3x faster query performance for analytics workloads
- Achieves up to 10x higher transactions per second
- Automatic table maintenance and optimization features reduce operational costs
- Seamless integration with existing AWS analytics services
- Near real-time metadata generation enhances data discovery efficiency
Negative
- S3 Metadata is currently in preview and not yet generally available
- S3 Tables' integration with AWS Glue Data Catalog is still in preview
Insights
The launch of Amazon S3 Tables with Apache Iceberg support represents a significant technical advancement for AWS's data infrastructure.
The integration with AWS analytics services and compatibility with open-source tools positions this as a strategic infrastructure upgrade that will likely accelerate enterprise adoption of data lake architectures. The automatic metadata generation and real-time querying capabilities could dramatically reduce data discovery time and improve resource utilization for large-scale analytics operations.
This development has substantial cost implications for enterprises managing large-scale data operations. By eliminating the need for dedicated teams to maintain table management systems and metadata infrastructure, organizations can realize significant operational cost savings. The automation of complex tasks like compaction and snapshot management reduces technical debt while improving performance.
The endorsements from major enterprises like Roche and Genesys validate the business value proposition. For Roche specifically, the metadata capabilities will streamline their AI/ML operations, while Genesys's planned implementation for materialized views demonstrates the solution's versatility for different use cases.
Amazon S3 Tables deliver up to 3x faster query performance and up to 10x higher transactions per second for analytics workloads; Amazon S3 Metadata delivers queryable object metadata in near real time to search, organize, and augment data to accelerate data discovery
- Amazon S3 Tables is the first cloud object store with built-in Apache Iceberg table support and introduces a new bucket type to optimize storage and querying of tabular data as Iceberg tables, delivering up to 3x faster query performance, up to 10x higher transactions per second (TPS), and automated table maintenance for analytics workloads.
- Amazon S3 Metadata streamlines data discovery in near real-time by automatically capturing queryable object metadata, as well as custom metadata using object tags, storing it in S3 Tables for accelerating analytics across data lakes.
“As the leading object store in the world with more than 400 trillion objects, S3 is used by millions of customers, and we continue to innovate to remove the complexity of working with data at an unprecedented scale,” said Andy Warfield, vice president, Storage, and distinguished engineer, AWS. “We have seen the rapid rise of tabular data and, increasingly, customers want to query across tables, improve query performance, and understand and organize troves of data so they can easily find exactly what they need. S3 Tables and S3 Metadata remove the overhead of organizing and operating table and metadata stores on top of objects, so customers can shift their focus back to building with their data.”
S3 Tables and S3 Metadata are Apache Iceberg table-compatible so customers can easily query their data using AWS analytics services and open source tools, including Amazon Athena, Amazon QuickSight, and Apache Spark.
Amazon S3 Tables—the easiest and fastest way to perform analytics on Apache Iceberg tables in S3
Many customers today organize the data they use for analytics as tabular data, most often stored in Apache Parquet, a file format optimized for data queries. Parquet has become one of the fastest growing data types in S3, and customers increasingly want to query these growing tabular data sets. They often turn to open table formats (OTFs), open source standards for storing data in tables, because these formats help organize, update, and track changes to large amounts of data. Iceberg has become the most popular OTF for managing Parquet files, with customers using Iceberg to query across billions of files containing petabytes or even exabytes of data. However, Iceberg can be challenging for customers to manage as they scale, often requiring dedicated teams to build and maintain systems to handle table maintenance and data compaction, as well as manage access control. These external systems are costly and complex, and they require skilled teams to maintain, using up valuable resources.
Amazon S3 Tables are purpose-built for managing Apache Iceberg tables for data lakes. S3 Tables are specifically optimized for analytics workloads, delivering up to 3x faster query performance and 10x higher TPS compared to general purpose S3 buckets. S3 Tables automatically manage table maintenance tasks such as compaction for better query performance and snapshot management to continuously optimize query performance and storage costs, even as customers’ data lakes scale and evolve. Customers can use S3 Tables by creating a table bucket that optimizes the storage and querying of tabular data in fully-managed Iceberg tables. With S3 Tables, customers benefit from Iceberg capabilities like row-level transactions, queryable snapshots via time travel functionality, schema evolution, and more. In addition, S3 Tables provide table-level access controls, allowing customers to define permissions.
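To make the compaction task mentioned above more concrete, the following is a minimal Python sketch of the general idea: many small data files are grouped into batches near a target size so that each batch can be rewritten as one larger file. It is an illustration only and does not describe how S3 Tables implements compaction. The `DataFile` type, the `plan_compaction` function, the first-fit-decreasing grouping, and the 512 MB target are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file (for example, a Parquet file) in a table."""
    path: str
    size_mb: int


def plan_compaction(files: List[DataFile], target_mb: int = 512) -> List[List[DataFile]]:
    """Group files smaller than target_mb into batches of at most target_mb.

    Uses first-fit decreasing: the largest small files are placed first, and
    each file goes into the first batch that still has room for it. Batches
    that end up with a single file are dropped, since rewriting one file
    alone would not reduce the file count.
    """
    small = sorted(
        (f for f in files if f.size_mb < target_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    batches: List[List[DataFile]] = []
    for f in small:
        for batch in batches:
            if sum(x.size_mb for x in batch) + f.size_mb <= target_mb:
                batch.append(f)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([f])
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    # Illustrative sizes only; these are not measurements.
    demo = [DataFile(f"part-{i}.parquet", s) for i, s in enumerate([600, 300, 200, 100, 50, 10])]
    for batch in plan_compaction(demo):
        print([f.path for f in batch], sum(f.size_mb for f in batch), "MB")
```

Fewer, larger files generally mean fewer objects for a query engine to open and scan, which is the usual motivation for running compaction on a regular basis.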
Genesys, a global leader in AI-powered experience orchestration, plans to leverage Amazon S3 for its data lake. By utilizing S3 Tables' managed Iceberg support, Genesys expects to offer a materialized view layer for its diverse data analysis needs. S3 Tables’ built-in support for Iceberg tables will simplify complex data workflows by automating key maintenance tasks such as table compaction, snapshot management, and unreferenced file cleanup. Genesys is looking forward to improved performance and broad support from Iceberg-compliant analytics tools that can read and write Iceberg tables directly from S3. S3 Tables will be foundational to Genesys' future data strategy, enabling the company to deliver faster, more flexible, and reliable data insights to support its AI-driven customer and employee experience solutions.
Amazon S3 Metadata—the easiest and fastest way to discover and understand data in S3
As more customers use S3 as their central data repository, the volume and variety of data have grown exponentially, with metadata becoming increasingly important as a way to understand and organize large amounts of data so customers can find the exact objects they need. To address this problem, many customers resort to building and maintaining complex metadata capture and storage systems to enrich their understanding of data. But these metadata systems are expensive, time-consuming, and resource-intensive, often requiring data engineers to manually track and update metadata as it flows through their processing pipelines, as well as data analysts to manually inspect massive object stores to find the specific data they need for analytics and AI/ML data processing workflows.
Amazon S3 Metadata automatically generates queryable object metadata in near real-time to help accelerate data discovery and improve data understanding, eliminating the need for customers to build and maintain their own complex metadata systems. S3 Metadata lets customers query, find, and use data for business analytics, real-time inference applications, and more. S3 Metadata automatically generates object metadata, which includes system-defined details like size and source of the object, and makes it queryable via new S3 Tables. S3 Metadata updates object metadata in S3 Tables as objects are added or removed, giving customers an up-to-date view of their data. Customers can add their own custom metadata using object tags to annotate objects with information specific to their business, such as product SKUs, transaction IDs, or content ratings, or with customer details. Customers can easily query metadata using a simple SQL query, enabling them to quickly find and prepare data for use in business analytics and real-time inference applications, as well as fine-tune foundation models, perform retrieval augmented generation (RAG), integrate data warehouse and analytics workflows, perform targeted storage optimization tasks, and more.
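As a rough illustration of what querying object metadata with a simple SQL query can look like, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for a real query engine. Everything in it is an assumption made for the example: the `object_metadata` table, its columns (`object_key`, `size`, `sku`), and the sample rows are hypothetical and do not reflect the actual S3 Metadata table schema. In particular, the custom tag is flattened into a plain `sku` column for simplicity.

```python
import sqlite3
from typing import List, Tuple


def build_demo_metadata_table() -> sqlite3.Connection:
    """Create an in-memory table with a simplified, hypothetical metadata schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_metadata ("
        "object_key TEXT, size INTEGER, sku TEXT)"
    )
    # Made-up sample rows: object key, size in bytes, and a custom SKU tag.
    conn.executemany(
        "INSERT INTO object_metadata VALUES (?, ?, ?)",
        [
            ("images/a.jpg", 2048, "SKU-123"),
            ("images/b.jpg", 512, "SKU-123"),
            ("docs/c.pdf", 4096, "SKU-999"),
            ("images/d.jpg", 8192, "SKU-123"),
        ],
    )
    return conn


def find_objects_by_sku(
    conn: sqlite3.Connection, sku: str, min_size: int = 0
) -> List[Tuple[str, int]]:
    """Return (object_key, size) for objects tagged with sku and at least min_size bytes."""
    cursor = conn.execute(
        "SELECT object_key, size FROM object_metadata "
        "WHERE sku = ? AND size >= ? ORDER BY object_key",
        (sku, min_size),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_metadata_table()
    for object_key, size in find_objects_by_sku(connection, "SKU-123", min_size=1000):
        print(object_key, size)
```

The point of the sketch is the shape of the workflow: instead of listing and inspecting objects one by one, a single filter over a metadata table narrows the data set to the objects of interest.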
Organizations of all sizes are set to benefit from the data discovery and understanding that S3 Metadata will bring. Roche, a leading biotech company, plans to leverage S3 Metadata to accelerate their future generative AI initiatives. As they develop advanced large language model (LLM) applications like sophisticated internal chatbots, they anticipate managing exponentially larger volumes of unstructured data for enhanced RAG. S3 Metadata will simplify the creation of a scalable metadata system, automatically surfacing and updating metadata as new data is ingested. Roche envisions using custom Lambda functions to extract complex, business-specific metadata, integrating it seamlessly with S3 Metadata in a comprehensive Glue catalog. This will enable more efficient organization and rapid identification of relevant datasets for cutting-edge AI applications, allowing Roche to focus on groundbreaking innovations in personalized healthcare.
Cambridge Mobile Telematics (CMT) is the world’s largest telematics service provider. The company gathers sensor data from devices and enriches it with contextual data to create a unified view of vehicle and driver behavior that auto insurers, automakers, commercial mobility companies, and the public sector use to power risk assessment, safety, claims, and driver improvement programs. CMT stores and analyzes multiple petabytes of data from millions of IoT devices worldwide. As CMT scales, locating specific data for developing new insights and models becomes increasingly challenging. S3 Metadata, including system and custom metadata, allows CMT to query petabytes of metadata, making finding relevant data simple and cost-effective.
S3 Tables (generally available) and S3 Metadata (preview) are available today. S3 Tables’ integration with AWS Glue Data Catalog is in preview, allowing customers to query and visualize data—including S3 Metadata tables—using AWS Analytics services such as Amazon Athena, Redshift, EMR, and QuickSight.
To learn more, visit:
- S3 Tables and S3 Metadata AWS News Blog posts for details on today’s announcements.
- S3 Tables and S3 Metadata product detail pages to learn more about their capabilities.
- S3 Tables and S3 Metadata videos for explanations on how they work.
About Amazon Web Services
Since 2006, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud. AWS has been continually expanding its services to support virtually any workload, and it now has more than 240 fully featured services for compute, storage, databases, networking, analytics, machine learning and artificial intelligence (AI), Internet of Things (IoT), mobile, security, hybrid, media, and application development, deployment, and management from 108 Availability Zones within 34 geographic regions, with announced plans for 18 more Availability Zones and six more AWS Regions in
About Amazon
Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Amazon strives to be Earth’s Most Customer-Centric Company, Earth’s Best Employer, and Earth’s Safest Place to Work. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Career Choice, Fire tablets, Fire TV, Amazon Echo, Alexa, Just Walk Out technology, Amazon Studios, and The Climate Pledge are some of the things pioneered by Amazon. For more information, visit amazon.com/about and follow @AmazonNews.
View source version on businesswire.com: https://www.businesswire.com/news/home/20241203462924/en/
Amazon.com, Inc.
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr
Source: Amazon.com, Inc.