AWS Announces Five New Database and Analytics Capabilities
At AWS re:Invent, Amazon announced five new database and analytics capabilities intended to help customers manage and analyze petabyte-scale data. They are Amazon DocumentDB Elastic Clusters for scaling document workloads beyond a single node, Amazon OpenSearch Serverless for automatic scaling of search and analytics workloads, Amazon Athena for Apache Spark for interactive analytics that start in less than one second, AWS Glue Data Quality for automated data-quality monitoring and management, and Amazon Redshift Multi-AZ deployments for higher availability.
- Amazon DocumentDB Elastic Clusters allow scaling beyond single-node limits, supporting millions of reads and writes per second and up to 2 petabytes of data.
- Amazon OpenSearch Serverless automatically provisions and scales search infrastructure for unpredictable workloads, with data ingestion and search resources scaling independently.
- Amazon Athena for Apache Spark reduces interactive analytics startup time to less than one second.
- AWS Glue Data Quality reduces the time for data analysis and rule identification from days to hours.
- Amazon Redshift's multi-AZ support increases reliability and availability for mission-critical workloads.
AWS Glue Data Quality cuts time for data analysis and rule identification from days to hours by automatically measuring, monitoring, and managing data quality in data lakes and across data pipelines
“Data is inherently dynamic, and harnessing it to its full potential requires an end-to-end data strategy that can scale with a customer’s needs and accommodate all types of use cases—both now and in the future,” said
Organizations today create and store petabytes—or even exabytes—of data from a growing number of sources (e.g., digital media, online transactions, and connected devices). To maximize the value of this data, customers need an end-to-end data strategy that provides access to the right tools for all data workloads and applications, along with the ability to perform reliably at scale as the volume and velocity of data increase. To support customers designing their own end-to-end data strategies, AWS offers the industry’s most comprehensive set of data services and solutions. This includes fully managed databases optimized for customers’ most important use cases, such as
- Amazon DocumentDB Elastic Clusters power petabyte-scale applications with millions of writes per second: Tens of thousands of customers use Amazon DocumentDB to run their document workloads because it is fast, scalable, highly available, and fully managed. While each Amazon DocumentDB node can scale up to 64 tebibytes of data and support millions of read requests per second, a subset of customers with extremely demanding workloads needs the ability to scale beyond these limits to support millions of writes per second and store petabytes of data. Previously, these customers had to manually distribute data and manage capacity across multiple Amazon DocumentDB nodes. Amazon DocumentDB Elastic Clusters allow customers to scale beyond the limits of a single database node within minutes, supporting millions of reads and writes per second and storing up to 2 petabytes of data. As workload demands increase, Amazon DocumentDB Elastic Clusters take advantage of a distributed storage system to automatically divide large datasets across multiple nodes. This removes the need for customers to write custom code to distribute datasets and manually manage capacity across nodes. The underlying infrastructure is managed automatically, so customers can easily scale capacity based on their needs without needing to provision, scale, or manage database clusters. To learn more about Amazon DocumentDB Elastic Clusters, visit aws.amazon.com/documentdb/features/#elastic_clusters.
- Amazon OpenSearch Serverless automatically scales search and analytics workloads: To power use cases like website search and real-time application monitoring, tens of thousands of customers use Amazon OpenSearch Service. Many of these workloads are prone to sudden, intermittent spikes in usage, making capacity planning difficult. Amazon OpenSearch Serverless automatically provisions, configures, and scales OpenSearch infrastructure to deliver fast data ingestion and millisecond query responses, even for unpredictable and intermittent workloads. With Amazon OpenSearch Serverless, data ingestion and search resources scale independently, allowing these operations to run concurrently without any performance impact. Customers using Amazon OpenSearch Serverless get access to serverless benefits (e.g., automatic provisioning, on-demand scaling, and pay-for-use pricing), along with Amazon OpenSearch Service features, such as built-in data visualizations, that help them understand log data, identify anomalies, and see search relevance rankings. To learn more about Amazon OpenSearch Serverless, visit aws.amazon.com/opensearch-service/features/serverless.
- Amazon Athena for Apache Spark accelerates startup of interactive analytics to less than one second: Customers use Amazon Athena, a serverless interactive query service, because it is one of the easiest and fastest ways to query petabytes of data in Amazon Simple Storage Service (Amazon S3) using a standard SQL interface. Many customers are looking for that same ease of use when it comes to using Apache Spark, an open-source processing framework for big data workloads that supports popular language frameworks (i.e., Java, Scala, Python, and R). While developers enjoy the fast query speed and ease of use of Apache Spark, they do not want to invest time setting up, managing, and scaling their own Apache Spark infrastructure each time they want to run a query. Now, with Amazon Athena for Apache Spark, customers do not have to provision, configure, and scale resources themselves. Interactive Apache Spark applications start in less than one second and execute faster than open source using AWS’s optimized Spark runtime. Because Amazon Athena is integrated with other AWS services, customers can query data from multiple sources, chain calculations together for complex analyses, and visualize the results. Amazon Athena for Apache Spark automatically determines the resources required based on application demand and scales as needed, so customers only pay for the queries they run. To get started with Amazon Athena for Apache Spark, visit aws.amazon.com/athena/spark.
- AWS Glue Data Quality automatically monitors and manages data freshness, accuracy, and integrity: Hundreds of thousands of customers use AWS Glue to build and manage modern data pipelines quickly, easily, and cost-effectively. Organizations need to monitor the data quality (a measure of the freshness, accuracy, and integrity of data) of the information in their data lakes and data pipelines to ensure it is high quality before using it to power their analysis or machine learning applications. But effective data-quality management is a time-consuming and complex process, requiring data engineers to spend days gathering detailed statistics on their data, manually identifying data-quality rules based on those statistics and applying them across thousands of datasets and data pipelines. Once these rules are implemented, data engineers must continuously monitor for errors or changes in the data to adjust rules accordingly. AWS Glue Data Quality automatically measures, monitors, and manages the data quality of Amazon S3 data lakes and AWS Glue data pipelines, reducing the time for data analysis and rule identification from days to hours. AWS Glue Data Quality computes statistics for customer datasets (e.g., minimums, maximums, histograms, and correlations) and uses them to automatically recommend rules to ensure data freshness, accuracy, and integrity. Customers can schedule AWS Glue Data Quality to run periodically as data changes, automatically analyzing the data and proposing changes to quality rules to ensure relevance. Data engineers can configure actions to alert users or stop data pipelines when quality issues occur, without having to write code. To learn more about AWS Glue Data Quality, visit aws.amazon.com/glue/features/data-quality.
- Amazon Redshift now supports multi-AZ deployments: Tens of thousands of AWS customers collectively process exabytes of data with Amazon Redshift every day. To support these customers’ mission-critical workloads, Amazon Redshift offers capabilities that increase availability and reliability, such as automatic backups and the ability to relocate a cluster to another AZ in minutes. Many databases today use a primary-standby replication mode to support high availability where a single database serves live traffic, and standby copies replicate data from the live version in case they need to replace it. Building on these capabilities, Amazon Redshift now offers a high-availability configuration to enable fast recovery while minimizing the risk of data loss. With Amazon Redshift Multi-AZ, clusters are deployed across multiple AZs and use all the resources to process read and write queries, eliminating the need for under-utilized standby copies and maximizing price performance for customers. Since a multi-AZ data warehouse is still managed as a single Amazon Redshift data warehouse with one endpoint, no application changes are required to maintain business continuity. To learn more about Amazon Redshift Multi-AZ, visit aws.amazon.com/redshift/reliability.
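The statistics-to-rules flow described for AWS Glue Data Quality above can be pictured with a small, self-contained Python sketch. It is only an illustration of the general idea (profile a column, propose rules from the statistics, flag values in a later batch that break them) and does not use or represent the AWS Glue API. The function names, the rule wording, and the sample column `order_total` are all hypothetical.

```python
def profile_column(values):
    """Compute simple statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def recommend_rules(column, stats):
    """Turn statistics into plain-text rule suggestions (hypothetical wording)."""
    rules = []
    if stats["completeness"] == 1.0:
        rules.append(f"{column} is never missing")
    if stats["min"] is not None:
        rules.append(f"{column} is between {stats['min']} and {stats['max']}")
    return rules


def violations(values, stats):
    """Return values from a new batch that are missing or outside the profiled range."""
    if stats["min"] is None:
        return list(values)  # nothing was profiled, so every value is unverified
    return [
        v for v in values
        if v is None or not (stats["min"] <= v <= stats["max"])
    ]


# Profile a historical sample, then check a newer batch against it.
history = [12.5, 40.0, 7.25, 99.0]
stats = profile_column(history)
rules = recommend_rules("order_total", stats)
bad = violations([15.0, None, 250.0], stats)
```

In this sketch, `bad` holds the entries a pipeline could alert on or stop for. A managed service would derive richer rules (for example from histograms and correlations, as the release mentions) than the minimum, maximum, and completeness used here.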
Rippling brings together payroll, benefits, HR, IT, and more so their customers can manage employee operations in one place. “As our business continues to grow, we need the ability to scale beyond the limits of a single document database node,” said
riskCanvas, a software as a service (SaaS) product offering from Genpact, is a financial crime compliance solution that leverages cutting-edge big data, automation, and machine learning technologies to deliver compliance, efficiency, and automation to its clients. “riskCanvas’ Entity-Centric Monitoring incorporates transaction monitoring, external enrichment, watchlist screening, and negative news to automatically assess risk and alert high-risk customers only as the true risk of a customer exceeds predefined thresholds, substantially reducing the effort to meet regulatory compliance requirements. This requires significant and varied analytic processing that often experiences spiky and unpredictable data load,” said
United Airlines operates a large domestic and international route network, spanning cities large and small across the US and all six inhabited continents. “United Airlines is building hundreds of data- and analytics-driven tools for our customers and employees, which makes managing and maintaining data quality critical to our operations,” said
View source version on businesswire.com: https://www.businesswire.com/news/home/20221130005919/en/
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr