AWS Announces Eight New Amazon SageMaker Capabilities
Amazon announced eight new capabilities for SageMaker at AWS re:Invent, enhancing the machine learning (ML) lifecycle for developers and data scientists. Key features include SageMaker Role Manager for governance, Model Cards for streamlined documentation, and a centralized Model Dashboard for performance tracking. Additionally, the new Studio Notebooks facilitate real-time collaboration and quick data preparation. The platform also introduces automated model validation using real-time inference and improves handling of geospatial data for various applications, empowering users to leverage ML effectively.
- Enhanced governance with SageMaker Role Manager streamlines user access and permissions.
- Model Cards simplify documentation, saving time and reducing errors in model development.
- Model Dashboard provides comprehensive tracking of model performance and behavior.
- Studio Notebooks improve collaboration among data science teams, speeding up the development process.
- Automated model validation allows for real-time performance testing before production deployment.
- New geospatial capabilities enable efficient use of location data, speeding up model predictions and deployment.
New data preparation capability in
Data science teams can now collaborate in real time within
Customers can now automatically convert notebook code into production-ready jobs
Automated model validation enables customers to test new models using real-time inference requests
Support for geospatial data enables customers to more easily develop machine learning models for climate science, urban planning, disaster response, retail planning, precision agriculture, and more
“Today, tens of thousands of customers of all sizes and across industries rely on
The cloud enabled access to ML for more users, but until a few years ago, the process of building, training, and deploying models remained painstaking and tedious, requiring continuous iteration by small teams of data scientists for weeks or months before a model was production-ready.
New ML governance capabilities in
- Amazon SageMaker Role Manager makes it easier to control access and permissions: Appropriate user-access controls are a cornerstone of governance and support data privacy, prevent information leaks, and ensure practitioners can access the tools they need to do their jobs. Implementing these controls becomes increasingly complex as data science teams swell to dozens or even hundreds of people. ML administrators (individuals who create and monitor an organization’s ML systems) must balance the push to streamline development while controlling access to tasks, resources, and data within ML workflows. Today, administrators create spreadsheets or use ad hoc lists to navigate access policies needed for dozens of different activities (e.g., data prep and training) and roles (e.g., ML engineer and data scientist). Maintaining these tools is manual, and it can take weeks to determine the specific tasks new users will need to do their jobs effectively. Amazon SageMaker Role Manager makes it easier for administrators to control access and define permissions for users. Administrators can select and edit prebuilt templates based on various user roles and responsibilities. The tool then automatically creates the access policies with necessary permissions within minutes, reducing the time and effort to onboard and manage users over time.
- Amazon SageMaker Model Cards simplify model information gathering: Today, most practitioners rely on disparate tools (e.g., email, spreadsheets, and text files) to document the business requirements, key decisions, and observations during model development and evaluation. Practitioners need this information to support approval workflows, registration, audits, customer inquiries, and monitoring, but it can take months to gather these details for each model. Some practitioners try to solve this by building complex recordkeeping systems, which is manual, time consuming, and error-prone. Amazon SageMaker Model Cards provide a single location to store model information in the AWS console, streamlining documentation throughout a model’s lifecycle. The new capability auto-populates training details like input datasets, training environment, and training results directly into Amazon SageMaker Model Cards. Practitioners can also include additional information using a self-guided questionnaire to document model information (e.g., performance goals, risk rating), training and evaluation results (e.g., bias or accuracy measurements), and observations for future reference to further improve governance and support the responsible use of ML.
- Amazon SageMaker Model Dashboard provides a central interface to track ML models: Once a model has been deployed to production, practitioners want to track their model over time to understand how it performs and to identify potential issues. This task is normally done on an individual basis for each model, but as an organization starts to deploy thousands of models, this becomes increasingly complex and requires more time and resources. Amazon SageMaker Model Dashboard provides a comprehensive overview of deployed models and endpoints, enabling practitioners to track resources and model behavior in one place. From the dashboard, customers can also use built-in integrations with Amazon SageMaker Model Monitor (AWS’s model and data drift monitoring capability) and Amazon SageMaker Clarify (AWS’s ML bias-detection capability). This end-to-end visibility into model behavior and performance provides the necessary information to streamline ML governance processes and quickly troubleshoot model issues.
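The template-to-policy idea described for Amazon SageMaker Role Manager can be illustrated with a minimal sketch. It is not the tool's implementation: the persona names, the activity groupings, and the `build_policy` helper are all hypothetical, chosen only to show how a role template might expand into an IAM-style policy document.

```python
import json

# Hypothetical persona templates. This mapping is illustrative only and is
# not the set of templates that SageMaker Role Manager ships with.
PERSONA_ACTIVITIES = {
    "data_scientist": ["data_prep", "training"],
    "ml_engineer": ["training", "deployment"],
}

# Illustrative (assumed) grouping of IAM actions by ML activity.
ACTIVITY_ACTIONS = {
    "data_prep": ["sagemaker:CreateProcessingJob", "sagemaker:DescribeProcessingJob"],
    "training": ["sagemaker:CreateTrainingJob", "sagemaker:DescribeTrainingJob"],
    "deployment": ["sagemaker:CreateModel", "sagemaker:CreateEndpoint"],
}


def build_policy(persona, resource="*"):
    """Expand a persona template into an IAM-style policy document (a dict)."""
    actions = []
    for activity in PERSONA_ACTIVITIES[persona]:
        for action in ACTIVITY_ACTIONS[activity]:
            if action not in actions:  # keep order, drop duplicates
                actions.append(action)
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resource}],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("ml_engineer"), indent=2))
```

In practice an administrator would likely scope `resource` to specific ARNs rather than `"*"`; the wildcard is used here only to keep the sketch short.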
To learn more about
Next-generation Notebooks
- Simplified data preparation: Practitioners want to explore datasets directly in notebooks to spot and correct potential data-quality issues (e.g., missing information, extreme values, skewed datasets, and biases) as they prepare data for training. Practitioners can spend months writing boilerplate code to visualize and examine different parts of their dataset to identify and fix problems. Amazon SageMaker Studio Notebook now offers a built-in data preparation capability that allows practitioners to visually review data characteristics and remediate data-quality problems in just a few clicks, all directly in their notebook environment. When users display a data frame (i.e., a tabular representation of data) in their notebook, Amazon SageMaker Studio Notebook automatically generates charts to help users identify data-quality issues and suggests data transformations to help fix common problems. Once the practitioner selects a data transformation, Amazon SageMaker Studio Notebook generates the corresponding code within the notebook so it can be repeatedly applied every time the notebook is run.
- Accelerate collaboration across data science teams: After data has been prepared, practitioners are ready to start developing a model, an iterative process that may require teammates to collaborate within a single notebook. Today, teams must exchange notebooks and other assets (e.g., models and datasets) over email or chat applications to work on a notebook together in real time, leading to communication fatigue, delayed feedback loops, and version-control issues. Amazon SageMaker now gives teams a workspace where they can read, edit, and run notebooks together in real time to streamline collaboration and communication. Teammates can review notebook results together to immediately understand how a model performs, without passing information back and forth. With built-in support for services like BitBucket and AWS CodeCommit, teams can easily manage different notebook versions and compare changes over time. Affiliated resources, like experiments and ML models, are also automatically saved to help teams stay organized.
- Automatic conversion of notebook code to production-ready jobs: When practitioners want to move a finished ML model into production, they usually copy snippets of code from the notebook into a script, package the script with all its dependencies into a container, and schedule the container to run. To run this job repeatedly on a schedule, they must set up, configure, and manage a continuous integration and continuous delivery (CI/CD) pipeline to automate their deployments. It can take weeks to get all the necessary infrastructure set up, which takes time away from core ML development activities. Amazon SageMaker Studio Notebook now allows practitioners to select a notebook and automate it as a job that can run in a production environment. Once a notebook is selected, Amazon SageMaker Studio Notebook takes a snapshot of the entire notebook, packages its dependencies in a container, builds the infrastructure, runs the notebook as an automated job on a schedule set by the practitioner, and deprovisions the infrastructure upon job completion, reducing the time it takes to move a notebook to production from weeks to hours.
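To make the data-quality checks mentioned under simplified data preparation more concrete, here is a minimal sketch of the kind of boilerplate a practitioner might otherwise write by hand. It is not the code that Studio Notebook generates: the `profile_column` and `suggest_fixes` helpers, the interquartile-range rule, and the suggested fixes are all assumptions chosen for illustration.

```python
import statistics


def profile_column(values):
    """Summarize missing and extreme values for one numeric column.

    `values` is a list of numbers in which None marks a missing entry.
    Extreme values are flagged with the common 1.5 * IQR rule (an assumption).
    """
    present = [v for v in values if v is not None]
    missing_fraction = 1 - len(present) / len(values)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing_fraction": missing_fraction, "outliers": outliers}


def suggest_fixes(profile):
    """Map detected issues to hypothetical remediation suggestions."""
    suggestions = []
    if profile["missing_fraction"] > 0:
        suggestions.append("impute missing values with the median")
    if profile["outliers"]:
        suggestions.append("clip extreme values to the IQR fences")
    return suggestions


if __name__ == "__main__":
    column = [10, 12, 11, None, 13, 12, 500, 11]
    report = profile_column(column)
    print(report, suggest_fixes(report))
```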
To begin using the next generation of
Automated validation of new models using real-time inference requests
Before deploying to production, practitioners test and validate every model to check performance and identify errors that could negatively impact the business. Typically, they use historical inference request data to test the performance of a new model, but this data sometimes fails to account for current, real-world inference requests. For example, historical data for an ML model to plan the fastest route might fail to account for an accident or a sudden road closure that significantly alters the flow of traffic. To address this issue, practitioners route a copy of the inference requests going to a production model to the new model they want to test. It can take weeks to build this testing infrastructure, mirror inference requests, and compare how models perform across key metrics (e.g., latency and throughput). While this provides practitioners with greater confidence in how the model will perform, the cost and complexity of implementing these solutions for hundreds or thousands of models makes it unscalable.
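The request-mirroring approach described above can be sketched in a few lines. This is a simplified, single-process illustration and not how SageMaker implements it: `shadow_test` is a hypothetical helper, and the two models are assumed to be plain Python callables rather than hosted endpoints.

```python
import time
from statistics import mean


def shadow_test(requests, production_model, shadow_model):
    """Send each request to both models and compare them.

    Only production responses are returned to callers; the shadow model's
    output is used purely for comparison.
    """
    responses = []
    latencies = {"production": [], "shadow": []}
    disagreements = 0
    for request in requests:
        start = time.perf_counter()
        prod_out = production_model(request)
        latencies["production"].append(time.perf_counter() - start)

        start = time.perf_counter()
        try:
            shadow_out = shadow_model(request)
        except Exception:
            shadow_out = None  # a shadow failure must not affect callers
        latencies["shadow"].append(time.perf_counter() - start)

        if shadow_out != prod_out:
            disagreements += 1
        responses.append(prod_out)

    report = {
        "requests": len(responses),
        "disagreements": disagreements,
        "mean_latency_s": {
            name: mean(samples) if samples else 0.0
            for name, samples in latencies.items()
        },
    }
    return responses, report


if __name__ == "__main__":
    current = lambda x: x * 2
    candidate = lambda x: x * 2 if x != 3 else 7  # differs on one input
    print(shadow_test([1, 2, 3], current, candidate))
```

A real deployment would mirror traffic asynchronously so the shadow model adds no latency to the production path; the sequential calls here only keep the sketch easy to follow.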
New geospatial capabilities in
Today, most data captured has geospatial information (e.g., location coordinates, weather maps, and traffic data). However, only a small amount of it is used for ML purposes because geospatial datasets are difficult to work with and can often be petabytes in size, spanning entire cities or hundreds of acres of land. To start building a geospatial model, customers typically augment their proprietary data by procuring third-party data sources like satellite imagery or map data. Practitioners need to combine this data, prepare it for training, and then write code to divide datasets into manageable subsets due to the massive size of geospatial data. Once customers are ready to deploy their trained models, they must write more code to recombine multiple datasets to correlate the data and ML model predictions. To extract predictions from a finished model, practitioners then need to spend days using open-source visualization tools to render the results on a map. The entire process from data enrichment to visualization can take months, which makes it hard for customers to take advantage of geospatial data and generate timely ML predictions.
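As one illustration of the "divide datasets into manageable subsets" step, the sketch below splits a longitude/latitude bounding box into fixed-size tiles that could each be processed independently. The `tile_bounding_box` helper and the degree-based tiling scheme are assumptions for illustration, not part of any SageMaker API.

```python
def tile_bounding_box(min_lon, min_lat, max_lon, max_lat, tile_deg):
    """Split a lon/lat bounding box into tiles of at most `tile_deg` degrees.

    Returns a list of (min_lon, min_lat, max_lon, max_lat) tuples, ordered
    row by row from the south-west corner. Tiles on the northern and eastern
    edges are truncated so they never extend past the bounding box.
    """
    if tile_deg <= 0:
        raise ValueError("tile_deg must be positive")
    tiles = []
    lat = min_lat
    while lat < max_lat:
        next_lat = min(lat + tile_deg, max_lat)
        lon = min_lon
        while lon < max_lon:
            next_lon = min(lon + tile_deg, max_lon)
            tiles.append((lon, lat, next_lon, next_lat))
            lon = next_lon
        lat = next_lat
    return tiles


if __name__ == "__main__":
    for tile in tile_bounding_box(0, 0, 2.5, 1, 1):
        print(tile)
```

Keeping each tile's bounds alongside its predictions is what later allows results to be recombined and placed back on a map.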
Capitec Bank is
EarthOptics is a soil-data-measurement and mapping company that leverages proprietary sensor technology and data analytics to precisely measure the health and structure of soil. “We wanted to use ML to help customers increase agricultural yields with cost-effective soil maps,” said
HERE Technologies is a leading location-data and technology platform that helps customers create custom maps and location experiences built on highly precise location data. “Our customers need real-time context as they make business decisions leveraging insights from spatial patterns and trends,” said
Intuit is the global financial technology platform that powers prosperity for more than 100 million customers worldwide with TurboTax, Credit Karma, QuickBooks, and Mailchimp. “We’re unleashing the power of data to transform the world of consumer, self-employed, and small business finances on our platform,” said
About
For over 15 years,
About
View source version on businesswire.com: https://www.businesswire.com/news/home/20221130005905/en/
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr
Source:
FAQ
What new features did Amazon announce for SageMaker on November 30, 2022?
How does the SageMaker Role Manager improve machine learning governance?
What are SageMaker Model Cards and how do they benefit users?
What advantages do SageMaker Studio Notebooks provide?
How does automated model validation work in SageMaker?