AWS, Data Preparation and Information

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. The sessions showcase how Amazon Q can help you streamline coding, testing, and troubleshooting, as well as enable you to make the most of your data to optimize business operations.

AWS

AWS ML ML AI

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

To simplify infrastructure setup and accelerate distributed training, AWS introduced Amazon SageMaker HyperPod in late 2023. In this blog post, we showcase how you can perform efficient supervised fine tuning for a Meta Llama 3 model using PEFT on AWS Trainium with SageMaker HyperPod. architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/

AWS

AWS Clustering Deep Learning Deep Learning

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.

Data Preparation

Data Preparation ML ML Data Quality

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Flipboard

NOVEMBER 20, 2024

By narrowing down the search space to the most relevant documents or chunks, metadata filtering reduces noise and irrelevant information, enabling the LLM to focus on the most relevant content. By combining the capabilities of LLM function calling and Pydantic data models, you can dynamically extract metadata from user queries.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

A secure approach to generative AI with AWS

AWS Machine Learning Blog

APRIL 16, 2024

They’re often used with highly sensitive business data, like personal data, compliance data, operational data, and financial information, to optimize the model’s output. At AWS, our top priority is safeguarding the security and confidentiality of our customers’ workloads.

AWS

AWS AI AI ML

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog

FEBRUARY 27, 2025

Through real-time retrieval of relevant medical information, RAG systems can provide more reliable and contextually appropriate responses, making them particularly valuable for healthcare applications where precision is crucial. The reports were de-identified using a rule-based approach to remove protected health information.

AWS

AWS AI AI ML

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

AWS Machine Learning Blog

AUGUST 20, 2024

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. About the Authors Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS).

Data Preparation

Data Preparation ML ML AWS

Fine-tuning large language models (LLMs) for 2025

Dataconomy

NOVEMBER 11, 2024

This approach is ideal for use cases requiring accuracy and up-to-date information, like providing technical product documentation or customer support. For instance, prompts like “Provide a detailed but informal explanation” can shape the output significantly without requiring the model itself to be fine-tuned.

Data Preparation

Data Preparation Database Data Quality Machine Learning

Optimize data preparation with new features in AWS SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 4, 2023

Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes.

Data Preparation

Data Preparation AWS ML ML

Using responsible AI principles with Amazon Bedrock Batch Inference

AWS Machine Learning Blog

NOVEMBER 21, 2024

These include safeguarding sensitive information, providing accuracy and relevance of AI-generated content, mitigating biases, maintaining transparency, and adhering to data protection regulations. Prerequisites To implement the proposed solution, make sure you have satisfied the following requirements: Have an active AWS account.

AI

AI AI AWS Data Preparation

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

Overview of multimodal embeddings and multimodal RAG architectures Multimodal embeddings are mathematical representations that integrate information not only from text but from multiple data modalities—such as product images, graphs, and charts—into a unified vector space.

AWS

AWS Computer Science Computer Science Database

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Yes, the AWS re:Invent season is upon us and as always, the place to be is Las Vegas! are the sessions dedicated to AWS DeepRacer ! Generative AI is at the heart of the AWS Village this year. You marked your calendars, you booked your hotel, and you even purchased the airfare. And last but not least (and always fun!)

AWS

AWS ML ML AI

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 15, 2024

Traditionally, developers have had two options when working with SageMaker: the AWS SDK for Python , also known as boto3 , or the SageMaker Python SDK. For this walkthrough, we use a straightforward generative AI lifecycle involving data preparation, fine-tuning, and a deployment of Meta’s Llama-3-8B LLM.

Python

Python AWS ML ML

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM) , making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

AWS

AWS ML ML Machine Learning

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

AWS Machine Learning Blog

NOVEMBER 30, 2023

The number of companies launching generative AI applications on AWS is substantial and building quickly, including adidas, Booking.com, Bridgewater Associates, Clariant, Cox Automotive, GoDaddy, and LexisNexis Legal & Professional, to name just a few. Innovative startups like Perplexity AI are going all in on AWS for generative AI.

AWS

AWS AI AI ML

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.

AWS

AWS ML ML Machine Learning

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.

AWS

AWS Machine Learning Machine Learning Analytics

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. For more information, see Zeta Global’s home page. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.

AWS

AWS Machine Learning Machine Learning ML

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Data preparation SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The model was then fine-tuned with training data from the data preparation stage. About the authors Scott Patterson is a Senior Solutions Architect at AWS.

AWS

AWS Data Lakes ML ML

Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

AWS Machine Learning Blog

MAY 31, 2024

In this blog post and open source project , we show you how you can pre-train a genomics language model, HyenaDNA , using your genomic data in the AWS Cloud. Amazon SageMaker Amazon SageMaker is a fully managed ML service offered by AWS, designed to reduce the time and cost associated with training and tuning ML models at scale.

AWS

AWS ML ML Machine Learning

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML ML Machine Learning

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

AWS Machine Learning Blog

JANUARY 6, 2023

The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. AWS met the criteria and was evaluated by IDC along with eight other vendors. AWS is positioned in the Leaders category based on current capabilities.

AWS

AWS ML ML Data Preparation

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This improves time and performance because you don’t need to work with the entirety of the data during preparation.

ML

ML ML Data Preparation AWS

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

The General Data Protection Regulation (GDPR) right to be forgotten, also known as the right to erasure, gives individuals the right to request the deletion of their personally identifiable information (PII) data held by organizations. Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you.

AWS

AWS Machine Learning Machine Learning Database

Prepare image data with Amazon SageMaker Data Wrangler

Flipboard

MAY 1, 2023

According to Gartner , unstructured data now represents 80–90% of all new enterprise data, but just 18% of organizations are taking advantage of this data. Today, we are happy to announce that with Amazon SageMaker Data Wrangler , you can perform image data preparation for machine learning (ML) using little to no code.

Data Preparation

Data Preparation AWS ML ML

Optimizing MLOps for Sustainability

AWS Machine Learning Blog

SEPTEMBER 11, 2024

AWS published Guidance for Optimizing MLOps for Sustainability on AWS to help customers maximize utilization and minimize waste in their ML workloads. The process begins with data preparation, followed by model training and tuning, and then model deployment and management. This leads to substantial resource consumption.

AWS

AWS Data Preparation ML ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. These types of data are historical raw data from an ML perspective.

AWS

AWS ML ML ETL

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

This is where the AWS suite of low-code and no-code ML services becomes an essential tool. As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.

ML

ML ML AWS AI

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. An Amazon DataZone domain and an associated Amazon DataZone project configured in your AWS account. Choose Data Wrangler in the navigation pane.

Machine Learning

Machine Learning Machine Learning Data Governance ML

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

However, when building generative AI applications, you can use an alternative solution that allows for the dynamic incorporation of external knowledge and allows you to control the information used for generation without the need to fine-tune your existing foundational model. license, for use without restrictions.

AWS

AWS Machine Learning Machine Learning AI

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 21, 2024

In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.

AWS

AWS Data Preparation ML ML

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

AWS Machine Learning Blog

APRIL 26, 2024

At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG). What is Retrieval Augmented Generation?

AWS

AWS Database Python AI

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.

AWS

AWS Data Preparation Azure ML

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

Through evaluations of sensors and informed decision-making support, Afri-SET empowers governments and civil society for effective air quality management. The attempt is disadvantaged by the current focus on data cleaning, diverting valuable skills away from building ML models for sensor calibration.

AWS

AWS AI AI Python

Machine Learning with MATLAB and Amazon SageMaker

Flipboard

NOVEMBER 21, 2023

In recent years, MathWorks has brought many product offerings into the cloud, especially on Amazon Web Services (AWS). Here is a quick guide on how to run MATLAB on AWS. Installation of AWS Command-Line Interface (AWS CLI) , AWS Configure , and Python3. Set up AWS Configure to interact with AWS resources.

Machine Learning

Machine Learning Machine Learning AWS Decision Trees

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import , a feature that you can use to import customized models created in other environments—such as Amazon SageMaker , Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.

SQL

SQL AWS ML ML

The Ultimate Guide to Data Preparation for Machine Learning

DagsHub

FEBRUARY 29, 2024

Data, is therefore, essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that the models generate reliable and accurate predictions and drive business value for the organization. Why do you need Data Preparation for Machine Learning?

Data Preparation

Data Preparation Machine Learning Machine Learning Data Governance

Implement real-time personalized recommendations using Amazon Personalize

AWS Machine Learning Blog

NOVEMBER 13, 2023

The post Architecting near real-time personalized recommendations with Amazon Personalize shows how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services. For more information, refer to Architecting near real-time personalized recommendations with Amazon Personalize.

AWS

AWS Data Preparation ML ML

GenASL: Generative AI-powered American Sign Language avatars

AWS Machine Learning Blog

AUGUST 26, 2024

AWS makes it possible for organizations of all sizes and developers of all skill levels to build and scale generative AI applications with security, privacy, and responsible AI. In this post, we dive into the architecture and implementation details of GenASL, which uses AWS generative AI capabilities to create human-like ASL avatar videos.

AWS

AWS AI AI ML

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. This query is used at the data processing step of the training pipeline to fetch data from the PrestoDB instance.

ML

ML ML AWS Machine Learning

Exploring data using AI chat at Domo with Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 9, 2024

The next step is to provide them with a more intuitive and conversational interface to interact with their data, empowering them to generate meaningful visualizations and reports through natural language interactions. Solution overview The following diagram illustrates the solution architecture and data flow.

AI

AI AI AWS ML

Your guide to generative AI and ML at AWS re:Invent 2024

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

Webinars

Trending Sources

Accelerate data preparation for ML in Amazon SageMaker Canvas

Webinars

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

A secure approach to generative AI with AWS

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Fine-tuning large language models (LLMs) for 2025

Optimize data preparation with new features in AWS SageMaker Data Wrangler

Using responsible AI principles with Amazon Bedrock Batch Inference

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

Your guide to generative AI and ML at AWS re:Invent 2023

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

How Marubeni is optimizing market decisions using AWS machine learning and analytics

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

Prepare image data with Amazon SageMaker Data Wrangler

Optimizing MLOps for Sustainability

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

Advanced RAG patterns on Amazon SageMaker

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Improving air quality with generative AI

Machine Learning with MATLAB and Amazon SageMaker

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

The Ultimate Guide to Data Preparation for Machine Learning

Implement real-time personalized recommendations using Amazon Personalize

GenASL: Generative AI-powered American Sign Language avatars

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

Exploring data using AI chat at Domo with Amazon Bedrock

Stay Connected