To simplify infrastructure setup and accelerate distributed training, AWS introduced Amazon SageMaker HyperPod in late 2023. In this blog post, we showcase how you can perform efficient supervised fine-tuning for a Meta Llama 3 model using PEFT on AWS Trainium with SageMaker HyperPod.
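As a rough illustration of the PEFT setup such a job builds on, here is a minimal LoRA configuration sketch using the Hugging Face peft library; the model ID and hyperparameters are illustrative assumptions, not values from the post, and Trainium-specific setup (for example, via optimum-neuron) is omitted.

```python
# A minimal PEFT (LoRA) sketch; model ID and hyperparameters are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Meta-Llama-3-8B"  # assumed model ID (gated on the Hub)
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA trains small low-rank adapter matrices instead of all model weights.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumption)
    lora_alpha=32,                         # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction
```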
Conventional ML development cycles take weeks to many months and require scarce data science and ML development skills. Business analysts’ ideas for using ML models often sit in prolonged backlogs because of data engineering and data science teams’ bandwidth and data preparation activities.
Let’s examine the key components of this architecture in the following figure, following the data flow from left to right. The workflow consists of the following phases: Data preparation – Our evaluation process begins with a prompt dataset containing paired radiology findings and impressions.
Prerequisites – Before proceeding with this tutorial, make sure you have the following in place: AWS account – You should have an AWS account with access to Amazon Bedrock. Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata.
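Once the knowledge base exists, you can verify it returns results with a short boto3 query; this is a hedged sketch, and the knowledge base ID and query text below are placeholders.

```python
# Query an Amazon Bedrock knowledge base; ID and query text are placeholders.
import boto3

client = boto3.client("bedrock-agent-runtime")
response = client.retrieve(
    knowledgeBaseId="KB1234567890",  # hypothetical knowledge base ID
    retrievalQuery={"text": "What are the key findings?"},
)
for result in response["retrievalResults"]:
    print(result["content"]["text"])  # retrieved passage text
```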
Prerequisites – To implement the proposed solution, make sure you have satisfied the following requirements: Have an active AWS account. Have an S3 bucket to store your data prepared for batch inference. The method is designed to be cost-effective and flexible, and to maintain high ethical standards.
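For reference, a batch inference job over data staged in that S3 bucket can be submitted with boto3 along these lines; this is a sketch under assumed names, and the job name, role ARN, model ID, and S3 URIs are all placeholders.

```python
# Submit an Amazon Bedrock batch inference job; all names are placeholders.
import boto3

bedrock = boto3.client("bedrock")
bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job",  # hypothetical job name
    roleArn="arn:aws:iam::111122223333:role/BedrockBatchRole",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}},
)
```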
It offers an unparalleled suite of tools that cater to every stage of the ML lifecycle, from data preparation to model deployment and monitoring. You may be prompted to subscribe to this model through AWS Marketplace. On the AWS Marketplace listing, choose Continue to subscribe. Check out the Cohere on AWS GitHub repo.
We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.
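As a hedged sketch of what that sharing looks like with boto3 and AWS RAM (the model package group ARN and account IDs below are placeholders):

```python
# Share a SageMaker model package group across accounts via AWS RAM.
import boto3

ram = boto3.client("ram")
ram.create_resource_share(
    name="model-registry-share",  # hypothetical share name
    resourceArns=[
        "arn:aws:sagemaker:us-east-1:111122223333:model-package-group/my-models"
    ],
    principals=["444455556666"],  # consumer account ID (placeholder)
)
```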
We made this process much easier through Snorkel Flow’s integration with Amazon SageMaker and other tools and services from Amazon Web Services (AWS). At its core, Snorkel Flow empowers data scientists and domain experts to encode their knowledge into labeling functions, which are then used to generate high-quality training datasets.
Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Solution overview – In this section, we provide an overview of three personas: the data admin, data publisher, and data scientist.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services to build a robust and cost-effective Power Bid Optimization solution.
In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly.
Solution overview – Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. Data preparation – SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
The solution: IBM databases on AWS – To solve these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics, and AI across the hybrid cloud landscape. Let’s delve into the database portfolio from IBM available on AWS.
The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. The tools are typically used by data scientists and ML developers from experimentation to production deployment of AI and ML solutions. AWS position.
In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account.
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its global casino customer base once LnW Connect reaches its full potential.
AWS published Guidance for Optimizing MLOps for Sustainability on AWS to help customers maximize utilization and minimize waste in their ML workloads. The process begins with data preparation, followed by model training and tuning, and then model deployment and management. This leads to substantial resource consumption.
Snowflake is a cloud data platform that provides data solutions from data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.
In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the PubMed dataset and performs the best out of those tried. We implemented the solution using the AWS Cloud Development Kit (AWS CDK).
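A minimal sketch of loading BioBERT for fine-tuning from the Hugging Face Hub; dmis-lab/biobert-v1.1 is the commonly used checkpoint, and the label count is an assumption for illustration, not a detail from the post.

```python
# Load BioBERT for a classification fine-tuning task; label count is assumed.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForSequenceClassification.from_pretrained(
    "dmis-lab/biobert-v1.1",
    num_labels=2,  # assumed binary classification task
)
```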
From data collection and cleaning to feature engineering, model building, tuning, and deployment, ML projects often take months for developers to complete. And experienced data scientists can be hard to come by. This is where the AWS suite of low-code and no-code ML services becomes an essential tool.
With the introduction of EMR Serverless support for Apache Livy endpoints , SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.
In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.
We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.
Launched in 2021, Amazon SageMaker Canvas is a visual point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without writing any code. This is crucial for compliance, security, and governance.
We go through several steps, including data preparation, model creation, model performance metric analysis, and optimizing inference based on our analysis. We use an Amazon SageMaker notebook and the AWS Management Console to complete some of these steps. We will be using the Data-Preparation notebook.
To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import , a feature that you can use to import customized models created in other environments—such as Amazon SageMaker , Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.
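An import job for such a customized model can be submitted with boto3 roughly as follows; this is a sketch, and the job name, model name, role ARN, and S3 URI are placeholders, not values from the post.

```python
# Submit an Amazon Bedrock Custom Model Import job; all names are placeholders.
import boto3

bedrock = boto3.client("bedrock")
bedrock.create_model_import_job(
    jobName="import-my-fine-tuned-model",     # hypothetical job name
    importedModelName="my-fine-tuned-llama",  # hypothetical model name
    roleArn="arn:aws:iam::111122223333:role/BedrockImportRole",
    modelDataSource={
        "s3DataSource": {"s3Uri": "s3://my-bucket/model-artifacts/"}
    },
)
```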
Data is, therefore, essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that the models generate reliable and accurate predictions and drive business value for the organization. Why do you need data preparation for machine learning?
This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.
However, higher education institutions often lack ML professionals and data scientists. Amazon SageMaker Canvas is a low-code/no-code ML service that enables business analysts to perform data preparation and transformation, build ML models, and deploy these models into a governed workflow. Set up SageMaker Canvas.
Introducing Einstein Studio on Data Cloud – Data Cloud is a data platform that provides businesses with real-time updates of their customer data from any touch point. With Einstein Studio, a gateway to AI tools on the data platform, admins and data scientists can effortlessly create models with a few clicks or using code.
Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance. We create an automated model build pipeline that includes steps for data preparation, model training, model evaluation, and registration of the trained model in the SageMaker Model Registry.
In the first part of this multi-part blog post, you will learn how to create a scalable training pipeline and prepare training data for Comprehend Custom Classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks.
One of several challenges faced was adapting the existing on-premises pipeline solution for use on AWS. The solution involved two key components: Modifying and extending existing code – The first part of our solution involved the modification and extension of our existing code to make it compatible with AWS infrastructure.
With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. Store your Snowflake account credentials in AWS Secrets Manager. Ingest the data in a table in your Snowflake account.
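Storing those credentials in Secrets Manager can be done with a few lines of boto3; this is a minimal sketch, and the secret name and field names are illustrative.

```python
# Store Snowflake credentials as a JSON secret; names are illustrative.
import json
import boto3

secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="snowflake/credentials",  # hypothetical secret name
    SecretString=json.dumps({
        "account": "your-account-identifier",
        "user": "your-user",
        "password": "your-password",
    }),
)
```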
This post is co-written with Swagata Ashwani, Senior Data Scientist at Boomi. Boomi funded this solution using the AWS PE ML FastStart program, a customer enablement program meant to take ML-enabled solutions from idea to production in a matter of weeks. These tools integrate via API into Boomi’s core service offering.
It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit, GitHub, and GitLab.
Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers.
The first is by using low-code or no-code ML services such as Amazon SageMaker Canvas , Amazon SageMaker Data Wrangler , Amazon SageMaker Autopilot , and Amazon SageMaker JumpStart to help data analysts prepare data, build models, and generate predictions. We recognize that customers have different starting points.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources.
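Launching such a CloudFormation stack programmatically looks roughly like this; the stack name and template URL below are placeholders, not the post’s actual template.

```python
# Create a CloudFormation stack from a template; names/URLs are placeholders.
import boto3

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="redshift-ml-stack",  # hypothetical stack name
    TemplateURL="https://example-bucket.s3.amazonaws.com/template.yaml",
    Capabilities=["CAPABILITY_NAMED_IAM"],  # needed if the template creates IAM roles
)
```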
We finish with a case study highlighting the benefits realized by a large AWS and PwC customer who implemented this solution. Solution overview – AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. The following diagram illustrates the workflow.
Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML. Aggregating and preparing large amounts of data is a critical part of the ML workflow. Data scientists and data engineers use Apache Spark, Apache Hive, and Presto running on Amazon EMR for large-scale data processing.