AWS, Data Preparation and Data Science

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. The sessions showcase how Amazon Q can help you streamline coding, testing, and troubleshooting, as well as enable you to make the most of your data to optimize business operations.

AWS

AWS ML ML AI

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.

Data Preparation

Data Preparation ML ML Data Quality

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and requires sparse data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of data engineering and data science team’s bandwidth and data preparation activities.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Webinars

How to Achieve High-Accuracy Results When Using LLMs

Maximizing Profit and Productivity: The New Era of AI-Powered Accounting

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

AWS Machine Learning Blog

AUGUST 20, 2024

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. About the Authors Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS).

Data Preparation

Data Preparation ML ML AWS

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM) , making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

AWS

AWS ML ML Machine Learning

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Manager Data Science at Marubeni Power International. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Therefore, the ingestion components need to be able to manage authentication, data sourcing in pull mode, data preprocessing, and data storage.

AWS

AWS Machine Learning Machine Learning Analytics

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.

AWS

AWS ML ML Machine Learning

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Data preparation SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The model was then fine-tuned with training data from the data preparation stage. About the authors Scott Patterson is a Senior Solutions Architect at AWS.

AWS

AWS Data Lakes ML ML

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

Roadmap to Learn Data Science for Beginners and Freshers in 2023

Becoming Human

MAY 15, 2023

Data Science is a popular as well as vast field; till date, there are a lot of opportunities in this field, and most people, whether they are working professionals or students, everyone want a transition in data science because of its scope. How much to learn? What to do next?

Data Science

Data Science Machine Learning Machine Learning Database

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML ML Machine Learning

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms missed an easy migration of their AI/ML prototypes to the industrialization of the solution running on AWS.

ML

ML ML AWS AI

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account.

AWS

AWS ML ML ETL

5 Top Large Language Models & Generative AI Books

Towards AI

AUGUST 6, 2024

Build a Large Language Model (From Scratch) by Sebastian Raschka provides a comprehensive guide to constructing LLMs, from data preparation to fine-tuning. Generative AI on AWS by Chris Fregly and team demystifies generative AI integration into business, emphasizing model selection and deployment on AWS.

Natural Language Processing

Natural Language Processing AI AI AWS

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for a continued ML adoption. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.

ML

ML ML AWS Machine Learning

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.

AWS

AWS Data Preparation Azure Data Scientist

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

AWS Machine Learning Blog

JULY 11, 2024

It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit , GitHub, and GitLab.

AWS

AWS ML ML Python

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints , SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.

AWS

AWS Clustering Big Data Big Data

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.

ML

ML ML AWS AI

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers.

ML

ML ML AWS Machine Learning

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 6, 2024

In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the Pubmed dataset and performs the best out of those tried. We implemented the solution using the AWS Cloud Development Kit (AWS CDK).

AWS

AWS ML ML Data Preparation

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation

AWS Machine Learning Blog

FEBRUARY 22, 2023

Boomi funded this solution using the AWS PE ML FastStart program, a customer enablement program meant to take ML-enabled solutions from idea to production in a matter of weeks. The data science team at Boomi applied the Markov Chain approach to the Step Suggest problem by treating integration steps as states in a state machine.

AWS

AWS ML ML Data Science

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

The built-in project templates provided by Amazon SageMaker include integration with some of third-party tools, such as Jenkins for orchestration and GitHub for source control, and several utilize AWS native CI/CD tools such as AWS CodeCommit , AWS CodePipeline , and AWS CodeBuild. all implemented via CloudFormation.

AWS

AWS ML ML Data Preparation

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import , a feature that you can use to import customized models created in other environments—such as Amazon SageMaker , Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.

SQL

SQL AWS ML ML

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

For more information on Mixtral-8x7B Instruct on AWS, refer to Mixtral-8x7B is now available in Amazon SageMaker JumpStart. Before you get started with the solution, create an AWS account. This identity is called the AWS account root user. For more detailed steps to prepare the data, refer to the GitHub repo.

AWS

AWS Machine Learning Machine Learning AI

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

We create a custom training container that downloads data directly from the Snowflake table into the training instance rather than first downloading the data into an S3 bucket. Store your Snowflake account credentials in AWS Secrets Manager. Ingest the data in a table in your Snowflake account.

ML

ML ML AWS Python

Build well-architected IDP solutions with a custom lens – Part 2: Security

AWS Machine Learning Blog

NOVEMBER 22, 2023

Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS.

AWS

AWS ML ML Machine Learning

Harnessing Machine Learning on Big Data with PySpark on AWS

ODSC - Open Data Science

AUGUST 9, 2023

Be sure to check out his talk, “ Build Classification and Regression Models with Spark on AWS ,” there! In the unceasingly dynamic arena of data science, discerning and applying the right instruments can significantly shape the outcomes of your machine learning initiatives. A cordial greeting to all data science enthusiasts!

Machine Learning

Machine Learning Machine Learning AWS Big Data

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources.

ML

ML ML AWS Data Warehouse

Experience the new and improved Amazon SageMaker Studio

AWS Machine Learning Blog

DECEMBER 1, 2023

Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. In his spare time, he loves traveling and writing.

ML

ML ML Machine Learning Machine Learning

Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 22, 2023

We’re excited to announce Amazon SageMaker Data Wrangler support for Amazon S3 Access Points. Solution Overview Imagine you, as an administrator, have to manage data for multiple data science teams running their own data preparation workflows in SageMaker Data Wrangler. Create an S3 access point.

AWS

AWS Data Science Data Preparation Artificial Intelligence

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

For example, you might have acquired a company that was already running on a different cloud provider, or you may have a workload that generates value from unique capabilities provided by AWS. We show how you can build and train an ML model in AWS and deploy the model in another platform.

ML

ML ML Azure AWS

What is MLOps

Towards AI

AUGUST 16, 2023

Therefore, a common mistake when interviewing applicants is to focus on the minutia of a particular platform (AWS, GCP, Databricks, MLflow, etc.). Every data science team develops its own approach for each ML library that is used, so the link between the model and the code and parameters is often lost. References [1] J.

Machine Learning

Machine Learning Machine Learning ML ML

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

IAM role – SageMaker requires an AWS Identity and Access Management (IAM) role to be assigned to a SageMaker Studio domain or user profile to manage permissions effectively. An execution role update may be required to bring in data browsing and the SQL run feature. You need to create AWS Glue connections with specific connection types.

SQL

SQL AWS Database Data Scientist

Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

Flipboard

MARCH 7, 2023

Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. You can manage app images via the SageMaker console, the AWS SDK for Python (Boto3), and the AWS Command Line Interface (AWS CLI). Environments without internet access.

Python

Python AWS ML ML

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

AWS Machine Learning Blog

MAY 23, 2023

We finish with a case study highlighting the benefits realize by a large AWS and PwC customer who implemented this solution. Solution overview AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. The following diagram illustrates the workflow.

Machine Learning

Machine Learning Machine Learning AWS ML

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 2: SageMaker notebooks and Studio

AWS Machine Learning Blog

MAY 30, 2023

In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support offering. In Part 1 , we showed how to get started using AWS Cost Explorer to identify cost optimization opportunities in SageMaker. You can build custom queries to look up AWS CUR data using standard SQL.

AWS

AWS ML ML EDA

Is your model good? A deep dive into Amazon SageMaker Canvas advanced metrics

AWS Machine Learning Blog

JULY 31, 2023

We explain the metrics and show techniques to deal with data to obtain better model performance. Prerequisites If you would like to implement all or some of the tasks described in this post, you need an AWS account with access to SageMaker Canvas. Let’s try to improve the model performance using a data-centric approach.

ML

ML ML Data Preparation Machine Learning

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

The following steps give an overview of how to use the new capabilities launched in SageMaker for Salesforce to enable the overall integration: Set up the Amazon SageMaker Studio domain and OAuth between Salesforce and the AWS account s. Select Other type of secret. Save the secret and note the ARN of the secret.

ML

ML ML AWS AI

Fine-tune large multimodal models using Amazon SageMaker

AWS Machine Learning Blog

MAY 29, 2024

Figure 1: LLaVA architecture Prepare data When it comes to fine-tuning the LLaVA model for specific tasks or domains, data preparation is of paramount importance because having high-quality, comprehensive annotations enables the model to learn rich representations and achieve human-level performance on complex visual reasoning challenges.

ML

ML ML AWS Data Visualization

Fine-tune Whisper models on Amazon SageMaker with LoRA

AWS Machine Learning Blog

NOVEMBER 16, 2023

Additional model training benefits can include lower training costs with Managed Spot Training, distributed training libraries to split models and training datasets across AWS GPU instances, and more. input_ids return batch #apply the data preparation function to all of our fine-tuning dataset samples using dataset's.map method.

AWS

AWS ML ML Computer Science

Build an email spam detector using Amazon SageMaker

AWS Machine Learning Blog

JULY 18, 2023

Load the data in an Amazon SageMaker Studio notebook. Prepare the data for the model. Prerequisites Before diving into this use case, complete the following prerequisites: Set up an AWS account. If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) Train, deploy, and test the model.

Supervised Learning

Supervised Learning Algorithm Natural Language Processing AWS

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and prepare the necessary historical data for the ML use cases.

AI

AI AI ML ML

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Here, we’ll discuss the key differences between AIOps and MLOps and how they each help teams and businesses address different IT and data science challenges. MLOps prioritizes end-to-end management of machine learning models, encompassing data preparation, model training, hyperparameter tuning and validation.

Big Data

Big Data Big Data ML ML

Your guide to generative AI and ML at AWS re:Invent 2024

Accelerate data preparation for ML in Amazon SageMaker Canvas

Webinars

Trending Sources

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Webinars

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

How Marubeni is optimizing market decisions using AWS machine learning and analytics

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Roadmap to Learn Data Science for Beginners and Freshers in 2023

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

5 Top Large Language Models & Generative AI Books

Deliver your first ML use case in 8–12 weeks

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

Deploy large language models for a healthtech use case on Amazon SageMaker

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

Advanced RAG patterns on Amazon SageMaker

Use Snowflake as a data source to train ML models with Amazon SageMaker

Build well-architected IDP solutions with a custom lens – Part 2: Security

Harnessing Machine Learning on Big Data with PySpark on AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Experience the new and improved Amazon SageMaker Studio

Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler

Train and deploy ML models in a multicloud environment using Amazon SageMaker

What is MLOps

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 2: SageMaker notebooks and Studio

Is your model good? A deep dive into Amazon SageMaker Canvas advanced metrics

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

Fine-tune large multimodal models using Amazon SageMaker

Fine-tune Whisper models on Amazon SageMaker with LoRA

Build an email spam detector using Amazon SageMaker

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

Stay Connected