AWS, Data Preparation and Data Science

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. The sessions showcase how Amazon Q can help you streamline coding, testing, and troubleshooting, as well as enable you to make the most of your data to optimize business operations.

AWS

AWS ML ML AI

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

Data science is now one of the most sought-after and lucrative career options in the data field, because businesses increasingly depend on data for decision-making, which has pushed demand for data science hires to a peak.

Data Science

Data Science Data Analyst Data Scientist Machine Learning

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.

Data Preparation

Data Preparation ML ML Data Quality

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

AWS Machine Learning Blog

AUGUST 20, 2024

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. About the Authors: Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS).

Data Preparation

Data Preparation ML ML AWS

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of the limited bandwidth of data engineering and data science teams and the data preparation work involved.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

AWS

AWS ML ML Machine Learning

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines.

AWS

AWS Machine Learning Machine Learning ML

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Manager Data Science at Marubeni Power International. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Therefore, the ingestion components need to be able to manage authentication, data sourcing in pull mode, data preprocessing, and data storage.

AWS

AWS Machine Learning Machine Learning Analytics

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels, AWS Step Functions, and other AWS services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.

AWS

AWS ML ML Machine Learning

Enable single sign-on access of Amazon SageMaker Canvas using AWS IAM Identity Center: Part 2

AWS Machine Learning Blog

APRIL 2, 2024

It does so by covering the end-to-end ML workflow: whether you’re looking for powerful data preparation and AutoML, managed endpoint deployment, simplified MLOps capabilities, or the ability to configure foundation models for generative AI, SageMaker Canvas can help you achieve your goals. Choose Enable with AWS Organizations.

AWS

AWS ML ML Machine Learning

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The model was then fine-tuned with training data from the data preparation stage. About the authors: Scott Patterson is a Senior Solutions Architect at AWS.

AWS

AWS Data Lakes ML ML

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML ML Machine Learning

Roadmap to Learn Data Science for Beginners and Freshers in 2023

Becoming Human

MAY 15, 2023

Data science is a popular and vast field with many opportunities, and most people, whether working professionals or students, want to transition into it because of its scope. How much to learn? What to do next?

Data Science

Data Science Machine Learning Machine Learning Database

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account.

AWS

AWS ML ML ETL

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms missed an easy migration of their AI/ML prototypes to the industrialization of the solution running on AWS.

ML

ML ML AWS AI

5 Top Large Language Models & Generative AI Books

Towards AI

AUGUST 6, 2024

Build a Large Language Model (From Scratch) by Sebastian Raschka provides a comprehensive guide to constructing LLMs, from data preparation to fine-tuning. Generative AI on AWS by Chris Fregly and team demystifies generative AI integration into business, emphasizing model selection and deployment on AWS.

Natural Language Processing

Natural Language Processing AI AI AWS

Accelerate client success management through email classification with Hugging Face on Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 12, 2023

Scalable Capital’s data science and client service teams identified that one of the largest bottlenecks in servicing our clients was responding to email inquiries. The following diagram shows the workflow for our email classifier project, but can also be generalized to other data science projects.

Data Science

Data Science Data Scientist AWS ML

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for continued ML adoption. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.

ML

ML ML AWS Machine Learning

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation

AWS Machine Learning Blog

FEBRUARY 22, 2023

Boomi funded this solution using the AWS PE ML FastStart program, a customer enablement program meant to take ML-enabled solutions from idea to production in a matter of weeks. The data science team at Boomi applied the Markov Chain approach to the Step Suggest problem by treating integration steps as states in a state machine.

AWS

AWS ML ML Data Science

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.

ML

ML ML AWS AI

Predicting the Future of Data Science

Pickl AI

DECEMBER 4, 2024

Summary: The future of Data Science is shaped by emerging trends such as advanced AI and Machine Learning, augmented analytics, and automated processes. As industries increasingly rely on data-driven insights, ethical considerations regarding data privacy and bias mitigation will become paramount.

Data Science

Data Science Data Scientist Machine Learning Machine Learning

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 6, 2024

In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the PubMed dataset and performs the best out of those tried. We implemented the solution using the AWS Cloud Development Kit (AWS CDK).

AWS

AWS ML ML Data Preparation

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints, SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.

AWS

AWS Clustering Big Data Big Data

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

AWS Machine Learning Blog

JULY 11, 2024

It supports all stages of ML development, from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit, GitHub, and GitLab.

AWS

AWS ML ML Python

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.

AWS

AWS Data Preparation Azure Data Scientist

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers.

ML

ML ML AWS Machine Learning

Build well-architected IDP solutions with a custom lens – Part 2: Security

AWS Machine Learning Blog

NOVEMBER 22, 2023

Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS.

AWS

AWS ML ML Machine Learning

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import, a feature that you can use to import customized models created in other environments—such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.

SQL

SQL AWS ML ML

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

The built-in project templates provided by Amazon SageMaker include integration with some third-party tools, such as Jenkins for orchestration and GitHub for source control, and several utilize AWS native CI/CD tools such as AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild. all implemented via CloudFormation.

AWS

AWS ML ML Data Preparation

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

We create a custom training container that downloads data directly from the Snowflake table into the training instance rather than first downloading the data into an S3 bucket. Store your Snowflake account credentials in AWS Secrets Manager. Ingest the data in a table in your Snowflake account.

ML

ML ML AWS Python

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

For more information on Mixtral-8x7B Instruct on AWS, refer to Mixtral-8x7B is now available in Amazon SageMaker JumpStart. Before you get started with the solution, create an AWS account. This identity is called the AWS account root user. For more detailed steps to prepare the data, refer to the GitHub repo.

AWS

AWS Machine Learning Machine Learning AI

Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 22, 2023

We’re excited to announce Amazon SageMaker Data Wrangler support for Amazon S3 Access Points. Solution overview: Imagine you, as an administrator, have to manage data for multiple data science teams running their own data preparation workflows in SageMaker Data Wrangler. Create an S3 access point.

AWS

AWS Data Science Data Preparation Artificial Intelligence

Exploring data using AI chat at Domo with Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 9, 2024

However, companies can face challenges when using generative AI for data insights, including maintaining data quality, addressing privacy concerns, managing model biases, and integrating AI systems with existing workflows. Domo is a cloud-centered data experiences innovator that empowers users to make data-driven decisions.

AI

AI AI AWS ML

Experience the new and improved Amazon SageMaker Studio

AWS Machine Learning Blog

DECEMBER 1, 2023

Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building, and experimentation to training, hosting, and monitoring. Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. In his spare time, he loves traveling and writing.

ML

ML ML Machine Learning Machine Learning

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

For example, you might have acquired a company that was already running on a different cloud provider, or you may have a workload that generates value from unique capabilities provided by AWS. We show how you can build and train an ML model in AWS and deploy the model in another platform.

ML

ML ML Azure AWS

Harnessing Machine Learning on Big Data with PySpark on AWS

ODSC - Open Data Science

AUGUST 9, 2023

Be sure to check out his talk, “Build Classification and Regression Models with Spark on AWS,” there! In the unceasingly dynamic arena of data science, discerning and applying the right instruments can significantly shape the outcomes of your machine learning initiatives. A cordial greeting to all data science enthusiasts!

Machine Learning

Machine Learning Machine Learning AWS Big Data

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources.

ML

ML ML AWS Data Warehouse

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

AWS Machine Learning Blog

MAY 23, 2023

We finish with a case study highlighting the benefits realized by a large AWS and PwC customer who implemented this solution. Solution overview: AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. The following diagram illustrates the workflow.

Machine Learning

Machine Learning Machine Learning AWS ML

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

IAM role – SageMaker requires an AWS Identity and Access Management (IAM) role to be assigned to a SageMaker Studio domain or user profile to manage permissions effectively. An execution role update may be required to bring in data browsing and the SQL run feature. You need to create AWS Glue connections with specific connection types.

SQL

SQL AWS Database Data Scientist

What is MLOps

Towards AI

AUGUST 16, 2023

Therefore, a common mistake when interviewing applicants is to focus on the minutiae of a particular platform (AWS, GCP, Databricks, MLflow, etc.). Every data science team develops its own approach for each ML library that is used, so the link between the model and the code and parameters is often lost. References [1] J.

Machine Learning

Machine Learning Machine Learning ML ML

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 2: SageMaker notebooks and Studio

AWS Machine Learning Blog

MAY 30, 2023

In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support offering. In Part 1, we showed how to get started using AWS Cost Explorer to identify cost optimization opportunities in SageMaker. You can build custom queries to look up AWS CUR data using standard SQL.

AWS

AWS ML ML EDA

Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

Flipboard

MARCH 7, 2023

Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. You can manage app images via the SageMaker console, the AWS SDK for Python (Boto3), and the AWS Command Line Interface (AWS CLI). Environments without internet access.

Python

Python AWS ML ML

Achieve effective business outcomes with no-code machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 29, 2023

With Canvas, you can take ML mainstream throughout your organization so business analysts without data science or ML experience can use accurate ML predictions to make data-driven decisions. This means empowering business analysts to use ML on their own, without depending on data science teams. Happy innovating!

Machine Learning

Machine Learning Machine Learning ML ML
