Blog, Data Preparation and ML - Data Science Current

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler.

Data Preparation

Data Preparation ML ML Data Quality

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

This year, generative AI and machine learning (ML) will again be in focus, with exciting keynote announcements and a variety of sessions showcasing insights from AWS experts, customer stories, and hands-on experiences with AWS services. Visit the session catalog to learn about all our generative AI and ML sessions.

AWS

AWS ML ML AI

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

AWS Machine Learning Blog

AUGUST 20, 2024

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Charles holds an MS in Supply Chain Management and a PhD in Data Science.

Data Preparation

Data Preparation ML ML AWS

Webinars

How to Achieve High-Accuracy Results When Using LLMs

Maximizing Profit and Productivity: The New Era of AI-Powered Accounting

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

MAS AI/ML Modernization Accelerator: Air Compressor Use Case

IBM Data Science in Practice

JANUARY 9, 2024

Many businesses are in different stages of their MAS AI/ML modernization journey. In this blog, we delve into 4 different “on-ramps” we created in a MAS Accelerator to offer a straightforward path to harnessing the power of AI in MAS, wherever you may be on your MAS AI/ML modernization journey.

ML

ML ML AI AI

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. Let’s learn about the services we will use to make this happen.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

AWS Machine Learning Blog

FEBRUARY 1, 2024

It offers industry-leading scalability, data availability, security, and performance. SageMaker Canvas now supports comprehensive data preparation capabilities powered by SageMaker Data Wrangler. We also demonstrate using the chat for data prep feature in SageMaker Canvas to analyze the data and visualize your findings.

Data Preparation

Data Preparation AWS AI AI

Optimize data preparation with new features in AWS SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 4, 2023

Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes.

Data Preparation

Data Preparation AWS ML ML

How Booking.com modernized its ML experimentation framework with Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 12, 2024

Sharing in-house resources with other internal teams, the Ranking team machine learning (ML) scientists often encountered long wait times to access resources for model training and experimentation – challenging their ability to rapidly experiment and innovate. If it shows online improvement, it can be deployed to all the users.

ML

ML ML AWS Machine Learning

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. In addition to several exciting announcements during keynotes, most of the sessions in our track will feature generative AI in one form or another, so we can truly call our track “Generative AI and ML.”

AWS

AWS ML ML AI

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM) , making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

AWS

AWS ML ML Machine Learning

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 15, 2024

We’re excited to announce the release of SageMaker Core , a new Python SDK from Amazon SageMaker designed to offer an object-oriented approach for managing the machine learning (ML) lifecycle. With SageMaker Core, managing ML workloads on SageMaker becomes simpler and more efficient. and above.

Python

Python AWS ML ML

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. Challenges Customers may face several challenges when implementing machine learning (ML) solutions.

ML

ML ML AWS Machine Learning

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

In these scenarios, as you start to embrace generative AI, large language models (LLMs) and machine learning (ML) technologies as a core part of your business, you may be looking for options to take advantage of AWS AI and ML capabilities outside of AWS in a multicloud environment.

ML

ML ML Azure AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or multiple feature groups.

ML

ML ML AWS Data Warehouse

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. Photo by Myriam Jessier on Unsplash To set the stage, let’s examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).

ML

ML ML Data Preparation Data Engineering

Improve prediction quality in custom classification models with Amazon Comprehend

AWS Machine Learning Blog

OCTOBER 5, 2023

Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption across enterprise and government organizations. Processing unstructured data has become easier with the advancements in natural language processing (NLP) and user-friendly AI/ML services like Amazon Textract , Amazon Transcribe , and Amazon Comprehend.

Data Preparation

Data Preparation ML ML AWS

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. This is where the AWS suite of low-code and no-code ML services becomes an essential tool.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

The Ultimate Guide to Data Preparation for Machine Learning

DagsHub

FEBRUARY 29, 2024

Data, is therefore, essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that the models generate reliable and accurate predictions and drive business value for the organization. Why do you need Data Preparation for Machine Learning?

Data Preparation

Data Preparation Machine Learning Machine Learning Data Governance

Using responsible AI principles with Amazon Bedrock Batch Inference

AWS Machine Learning Blog

NOVEMBER 21, 2024

Have an S3 bucket to store your data prepared for batch inference. Have an AWS Identity and Access Management (IAM) role for batch inference with a trust policy and Amazon S3 access (read access to the folder containing input data and write access to the folder storing output data).

AI

AI AI AWS Data Preparation

Scale training and inference of thousands of ML models with Amazon SageMaker

AWS Machine Learning Blog

AUGUST 3, 2023

As machine learning (ML) becomes increasingly prevalent in a wide range of industries, organizations are finding the need to train and serve large numbers of ML models to meet the diverse needs of their customers. Here, the checkpoints need to be saved in a pre-specified location, with the default being /opt/ml/checkpoints.

ML

ML ML AWS Python

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.

Machine Learning

Machine Learning Machine Learning Data Governance ML

LLMOps demystified: Why it’s crucial and best practices for 2023

Data Science Dojo

AUGUST 28, 2023

Some projects may necessitate a comprehensive LLMOps approach, spanning tasks from data preparation to pipeline production. Exploratory Data Analysis (EDA) Data collection: The first step in LLMOps is to collect the data that will be used to train the LLM.

Exploratory Data Analysis

Exploratory Data Analysis Data Preparation Machine Learning Machine Learning

ML Model Packaging [The Ultimate Guide]

The MLOps Blog

APRIL 5, 2023

In this comprehensive guide, we’ll explore the key concepts, challenges, and best practices for ML model packaging, including the different types of packaging formats, techniques, and frameworks. Best practices for ml model packaging Here is how you can package a model efficiently.

ML

ML ML Machine Learning Machine Learning

Tune ML models for additional objectives like fairness with SageMaker Automatic Model Tuning

AWS Machine Learning Blog

FEBRUARY 27, 2023

Model tuning is the experimental process of finding the optimal parameters and configurations for a machine learning (ML) model that result in the best possible desired outcome with a validation dataset. Single objective optimization with a performance metric is the most common approach for tuning ML models.

ML

ML ML AWS Machine Learning

Optimizing MLOps for Sustainability

AWS Machine Learning Blog

SEPTEMBER 11, 2024

Machine learning operations (MLOps) are a set of practices that automate and simplify machine learning (ML) workflows and deployments. AWS published Guidance for Optimizing MLOps for Sustainability on AWS to help customers maximize utilization and minimize waste in their ML workloads.

AWS

AWS Data Preparation ML ML

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

jpg", "prompt": "Which part of Virginia is this letter sent from", "completion": "Richmond"} SageMaker JumpStart SageMaker JumpStart is a powerful feature within the SageMaker machine learning (ML) environment that provides ML practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs).

ML

ML ML Python AWS

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. With the SQL editor, you can query data lakes, databases, data warehouses, and federated data sources.

SQL

SQL AWS Data Lakes AI

6 AI tools revolutionizing data analysis: Unleashing the best in business

Data Science Dojo

JULY 17, 2023

It is a powerful tool that can be used to automate many of the tasks involved in data analysis, and it can also help businesses to discover new insights from their data. It has a wide range of machine 6: Tableau Tableau is a data visualization software platform that can be used to create interactive dashboards and reports.

Data Analysis

Data Analysis Data Analysis Tableau Machine Learning

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.

ML

ML ML AWS AI

GraphReduce: Using Graphs for Feature Engineering Abstractions

ODSC - Open Data Science

SEPTEMBER 25, 2023

For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information. In this blog, we propose GraphReduce as an abstraction for these problems. Data preparation happens at the entity-level first so errors and anomalies don’t make their way into the aggregated dataset.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and ultimately, the bottom line.

Big Data

Big Data Big Data ML ML

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.

Machine Learning

Machine Learning Machine Learning ML ML

Experience the new and improved Amazon SageMaker Studio

AWS Machine Learning Blog

DECEMBER 1, 2023

Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. About the Authors Mair Hasco is an AI/ML Specialist for Amazon SageMaker Studio. Get started on SageMaker Studio here.

ML

ML ML Machine Learning Machine Learning

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. ML models don’t operate in isolation. This necessitates considering the entire ML lifecycle during design and development.

ML

ML ML AWS Machine Learning

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

AWS Machine Learning Blog

MARCH 10, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML. LDAP No Auth Choose Next.

Clustering

Clustering AWS ML ML

Building ML Platform in Retail and eCommerce

The MLOps Blog

MAY 31, 2023

And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many companies decide to centralize this effort in an internal ML platform. But how to build it?

ML

ML ML Algorithm Machine Learning

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

Iguazio

JULY 31, 2023

MLOps accelerates the ML model deployment process to make it more efficient and scalable. In this blog post, we detail the steps you need to take to build and run a successful MLOps pipeline. An extension of DevOps, MLOps streamlines and monitors ML workflows. Looking to improve your MLOps knowledge and processes?

ML

ML ML Machine Learning Machine Learning

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Solution overview Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. Data preparation SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.

AWS

AWS Data Lakes ML ML

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 6, 2024

To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate the adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.

AWS

AWS ML ML Data Preparation

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.

AWS

AWS Machine Learning Machine Learning Analytics

Principles of MLOps

Heartbeat

FEBRUARY 1, 2023

Machine learning has become an essential part of our lives because we interact with various applications of ML models, whether consciously or unconsciously. Machine Learning Operations (MLOps) are the aspects of ML that deal with the creation and advancement of these models. They might also help with data preparation and cleaning.

Machine Learning

Machine Learning Machine Learning Data Scientist ML

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

Accelerate data preparation for ML in Amazon SageMaker Canvas

Your guide to generative AI and ML at AWS re:Invent 2024

Webinars

Trending Sources

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Webinars

MAS AI/ML Modernization Accelerator: Air Compressor Use Case

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

Optimize data preparation with new features in AWS SageMaker Data Wrangler

How Booking.com modernized its ML experimentation framework with Amazon SageMaker

Your guide to generative AI and ML at AWS re:Invent 2023

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

Deliver your first ML use case in 8–12 weeks

Use Snowflake as a data source to train ML models with Amazon SageMaker

Train and deploy ML models in a multicloud environment using Amazon SageMaker

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Data4ML Preparation Guidelines (Beyond The Basics)

Improve prediction quality in custom classification models with Amazon Comprehend

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

The Ultimate Guide to Data Preparation for Machine Learning

Using responsible AI principles with Amazon Bedrock Batch Inference

Scale training and inference of thousands of ML models with Amazon SageMaker

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

LLMOps demystified: Why it’s crucial and best practices for 2023

ML Model Packaging [The Ultimate Guide]

Tune ML models for additional objectives like fairness with SageMaker Automatic Model Tuning

Optimizing MLOps for Sustainability

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

6 AI tools revolutionizing data analysis: Unleashing the best in business

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

GraphReduce: Using Graphs for Feature Engineering Abstractions

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

Experience the new and improved Amazon SageMaker Studio

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

Building ML Platform in Retail and eCommerce

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Deploy large language models for a healthtech use case on Amazon SageMaker

How Marubeni is optimizing market decisions using AWS machine learning and analytics

Principles of MLOps

MLOps Landscape in 2023: Top Tools and Platforms

Stay Connected