Data Engineering, Data Preparation and ML

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler.

Data Preparation

Data Preparation ML ML Data Quality

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. Let’s learn about the services we will use to make this happen.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.

ML

ML ML Data Preparation AWS

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work. Data scientists and ML engineers require capable tooling and sufficient compute for their work.

ML

ML ML AWS AI

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. Challenges Customers may face several challenges when implementing machine learning (ML) solutions.

ML

ML ML AWS Machine Learning

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

In these scenarios, as you start to embrace generative AI, large language models (LLMs) and machine learning (ML) technologies as a core part of your business, you may be looking for options to take advantage of AWS AI and ML capabilities outside of AWS in a multicloud environment.

ML

ML ML Azure AWS

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. This is where the AWS suite of low-code and no-code ML services becomes an essential tool.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. Photo by Myriam Jessier on Unsplash To set the stage, let’s examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).

ML

ML ML Data Preparation Data Engineering

How Creating Training-ready Datasets Faster Can Unleash ML Teams’ Productivity

DagsHub

AUGUST 2, 2023

ML teams have a very important core purpose in their organizations - delivering high-quality, reliable models, fast. With users’ productivity in mind, at DagHub we aimed for a solution that will provide ML teams with the whole process out of the box and with no extra effort. Let’s explain how the Data Engine helps teams do just that.

ML

ML ML Data Engineering Data Engineer

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

The field of data science is now one of the most preferred and lucrative career options available in the area of data because of the increasing dependence on data for decision-making in businesses, which makes the demand for data science hires peak. Their insights must be in line with real-world goals.

Data Science

Data Science Data Analyst Data Scientist Machine Learning

Experience the new and improved Amazon SageMaker Studio

AWS Machine Learning Blog

DECEMBER 1, 2023

Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. About the Authors Mair Hasco is an AI/ML Specialist for Amazon SageMaker Studio. Get started on SageMaker Studio here.

ML

ML ML Machine Learning Machine Learning

Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

AWS Machine Learning Blog

APRIL 3, 2024

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. SageMaker Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity.

Machine Learning

Machine Learning Machine Learning ML ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services.

SQL

SQL AWS Data Lakes AI

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

AWS Machine Learning Blog

JANUARY 6, 2023

The vendors evaluated for this MarketScape offer various software tools needed to support end-to-end machine learning (ML) model development, including data preparation, model building and training, model operation, evaluation, deployment, and monitoring. AI life-cycle tools are essential to productize AI/ML solutions.

AWS

AWS ML ML Data Preparation

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. ML models don’t operate in isolation. This necessitates considering the entire ML lifecycle during design and development.

ML

ML ML AWS Machine Learning

What is MLOps

Towards AI

AUGUST 16, 2023

Pietro Jeng on Unsplash MLOps is a set of methods and techniques to deploy and maintain machine learning (ML) models in production reliably and efficiently. Thus, MLOps is the intersection of Machine Learning, DevOps, and Data Engineering (Figure 1). Projects: a standard format for packaging reusable ML code.

Machine Learning

Machine Learning Machine Learning ML ML

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

AWS Machine Learning Blog

MAY 23, 2023

With organizations increasingly investing in machine learning (ML), ML adoption has become an integral part of business transformation strategies. However, implementing ML into production comes with various considerations, notably being able to navigate the world of AI safely, strategically, and responsibly.

Machine Learning

Machine Learning Machine Learning AWS ML

GraphReduce: Using Graphs for Feature Engineering Abstractions

ODSC - Open Data Science

SEPTEMBER 25, 2023

Be sure to check out his talk, “ Using Graphs for Large Feature Engineering Pipelines ,” there to learn more about GraphReduce and more! For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and ultimately, the bottom line.

Big Data

Big Data Big Data ML ML

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS

AWS Machine Learning Machine Learning ML

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

AWS Machine Learning Blog

JULY 11, 2024

Amazon SageMaker Studio provides a comprehensive suite of fully managed integrated development environments (IDEs) for machine learning (ML), including JupyterLab , Code Editor (based on Code-OSS), and RStudio. Let’s first review how to extend SageMaker Distribution for geospatial analyses and ML.

AWS

AWS ML ML Python

Step-by-step guide: Generative AI for your business

IBM Journey to AI blog

JULY 30, 2024

Data Scientists and AI experts: Historically we have seen Data Scientists build and choose traditional ML models for their use cases. Data Scientists will typically help with training, validating, and maintaining foundation models that are optimized for data tasks.

AI

AI AI Data Scientist Data Preparation

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

Iguazio

JULY 31, 2023

MLOps accelerates the ML model deployment process to make it more efficient and scalable. MLOps (Machine Learning Operations) is the set of practices and techniques used to efficiently and automatically develop, test, deploy, and maintain ML models and applications and data in production. You’ve come to the right place.

ML

ML ML Machine Learning Machine Learning

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

AWS Machine Learning Blog

MARCH 10, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML.

Clustering

Clustering AWS ML ML

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints , SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters. python3.11-pip jars/livy-repl_2.12-0.7.1-incubating.jar

AWS

AWS Clustering Big Data Big Data

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

Despite the challenges, Afri-SET, with limited resources, envisions a comprehensive data management solution for stakeholders seeking sensor hosting on their platform, aiming to deliver accurate data from low-cost sensors. This happens only when a new data format is detected to avoid overburdening scarce Afri-SET resources.

AWS

AWS AI AI Python

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation

AWS Machine Learning Blog

FEBRUARY 22, 2023

In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach to develop a new AI/ML enabled solution for their customers to tackle the “blank canvas” problem. Boomi’s ML and data engineering teams needed the solution to be deployed quickly, in a repeatable and consistent way, at scale.

AWS

AWS ML ML Data Science

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

AWS Machine Learning Blog

NOVEMBER 29, 2023

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio.

ML

ML ML Data Scientist Python

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Unpacking and Utilizing Vertex with Google Earth Engine for Machine Learning.

Towards AI

MAY 8, 2024

Created by the author with DALL E-3 Google Earth Engine for machine learning has just gotten a new face lift, with all the advancement that has been going on in the world of Artificial intelligence, Google Earth Engine was not going to be left behind as it is an important tool for spatial analysis.

Machine Learning

Machine Learning Machine Learning ML ML

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

ML operationalization summary As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker , ML and operations (MLOps) is the combination of people, processes, and technology to productionize machine learning (ML) solutions efficiently.

AI

AI AI ML ML

Vertex AI: Guide to Google’s Unified Machine Learning Platform

Pickl AI

AUGUST 28, 2024

From data preparation and model training to deployment and management, Vertex AI provides the tools and infrastructure needed to build intelligent applications. Overview of Vertex AI Vertex AI is a fully-managed, unified AI development platform that integrates all of Google Cloud’s existing ML offerings into a single environment.

Machine Learning

Machine Learning Machine Learning ML ML

State of Machine Learning Survey Results Part Two

ODSC - Open Data Science

MARCH 13, 2023

First, there’s a need for preparing the data, aka data engineering basics. Machine learning practitioners are often working with data at the beginning and during the full stack of things, so they see a lot of workflow/pipeline development, data wrangling, and data preparation.

Machine Learning

Machine Learning Machine Learning Data Wrangling Data Science

How are AI Projects Different

Towards AI

AUGUST 16, 2023

The MLOps Process We can see some of the differences with MLOps which is a set of methods and techniques to deploy and maintain machine learning (ML) models in production reliably and efficiently. MLOps is the intersection of Machine Learning, DevOps, and Data Engineering.

Machine Learning

Machine Learning Machine Learning AI AI

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.

AWS

AWS ML ML Machine Learning

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Mlearning.ai

APRIL 6, 2023

Automate and streamline our ML inference pipeline with SageMaker and Airflow Building an inference data pipeline on large datasets is a challenge many companies face. SageMaker Batch Job Allows you to run batch inference on large datasets and generate predictions in a batch mode using machine learning (ML) models hosted in SageMaker.

Data Pipeline

Data Pipeline ML ML AWS

Introducing our New Book: Implementing MLOps in the Enterprise

Iguazio

DECEMBER 14, 2023

Drawing from their extensive experience in the field, the authors share their strategies, methodologies, tools and best practices for designing and building a continuous, automated and scalable ML pipeline that delivers business value. The book is poised to address these exact challenges.

ML

ML ML Data Science Data Preparation

How Vericast optimized feature engineering using Amazon SageMaker Processing

AWS Machine Learning Blog

MAY 3, 2023

For any machine learning (ML) problem, the data scientist begins by working with data. This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process.

AWS

AWS Machine Learning Machine Learning ML

New DataRobot and Snowflake Integrations: Seamless Data Prep, Model Deployment, and Monitoring

DataRobot Blog

MARCH 16, 2023

Data scientists run experiments. They work with partners in IT to harden ML use cases into production systems. To work effectively, data scientists need agility in the form of access to enterprise data, streamlined tooling, and infrastructure that just works. Because the current ML lifecycle process is broken.

Data Scientist

Data Scientist ML ML Data Preparation

Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

Flipboard

MARCH 7, 2023

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity.

Python

Python AWS ML ML

AI Development Lifecycle Learnings of What Changed with LLMs

ODSC - Open Data Science

FEBRUARY 5, 2025

The Evolving AI Development Lifecycle Despite the revolutionary capabilities of LLMs, the core development lifecycle established by traditional natural language processing remains essential: Plan, Prepare Data, Engineer Model, Evaluate, Deploy, Operate, and Monitor. For instance: Data Preparation: GoogleSheets.

Data Preparation

Data Preparation AI AI Data Scientist

MLOps and the evolution of data science

IBM Journey to AI blog

AUGUST 11, 2023

Both computer scientists and business leaders have taken note of the potential of the data. Machine learning (ML), a subset of artificial intelligence (AI), is an important piece of data-driven innovation. MLOps is the next evolution of data analysis and deep learning. What is MLOps?

Data Science

Data Science Machine Learning Machine Learning ML

Accelerate data preparation for ML in Amazon SageMaker Canvas

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Webinars

Trending Sources

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Webinars

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Deliver your first ML use case in 8–12 weeks

Train and deploy ML models in a multicloud environment using Amazon SageMaker

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

Data4ML Preparation Guidelines (Beyond The Basics)

How Creating Training-ready Datasets Faster Can Unleash ML Teams’ Productivity

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

Experience the new and improved Amazon SageMaker Studio

Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

What is MLOps

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

GraphReduce: Using Graphs for Feature Engineering Abstractions

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

Step-by-step guide: Generative AI for your business

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

MLOps Landscape in 2023: Top Tools and Platforms

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

Improving air quality with generative AI

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snowflake Snowpark: cloud SQL and Python ML pipelines

Unpacking and Utilizing Vertex with Google Earth Engine for Machine Learning.

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Vertex AI: Guide to Google’s Unified Machine Learning Platform

State of Machine Learning Survey Results Part Two

How are AI Projects Different

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Introducing our New Book: Implementing MLOps in the Enterprise

How Vericast optimized feature engineering using Amazon SageMaker Processing

New DataRobot and Snowflake Integrations: Seamless Data Prep, Model Deployment, and Monitoring

Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

AI Development Lifecycle Learnings of What Changed with LLMs

MLOps and the evolution of data science

Stay Connected