Machine learning (ML) helps organizations increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others.
To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. In this post, we describe how to use generative AI to update and scale your data pipeline using Amazon SageMaker Canvas for data prep.
Dataiku is an advanced analytics and machine learning platform designed to democratize data science and foster collaboration across technical and non-technical teams. Snowflake excels in efficient data storage and governance, while Dataiku provides the tooling to operationalize advanced analytics and machine learning models.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. We set up an end-to-end Ray-based ML workflow, orchestrated using SageMaker Pipelines.
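To make the distributed-ML idea concrete, here is a minimal sketch of fanning work out across Ray workers. It is not the workflow from the excerpt: the task function, partition scheme, and returned metric are all hypothetical.

```python
# Minimal Ray sketch; the task function and data partitions are placeholders
# standing in for the distributed training steps described above.
import ray

ray.init()  # start a local Ray runtime; on a cluster, connect to the head node instead

@ray.remote
def train_partition(partition_id: int) -> float:
    """Hypothetical training task that fits on one data partition and returns a score."""
    # ... load the partition, fit a model shard, compute a metric ...
    return 0.9 + partition_id * 0.001

# Launch the tasks in parallel across available workers, then collect results.
futures = [train_partition.remote(i) for i in range(4)]
scores = ray.get(futures)
print("per-partition scores:", scores)
```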
Summary: Data quality is a fundamental aspect of Machine Learning. Poor-quality data leads to biased and unreliable models, while high-quality data enables accurate predictions and insights. What is Data Quality in Machine Learning?
It is 2022, and software developers are seeing native apps dominate because of the data-driven approach. With data technology and machine learning, every customer gets a tailored experience. Business teams rely significantly on data for self-service tools and more.
How to evaluate MLOps tools and platforms: as with any software solution, evaluating MLOps (Machine Learning Operations) tools and platforms can be a complex task because it requires weighing many different factors. Pay-as-you-go pricing makes it easy to scale when needed.
Zeta’s AI innovation is powered by a proprietary machine learning operations (MLOps) system, developed in-house. Context: In early 2023, Zeta’s machine learning (ML) teams shifted from traditional vertical teams to a more dynamic horizontal structure, introducing the concept of pods comprising diverse skill sets.
Automate and streamline our ML inference pipeline with SageMaker and Airflow: building an inference data pipeline on large datasets is a challenge many companies face. The batch job automatically launches an ML compute instance, deploys the model, and processes the input data in batches, producing the output predictions.
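As an illustration of that batch step, here is a hedged sketch using the SageMaker Python SDK's batch transform. It shows only the scoring job, not the Airflow orchestration, and the model name, S3 URIs, instance type, and content type are placeholders rather than values from the original post.

```python
# Hedged sketch of a SageMaker batch transform job; all names and paths are placeholders.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-trained-model",              # hypothetical model already registered in SageMaker
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/predictions/",  # placeholder output location
)

# Launch the batch job: SageMaker spins up the instance, deploys the model,
# scores the input files in batches, and writes predictions to the output path.
transformer.transform(
    data="s3://my-bucket/inference-input/",     # placeholder input prefix
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```

In an Airflow setup like the one described, a task would typically kick off and monitor this job on a schedule instead of running it by hand.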
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of an AI/ML application workflow: data preparation, model training, model evaluation, and model monitoring.
Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and, ultimately, the bottom line.
Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize lifetime value (LTV). In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.
In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly.
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. All code for this post is available in the GitHub repo.
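A minimal sketch of that train-then-deploy flow, using the SageMaker Python SDK's scikit-learn estimator. The training script, IAM role, framework version, and S3 path are placeholders, not the code from the linked repo.

```python
# Hedged sketch of training a model on SageMaker and deploying it to a hosted endpoint.
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",                                   # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version="1.2-1",
    py_version="py3",
)

# Train on data staged in S3, then deploy to a managed real-time endpoint.
estimator.fit({"train": "s3://my-bucket/train/"})             # placeholder S3 prefix
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

Once the endpoint is up, `predictor.predict(...)` returns inferences, and `predictor.delete_endpoint()` tears it down to avoid ongoing charges.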
In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.
More than 170 tech teams used the latest cloud, machine learning, and artificial intelligence technologies to build 33 solutions. With AWS Glue custom connectors, it’s effortless to transfer data between Amazon S3 and other applications.
Introduction The Formula 1 Prediction Challenge: 2024 Mexican Grand Prix brought together data scientists to tackle one of the most dynamic aspects of racing — pit stop strategies. With every second on the track critical, the challenge showcased how data can shape decisions that define race outcomes.
Amazon Redshift is a popular cloud data warehouse used by tens of thousands of customers to analyze exabytes of data every day. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development.
Machine learning (ML), a subset of artificial intelligence (AI), is an important piece of data-driven innovation. Machine learning engineers take massive datasets and use statistical methods to create algorithms that are trained to find patterns and uncover key insights in data mining projects.
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is designing, constructing, and managing systems that enable data collection, storage, and analysis. Data engineers are crucial in ensuring data is readily available for analysis and reporting.
In today’s landscape, AI is becoming a major focus in developing and deploying machine learning models. It isn’t just about writing code or creating algorithms — it requires robust pipelines that handle data, model training, deployment, and maintenance. Model Training: Running computations to learn from the data.
This enterprise-ready, next-generation studio for AI builders brings together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering, and hyperparameter optimization using AutoAI.
As businesses increasingly turn to cloud solutions, Azure stands out as a leading platform for Data Science, offering powerful tools and services for advanced analytics and Machine Learning. This roadmap aims to guide aspiring Azure Data Scientists through the essential steps to build a successful career.
Similarly, while building any machine learning-based product or service, training and evaluating the model on a few real-world samples does not necessarily mean the end of your responsibilities. MLOps tools play a pivotal role in every stage of the machine learning lifecycle. What is MLOps?
Standard Chartered Bank’s Global Head of Technology, Santhosh Mahendiran, discussed the democratization of data across 3,500+ business users in 68 countries. We look at data as an asset, regardless of whether the use case is AML/fraud or new revenue. Data professionals come in all shapes and forms.
Today’s data management and analytics products have infused artificial intelligence (AI) and machine learning (ML) algorithms into their core capabilities. These modern tools will auto-profile the data, detect joins and overlaps, and offer recommendations.
The primary goal of Data Engineering is to transform raw data into a structured and usable format that can be easily accessed, analyzed, and interpreted by Data Scientists, analysts, and other stakeholders. Future of Data Engineering: the Data Engineering market will expand from $18.2
Summary: Data transformation tools streamline data processing by automating the conversion of raw data into usable formats. These tools enhance efficiency, improve data quality, and support Advanced Analytics like Machine Learning. Aggregation: Combining multiple data points into a single summary (e.g.,
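To show what the aggregation step looks like in practice, here is a small pandas illustration; the table and column names are made up for the example.

```python
# Illustrative aggregation: collapse raw order rows into one summary row per customer.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 35.0, 10.0, 15.0, 5.0],
})

summary = orders.groupby("customer_id").agg(
    total_spent=("amount", "sum"),
    avg_order=("amount", "mean"),
    order_count=("amount", "count"),
).reset_index()
print(summary)
```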
DataRobot AI Cloud is the only platform on the market that offers straight through code, straight through automation, or any combination of these approaches in a unified environment that continuously learns.
In this blog post, we detail the steps you need to take to build and run a successful MLOps pipeline. MLOps (Machine Learning Operations) is the set of practices and techniques used to efficiently and automatically develop, test, deploy, and maintain ML models, applications, and data in production. What is MLOps?
Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022.
Automation: Automation plays a pivotal role in streamlining ETL processes, reducing the need for manual intervention, and ensuring consistent data availability. By automating key tasks, organisations can enhance efficiency and accuracy, ultimately improving the quality of their data pipelines.
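As a sketch of what such automation can look like, here is a minimal Apache Airflow DAG, assuming Airflow 2.4+ (for the `schedule` argument); the DAG id, schedule, and task bodies are placeholders rather than anything from the original article.

```python
# Hedged sketch of an automated ETL pipeline in Airflow; task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw data from the source system")        # placeholder extract step

def transform():
    print("cleaning and reshaping the extracted data")      # placeholder transform step

def load():
    print("writing the transformed data to the warehouse")  # placeholder load step

with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run automatically once per day, no manual intervention
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task  # enforce extract -> transform -> load ordering
```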
On the client side, Snowpark consists of libraries, including the DataFrame API and native Snowpark machine learning (ML) APIs for model development (public preview) and deployment (private preview). Machine Learning: Training machine learning (ML) models can sometimes be resource-intensive.
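For a feel of the DataFrame API mentioned above, here is a hedged Snowpark for Python sketch; the connection parameters, table, and column names are placeholders.

```python
# Hedged Snowpark sketch: build a lazy DataFrame pipeline that is pushed down to Snowflake.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

orders = session.table("ORDERS")  # hypothetical table
summary = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("CUSTOMER_ID")
          .agg(avg("ORDER_AMOUNT").alias("AVG_AMOUNT"))
)
summary.show()  # triggers execution inside Snowflake and prints a sample of the result
```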
These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). Historical data is normally (but not always) independent inter-day, meaning that days can be parsed independently. An important part of the data pipeline is the production of features, both online and offline.
Knowing this, you want to have data prepared in a way that optimizes your load. Data Pipelines: a “data pipeline” means moving data in a consistent, secure, and reliable way at some frequency that meets your requirements. It might be tempting to have massive files and let the system sort it out.
Data Manipulation: data manipulation is the process of changing data to fit your project requirements for further analysis. The entire process involves cleaning, merging, and changing the data format. This data can help in building the project pipeline.
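A small pandas illustration of those three steps, cleaning, merging, and changing the data format; the data is invented for the example.

```python
# Illustrative data manipulation with pandas: clean, merge, and convert formats.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", None]})
orders = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "order_date": ["2024-01-05", "2024-02-10", "2024-02-11"],
})

customers = customers.dropna(subset=["name"])                    # cleaning: drop incomplete rows
merged = customers.merge(orders, on="customer_id", how="inner")  # merging the two sources
merged["order_date"] = pd.to_datetime(merged["order_date"])      # changing the data format
print(merged.dtypes)
```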
Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. One might say that tabular data modeling is the original data-centric AI!
From a software engineering perspective, machine-learning models, if you look at them in terms of the number of parameters and in terms of size, started out from the transformer models. So the application started to go from the pure software-engineering/machine-learning domain to industry and the sciences, essentially.
LLMOps (Large Language Model Operations) is a specialized domain within the broader field of machine learning operations (MLOps). Continuous monitoring of resources, data, and metrics. Data Pipeline - Manages and processes various data sources. ML Pipeline - Focuses on training, validation, and deployment.
We then go over all the project components and processes, from data preparation, model training, and experiment tracking to model evaluation, to equip you with the skills to construct your own emotion recognition model. Refer to this repository as we walk through the project.
To establish trust between the data producers and data consumers, SageMaker Catalog also integrates the data quality metrics and data lineage events to track and drive transparency in data pipelines. In this section, we show you how to import the technical metadata from AWS Glue data catalogs.
They run scripts manually to preprocess their training data, rerun the deployment scripts, manually tune their models, and spend their working hours keeping previously developed models up to date. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times.
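A scikit-learn Pipeline is one simple way to see the "build once, rerun, and reuse" idea; the dataset here is synthetic and the steps are chosen for brevity, not taken from the excerpt.

```python
# Illustration of a reusable training pipeline: preprocessing and model bundled together,
# so the exact same steps are applied every time the pipeline is rerun or reused.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print("held-out accuracy:", pipeline.score(X_test, y_test))
```

The same object can be refit on fresh data or serialized and reloaded, which is exactly what keeps previously developed models up to date without rerunning ad hoc scripts.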