The concept of streaming data was born of necessity. More than ever, advanced analytics, ML, and AI are providing the foundation for innovation, efficiency, and profitability. But insights derived from day-old data don’t cut it. Business success is based on how we use continuously changing data.
Automate and streamline our ML inference pipeline with SageMaker and Airflow. Building an inference data pipeline on large datasets is a challenge many companies face. The batch job automatically launches an ML compute instance, deploys the model, and processes the input data in batches, producing the output predictions.
Building generative AI applications presents significant challenges for organizations: they require specialized ML expertise, complex infrastructure management, and careful orchestration of multiple services. Prompt 2: "Were there any major world events in 2016 affecting the sale of vegetables?"
The AI and Machine Learning (ML) industry has continued to grow at a rapid rate over recent years. Hidden Technical Debt in Machine Learning Systems. More money, more problems: the rise of too many ML tools, 2012 vs. 2023 (source: Matt Turck). People often believe that money is the solution to a problem (Spark, Flink, etc.).
Kakao Games can create a promotional event to keep players from leaving the game. However, this approach is reactive. With a proactive approach instead, Kakao Games can launch the right events at the right time, and the results of these events can be evaluated afterwards so that the team makes better decisions in the future.
AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.
Six core principles of a real-time streaming pipeline. Drawing on Matus Tomlein's step-by-step Implementation Guide: Building an AI-Ready Data Pipeline Architecture, you can anchor any streaming stack around six non-negotiables: Explicit data requirements. Tight ML integration. Schema-first design. Dual-layer storage.
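One of those non-negotiables, schema-first design, can be illustrated with a tiny validation sketch: events are checked against a declared schema before they enter the pipeline. The schema and field names below are hypothetical.

```python
# Minimal schema-first validation sketch: declare the schema up front,
# reject any event that does not match it exactly.
EVENT_SCHEMA = {"user_id": str, "event_type": str, "timestamp": float}

def validate(event: dict, schema: dict = EVENT_SCHEMA) -> bool:
    """True if the event has exactly the declared fields with the declared types."""
    return set(event) == set(schema) and all(
        isinstance(event[k], t) for k, t in schema.items()
    )

good = {"user_id": "u1", "event_type": "click", "timestamp": 1700000000.0}
bad = {"user_id": "u1", "event_type": "click"}  # missing timestamp
```

In a real stack this role is typically played by a schema registry (Avro, Protobuf, or JSON Schema) rather than hand-rolled checks.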
The following diagram illustrates the data pipeline for indexing and query in the foundational search architecture. The listing writer microservice publishes listing change events to an Amazon Simple Notification Service (Amazon SNS) topic, which an Amazon Simple Queue Service (Amazon SQS) queue subscribes to.
Alignment to other tools in the organization's tech stack. Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and Pandas or Apache Spark DataFrames.
Lambda enables serverless, event-driven data processing tasks, allowing for real-time transformations and calculations as data arrives. Step Functions complements this by orchestrating complex workflows, coordinating multiple Lambda functions, and managing error handling for sophisticated data processing pipelines.
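The orchestration idea here (coordinating several small functions and handling their errors) can be sketched in plain Python. This is not the Step Functions API, and the transformation steps are made up for illustration.

```python
# Plain-Python sketch of workflow orchestration: run a chain of small
# transformation "functions", retrying each step a few times on failure.
def run_workflow(record, steps, retries=2):
    for step in steps:
        for attempt in range(retries + 1):
            try:
                record = step(record)
                break
            except ValueError:
                if attempt == retries:
                    raise  # error handling: give up after the retry budget
    return record

# Hypothetical steps, analogous to individual Lambda functions.
parse = lambda r: {**r, "amount": float(r["amount"])}
enrich = lambda r: {**r, "amount_cents": int(r["amount"] * 100)}

result = run_workflow({"amount": "12.50"}, [parse, enrich])
```

Step Functions adds what this sketch omits: durable state, branching, timeouts, and per-step retry policies defined declaratively.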
Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and ultimately, the bottom line.
In this post, we highlight how the AWS Generative AI Innovation Center collaborated with AWS Professional Services and the PGA TOUR to develop a prototype virtual assistant using Amazon Bedrock that could enable fans to extract event-, player-, hole-, or shot-level details in a seamless, interactive manner.
If the question was "What's the schedule for AWS events in December?", AWS usually announces the dates for its upcoming re:Invent event around 6–9 months in advance. Chaithanya Maisagoni is a Senior Software Development Engineer (AI/ML) in Amazon's Worldwide Returns and ReCommerce organization.
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. Their task is to construct and oversee efficient data pipelines.
Image Source: Pixel Production Inc. In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You might be curious how a simple tool like Apache Airflow can be powerful for managing complex data pipelines.
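At its simplest, what Airflow manages for you is a set of tasks plus their dependencies, executed in dependency order. A toy stand-in using only the standard library (the task names are hypothetical, not from the article):

```python
# Toy illustration of DAG-based scheduling: declare each task with its
# upstream dependencies, then execute in topological order.
from graphlib import TopologicalSorter

tasks = {
    "extract": [],                   # no dependencies
    "transform": ["extract"],        # runs after extract
    "train_model": ["transform"],
    "publish": ["train_model"],
}
order = list(TopologicalSorter(tasks).static_order())
```

Airflow layers scheduling, retries, backfills, and monitoring on top of exactly this dependency structure.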
A novel approach to solve this complex security analytics scenario combines the ingestion and storage of security data using Amazon Security Lake and analyzing the security data with machine learning (ML) using Amazon SageMaker. Deploy the trained ML model to a SageMaker inference endpoint.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas , allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Let’s add some transformations to get our data ready for training an ML model.
SageMaker geospatial capabilities make it straightforward for data scientists and machine learning (ML) engineers to build, train, and deploy models using geospatial data. Geobox enables city departments to do the following: Improved climate adaptation planning – Informed decisions reduce the impact of extreme heat events.
In these applications, time series data can have heavy-tailed distributions, where the tails represent extreme values. Accurate forecasting in these regions is important in determining how likely an extreme event is and whether to raise an alarm. However, under a light-tailed model, the extreme event is assigned near-zero probability.
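A quick numeric comparison shows why the tail model matters. These are standard distribution facts, not figures from the post; the Pareto parameters are illustrative.

```python
# Tail probability of a large deviation under a light-tailed (normal) model
# versus a heavy-tailed (Pareto) model.
import math

def normal_tail(k: float) -> float:
    """P(X > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(k: float, alpha: float = 2.0, xm: float = 1.0) -> float:
    """P(X > k) for a Pareto(xm, alpha) variable."""
    return (xm / k) ** alpha if k >= xm else 1.0

print(normal_tail(6.0))   # ~1e-9: a "6-sigma" event is essentially impossible
print(pareto_tail(6.0))   # ~0.028: the heavy tail keeps it non-negligible
```

A forecaster fitted with light-tailed assumptions will therefore systematically under-alarm on exactly the events that matter most.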
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring. DynamoDB is used to store the pet attributes.
Unleashing Innovation and Success: Comet — The Trusted ML Platform for Enterprise Environments Machine learning (ML) is a rapidly developing field, and businesses are increasingly depending on ML platforms to fuel innovation, improve efficiency, and mine data for insights.
Traditional maintenance activities rely on a sizable workforce distributed across key locations along the BHS, dispatched by operators in the event of an operational fault. It's an easy way to run analytics on IoT data to gain accurate insights. Motor 2 operated in a cooler environment, where the temperature ranged between 20°C and 25°C.
At its core, Amazon Bedrock provides the foundational infrastructure for robust performance, security, and scalability for deploying machine learning (ML) models. The architecture employs an event-driven model, where the completion of one step triggers the next step in the workflow.
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
The 4 Gen AI Architecture Pipelines. The four pipelines are: 1. The Data Pipeline. The data pipeline is the foundation of any AI system. It's responsible for collecting and ingesting data from various external sources, processing it, and managing it.
And we at deployr worked alongside them to find the best possible answers for everyone involved and build their data and ML pipelines. Building data and ML pipelines: from the ground to the cloud. It was the beginning of 2022, and things were looking bright after the lockdown's end.
These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). ML is often associated with PBAs, so we start this post with an illustrative figure. The ML paradigm is learning followed by inference. The union of advances in hardware and ML has led us to the current day.
In the later part of this article, we will discuss its importance and how we can use machine learning for streaming data analysis with the help of a hands-on example. What is streaming data? This will also help us observe the importance of stream data. It can be used to collect, store, and process streaming data in real time.
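A minimal flavor of streaming analysis is a rolling statistic that updates as each value arrives, rather than recomputing over a stored dataset. The window size and values below are made up for illustration.

```python
# Rolling mean over the last N readings of a stream, updated per arrival.
from collections import deque

class RollingMean:
    def __init__(self, window: int):
        self.buf = deque(maxlen=window)  # old values fall off automatically

    def update(self, value: float) -> float:
        self.buf.append(value)
        return sum(self.buf) / len(self.buf)

rm = RollingMean(window=3)
means = [rm.update(v) for v in [1.0, 2.0, 3.0, 4.0]]
# means -> [1.0, 1.5, 2.0, 3.0]
```

Real streaming engines generalize this pattern with event-time windows, watermarks, and fault-tolerant state.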
Our planet sends distress signals through extreme weather events, melting ice caps, and vanishing species. Enter machine learning (ML) , the technological powerhouse that has revolutionized industries from healthcare to finance, with its unparalleled ability to analyze vast datasets, identify patterns, and make predictions.
It includes processes that trace and document the origin of data, models, and associated metadata, as well as pipelines for audits. An AI governance framework ensures the ethical, responsible, and transparent use of AI and machine learning (ML). It can be used in both on-premises and multi-cloud environments.
With Composable ML , expert data scientists can extend DataRobot’s AutoML blueprints with their domain knowledge and custom code. Composable ML turns DataRobot blueprints into reusable building blocks. These retraining policies can be based on a schedule of your choosing or triggered by an event like data drift.
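A drift-triggered retraining policy can be sketched as a simple check on a feature's recent distribution against its training baseline. The threshold, data, and function name are made up; production systems use richer statistics (e.g., PSI or KS tests) than a mean shift.

```python
# Toy drift check: flag retraining when a feature's recent mean drifts
# from the training baseline by more than a tolerance.
def drift_detected(baseline, recent, tolerance=0.5):
    base_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    return abs(recent_mean - base_mean) > tolerance

baseline = [1.0, 1.1, 0.9, 1.0]   # feature values seen at training time
stable = [1.05, 0.95, 1.0]        # recent batch, no drift
shifted = [2.0, 2.1, 1.9]         # recent batch, clear drift
```

The scheduled-retraining alternative mentioned above is just this check replaced by a timer.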
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
Data scientists and machine learning engineers need to collaborate to make sure that together with the model, they develop robust data pipelines. These pipelines cover the entire lifecycle of an ML project, from data ingestion and preprocessing, to model training, evaluation, and deployment.
Serverless and function as a service (FaaS) In a serverless environment, function as a service (FaaS) —a service that allows customers to run code in response to events—is critical to freeing up developers from managing the underlying infrastructure. In a serverless model, an event triggers app code to run.
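The event-triggers-code idea can be sketched as a minimal handler. The event shape here is hypothetical (loosely modeled on a storage notification), not a real AWS payload.

```python
# Minimal FaaS-style handler sketch: the platform invokes handler(event)
# whenever an event arrives; the function owns no infrastructure.
def handler(event, context=None):
    records = event.get("records", [])
    return {"processed": len(records),
            "keys": [r["key"] for r in records]}

# Simulate the platform delivering an event with two new objects.
out = handler({"records": [{"key": "a.csv"}, {"key": "b.csv"}]})
```

In a real FaaS deployment, the trigger (object upload, queue message, HTTP request) and the payload schema are defined by the platform, not the function.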
It enables a big-picture understanding of the health of your organization's data through continuous AI/ML-enabled monitoring – detecting anomalies throughout the data pipeline and preventing data downtime. The post How Data Observability Helps to Build Trusted Data appeared first on Precisely.
Snorkel AI wrapped the second day of our The Future of Data-Centric AI virtual conference by showcasing how Snorkel’s data-centric platform has enabled customers to succeed, taking a deep look at Snorkel Flow’s capabilities, and announcing two new solutions. You need to find a place to park your data.
It offers a wealth of books, on-demand courses, live events, short-form posts, interactive labs, expert playlists, and more—formed from the proprietary content of thousands of independent authors, industry experts, and several of the largest education publishers in the world.
Monday’s sessions will cover a wide range of topics, from Generative AI and LLMs to MLOps and Data Visualization. Finally, get ready for some All Hallows' Eve fun with Halloween Data After Dark, featuring a costume contest, candy, and more. There will also be an in-person career expo where you can find your next job in data science!
And, as organizations progress and grow, “data drift” starts to impact data usage, models, and your business. In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used. Scheduling.
They are characterized by their enormous size, complexity, and the vast amount of data they process. These elements need to be taken into consideration when managing, streamlining, and deploying LLMs in ML pipelines, hence the specialized discipline of LLMOps. Continuous monitoring of resources, data, and metrics.
In this blog, we will highlight some of the most important upcoming features and updates for those who could not attend the events, specifically around AI and developer tools. The data landscape is changing rapidly, and organizations must innovate quickly to stay competitive and address new customer demands. schemas["my_schema"].tables.create(my_table)
Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects. What is Unstructured Data?