Apache Kafka, Events and ML - Data Science Current

Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

AWS Machine Learning Blog

APRIL 18, 2025

Solution overview: Build a generative AI stock price analyzer with RAG For this post, we implement a RAG architecture with Amazon Bedrock Knowledge Bases using a custom connector and topics built with Amazon Managed Streaming for Apache Kafka (Amazon MSK) for a user who may be interested to understand stock price trends.

Apache Kafka

Apache Kafka AWS Clustering Database

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “ Apache Kafka for Real-Time Machine Learning Without a Data Lake ,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Machine Learning Apache Kafka

Real-time artificial intelligence and event processing

IBM Journey to AI blog

NOVEMBER 29, 2023

By leveraging AI for real-time event processing, businesses can connect the dots between disparate events to detect and respond to new trends, threats and opportunities. AI and event processing: a two-way street An event-driven architecture is essential for accelerating the speed of business.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Apache Kafka AI

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

MORE WEBINARS

Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration

Flipboard

JUNE 3, 2025

By using containerized applications, event-driven workflows, and AI capabilities, the system provides scalable and flexible insights to EV station operators. The data is then transmitted to Amazon Managed Streaming for Apache Kafka (Amazon MSK) to facilitate high-throughput, reliable streaming.

AWS

AWS Apache Kafka Database AI

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

AWS Machine Learning Blog

APRIL 19, 2023

Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing products and services. Apache Flink is a popular framework and engine for processing data streams. 0 … 1248 Nov-02 12:14:31 32.45

ML

ML ML Apache Kafka SQL

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

AWS Machine Learning Blog

FEBRUARY 7, 2025

However, it lacked essential services required for machine learning (ML) applications, such as frontend and backend infrastructure, DNS, load balancers, scaling, blob storage, and managed databases. At that time, the application was deployed as a single monolithic container, which included Kafka and a database.

Analytics

Analytics Analytics AWS Clustering

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Best Big Data Tools Popular tools such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Storm enable businesses to store, process, and analyse data efficiently. Machine Learning Integration : Built-in ML capabilities streamline model development and deployment.

Big Data

Big Data Big Data Apache Hadoop Apache Kafka

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In this post, we demonstrate how to build a robust real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. This solution employs machine learning (ML) for anomaly detection, and doesn’t require users to have prior AI expertise.

AWS

AWS ML ML Apache Kafka

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

The key requirement for TR’s new machine learning (ML)-based personalization engine was centered around an accurate recommendation system that takes into account recent customer trends. ML training pipeline. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.

AWS

AWS Data Warehouse ML ML

Streaming Data Pipelines: What Are They and How to Build One

Precisely

DECEMBER 28, 2023

More than ever, advanced analytics, ML, and AI are providing the foundation for innovation, efficiency, and profitability. Streaming data pipelines, by extension, offer an architecture capable of handling large volumes of data, accommodating millions of events in near real time. The concept of streaming data was born of necessity.

Data Pipeline

Data Pipeline Apache Kafka Big Data Big Data

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

This process comprises two key components: event data and optical tracking data. Event data collection entails gathering the fundamental building blocks of the game. For the precision needed in shot speed calculations, we must ensure that the ball’s position aligns precisely with the moment of the event.

AWS

AWS Apache Kafka Data Scientist Data Science

Pictures and Highlights from ODSC Europe 2023

ODSC - Open Data Science

JULY 22, 2023

Expo Hall ODSC events are more than just data science training and networking events. Thank you to everyone who attended for making this event possible, and showing once again why we do what we do — connecting the greater data science community together to push the industry forward. What’s next?

Apache Kafka

Apache Kafka Machine Learning Machine Learning Data Science

Watch the Top ODSC Europe 2023 Virtual Sessions Here

ODSC - Open Data Science

JULY 14, 2023

Probabilistic Machine Learning for Finance and Investing Deepak Kanungo | Founder and CEO, Advisory Board Member | Hedged Capital LLC, AIKON This session will introduce you to the reasons why probabilistic machine learning is the next generation of AI in finance and investing.

Machine Learning

Machine Learning Machine Learning Apache Kafka Data Science

Training Models on Streaming Data [Practical Guide]

The MLOps Blog

FEBRUARY 5, 2023

Streaming data is a continuous flow of information and a foundation of event-driven architecture software model” – RedHat Enterprises around the world are becoming dependent on data more than ever. A streaming data pipeline is an enhanced version which is able to handle millions of events in real-time at scale.

Machine Learning

Machine Learning Machine Learning Data Pipeline Apache Kafka

7 Best Machine Learning Workflow and Pipeline Orchestration Tools 2024

DagsHub

APRIL 7, 2024

These pipelines cover the entire lifecycle of an ML project, from data ingestion and preprocessing, to model training, evaluation, and deployment. Adopted from [link] In this article, we will first briefly explain what ML workflows and pipelines are. around the world to streamline their data and ML pipelines.

Machine Learning

Machine Learning Machine Learning ML ML

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

Managing unstructured data is essential for the success of machine learning (ML) projects. This article will discuss managing unstructured data for AI and ML projects. You will learn the following: Why unstructured data management is necessary for AI and ML projects. How to properly manage unstructured data.

Machine Learning

Machine Learning Machine Learning Data Lakes AI

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Mlearning.ai

MAY 16, 2023

In data engineering, the Pub/Sub pattern can be used for various use cases such as real-time data processing, event-driven architectures, and data synchronization across multiple systems. The company can use the Pub/Sub pattern to process customer events such as product views, add to cart, and checkout.

Data Engineering

Data Engineering Data Engineer Data Engineering Data Engineering

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process. Data Ingestion : Involves raw data collection from origin and storage using architectures such as batch, streaming or event-driven.

Data Pipeline

Data Pipeline ETL SQL Data Quality

Mastering Duplicate Data Management in Machine Learning for Optimal Model Performance

DagsHub

JANUARY 14, 2025

A massive amount of diverse data powers today's ML models. Similar Audio: Audio recordings of the same event or sound but with different microphone placements or background noise. Tools like Apache Kafka and Apache Flink can be configured for this purpose.

Machine Learning

Machine Learning Machine Learning Clustering Algorithm

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

There comes a time when every ML practitioner realizes that training a model in Jupyter Notebook is just one small part of the entire project. At that point, the Data Scientists or ML Engineers become curious and start looking for such implementations. What are ML pipeline architecture design patterns?

ML

ML ML Machine Learning Machine Learning

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

The result is a machine learning (ML)-powered insight that allows fans to easily evaluate and compare the goalkeepers’ proficiencies. An ML model is trained through Amazon SageMaker , using data from four seasons of the first and second Bundesliga, encompassing all shots that landed on target (either resulting in a goal or being saved).

Machine Learning

Machine Learning Machine Learning AWS ML

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and…

ODSC - Open Data Science

MAY 24, 2023

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and Google’s PaLM 2 Building a Pizza Delivery Service with a Real-Time Analytics Stack The best businesses react quickly and with informed decisions. Here’s a use case of how you can use a real-time analytics stack to build a pizza delivery service.

Data Lakes

Data Lakes ML ML Analytics

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

In this guide, we will explore concepts like transitional modeling for customer profiles, the power of event logs for customer behavior, persistent staging for raw customer data, real-time customer data capture, and much more. Rich Context: Each event carries with it a wealth of contextual information. What is Activity Schema Modeling?

Data Modeling

Data Modeling Data Models Apache Kafka Data Lakes

Data Science Current

Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

Streaming Machine Learning Without a Data Lake

Webinars

Trending Sources

Real-time artificial intelligence and event processing

Webinars

Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

Top Big Data Tools Every Data Professional Should Know

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

Streaming Data Pipelines: What Are They and How to Build One

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

Pictures and Highlights from ODSC Europe 2023

Watch the Top ODSC Europe 2023 Virtual Sessions Here

Training Models on Streaming Data [Practical Guide]

7 Best Machine Learning Workflow and Pipeline Orchestration Tools 2024

How to Manage Unstructured Data in AI and Machine Learning Projects

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Comparing Tools For Data Processing Pipelines

Mastering Duplicate Data Management in Machine Learning for Optimal Model Performance

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and…

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

Stay Connected