Apache Kafka, Database and Events - Data Science Current

Maximizing your event-driven architecture investments: Unleashing the power of Apache Kafka with IBM Event Automation

IBM Journey to AI blog

FEBRUARY 12, 2024

Recognizing the need to harness real-time data, businesses are increasingly turning to event-driven architecture (EDA) as a strategic approach to stay ahead of the curve. At the forefront of this event-driven revolution is Apache Kafka, the widely recognized and dominant open-source technology for event streaming.

Apache Kafka use cases: Driving innovation across diverse industries

IBM Journey to AI blog

SEPTEMBER 4, 2024

Apache Kafka is an open-source, distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users. How does Apache Kafka work?

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Real-time artificial intelligence and event processing

IBM Journey to AI blog

NOVEMBER 29, 2023

By leveraging AI for real-time event processing, businesses can connect the dots between disparate events to detect and respond to new trends, threats and opportunities. AI and event processing: a two-way street An event-driven architecture is essential for accelerating the speed of business.

Real-Time Sentiment Analysis with Kafka and PySpark

Towards AI

FEBRUARY 29, 2024

In this article, we will explore the significance of these pipelines and utilise robust tools such as Apache Kafka and Spark to manage vast streams of data efficiently. Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.

Level up your Kafka applications with schemas

IBM Journey to AI blog

NOVEMBER 21, 2023

Apache Kafka is a well-known open-source event store and stream processing platform and has grown to become the de facto standard for data streaming. Apache Kafka transfers data without validating the information in the messages. Optimize your Kafka environment by using a schema registry.
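Because the broker passes message bytes through without inspecting them, any validation has to happen on the client side. The sketch below illustrates that idea only; it does not use a real schema registry or Kafka client, and `ORDER_SCHEMA`, `validate`, and `serialize_if_valid` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical schema (an assumption for illustration): field name -> required type.
ORDER_SCHEMA = {"order_id": str, "amount": float, "currency": str}


def validate(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def serialize_if_valid(record: dict, schema: dict) -> bytes:
    """Serialize to JSON bytes only if the record conforms to the schema."""
    problems = validate(record, schema)
    if problems:
        # Reject the record before it would ever be handed to a producer.
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

A producer would call `serialize_if_valid` before sending, so malformed records fail fast at the source instead of surfacing later in a consumer. A schema registry centralizes the same check so that producers and consumers agree on one schema definition.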

Big Data – Lambda or Kappa Architecture?

Data Science Blog

JUNE 27, 2023

In this representation, there is a separate store for events within the speed layer and another store for data loaded during batch processing. It is important to note that in the Lambda architecture, the serving layer can be omitted, allowing batch processing and event streaming to remain separate entities.

Real-time fraud detection using AWS serverless and machine learning services

AWS Machine Learning Blog

MARCH 10, 2023

We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alerting the user about the fraud or flagging the transaction for additional review). The user can then choose to take action to prevent further abuse.

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

Its characteristics can be summarized as follows: Volume: Big Data involves datasets that are too large to be processed by traditional database management systems. These datasets can range from terabytes to petabytes and beyond. ... databases), semi-structured data (e.g., XML, JSON), and unstructured data (e.g., text, images, videos).

Streaming Data Pipelines: What Are They and How to Build One

Precisely

DECEMBER 28, 2023

Streaming data pipelines, by extension, offer an architecture capable of handling large volumes of data, accommodating millions of events in near real time. One very popular platform is Apache Kafka, a powerful open-source tool used by thousands of companies. You need a separate tool to do that.

Apache Flink for all: Making Flink consumable across all areas of your business

IBM Journey to AI blog

AUGUST 29, 2024

Event-driven businesses across all industries thrive on real-time data, enabling companies to act on events as they happen rather than after the fact. This is where Apache Flink shines, offering a powerful solution to harness the full potential of an event-driven business model through efficient computing and processing capabilities.

Building a Pizza Delivery Service with a Real-Time Analytics Stack

ODSC - Open Data Science

JUNE 1, 2023

To understand what it means, we should start by thinking of the world in terms of events, where an event is a thing that happens. And we are going to take those events, become aware of them, and understand them. Stores events in a durable manner so that downstream components can process them.

How to Unlock Real-Time Analytics with Snowflake?

phData

MAY 3, 2024

How Snowflake Helps Achieve Real-Time Analytics Snowflake is the ideal platform to achieve real-time analytics for several reasons, but two of the biggest are its ability to manage concurrency due to the multi-cluster architecture of Snowflake and its robust connections to third-party tools like Kafka. p8 -pubout -out C:tmpnew_rsa_key_v1.pub

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

This process comprises two key components: event data and optical tracking data. Event data collection entails gathering the fundamental building blocks of the game. For the precision needed in shot speed calculations, we must ensure that the ball’s position aligns precisely with the moment of the event.

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

It initially sources input time series data from Amazon Managed Streaming for Apache Kafka (Amazon MSK) using this live stream for model training. Conclusion This post demonstrated how to build a robust real-time anomaly detection solution for streaming time series data using Managed Service for Apache Flink and other AWS services.

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

How Keeper Efficiency is implemented This Bundesliga Match Fact consumes both event and positional data. Event data consists of hand-labelled event descriptions with useful attributes, such as shot on target. This frame is used to synchronize the event data with the positional data.

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Pickl AI

SEPTEMBER 18, 2024

It utilises Amazon Web Services (AWS) as its main data lake, processing over 550 billion events daily—equivalent to approximately 1.3 Data in Motion Technologies like Apache Kafka facilitate real-time processing of events and data, allowing Netflix to respond swiftly to user interactions and operational needs.

Training Models on Streaming Data [Practical Guide]

The MLOps Blog

FEBRUARY 5, 2023

“Streaming data is a continuous flow of information and a foundation of event-driven architecture software model” – Red Hat. Enterprises around the world are becoming more dependent on data than ever. A streaming data pipeline is an enhanced version that can handle millions of events in real time at scale.

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Mlearning.ai

MAY 16, 2023

It is used to extract data from various sources, transform the data to fit a specific data model or schema, and then load the transformed data into a target system such as a data warehouse or a database. The company can use the Pub/Sub pattern to process customer events such as product views, add to cart, and checkout.
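The Pub/Sub pattern mentioned above can be sketched with a tiny in-memory broker. This is an illustration under stated assumptions, not a production design: the `Broker` class, the topic names, and the event fields (`customer_id`, `sku`) are all hypothetical, and a real deployment would typically use a dedicated messaging system.

```python
from collections import Counter, defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Broker:
    """Tiny in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> handlers registered for that topic
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def make_counter(funnel: Counter, topic: str) -> Handler:
    """Build a handler that counts how many events arrived on a topic."""
    def handler(event: dict) -> None:
        funnel[topic] += 1
    return handler


broker = Broker()
funnel = Counter()          # events seen per topic
carts = defaultdict(list)   # customer_id -> SKUs added to cart

for name in ("product_view", "add_to_cart", "checkout"):
    broker.subscribe(name, make_counter(funnel, name))
# A second, independent subscriber on the same topic.
broker.subscribe("add_to_cart", lambda e: carts[e["customer_id"]].append(e["sku"]))

broker.publish("product_view", {"customer_id": "c1", "sku": "A"})
broker.publish("add_to_cart", {"customer_id": "c1", "sku": "A"})
broker.publish("checkout", {"customer_id": "c1"})
```

The publisher never references its subscribers directly, so new consumers (for example, a recommendation service listening to `product_view`) can be added without changing the code that emits the events.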

Introduction to Apache NiFi and Its Architecture

Pickl AI

JULY 30, 2024

Guaranteed Delivery: NiFi ensures that data is delivered reliably, even in the event of failures. It maintains a write-ahead log to ensure that the state of FlowFiles is preserved, even in the event of a failure. Provenance Repository: This repository records all provenance events related to FlowFiles.

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes. Data Modelling Data modelling is creating a visual representation of a system or database. Physical Models: These models specify how data will be physically stored in databases.

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

In this guide, we will explore concepts like transitional modeling for customer profiles, the power of event logs for customer behavior, persistent staging for raw customer data, real-time customer data capture, and much more. It often involves specialized databases designed to handle this kind of atomic, temporal data.

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

Data can come from different sources, such as databases or directly from users, with additional sources, including platforms like GitHub, Notion, or S3 buckets. Vector Databases Vector databases help store unstructured data by storing the actual data and its vector representation. mp4,webm, etc.), and audio files (.wav,mp3,acc,

Big Data Syllabus: A Comprehensive Overview

Pickl AI

AUGUST 9, 2024

Variety It encompasses the different types of data, including structured data (like databases), semi-structured data (like XML), and unstructured formats (such as text, images, and videos). Understanding the differences between SQL and NoSQL databases is crucial for students. Once data is collected, it needs to be stored efficiently.

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data Ingestion : Involves raw data collection from origin and storage using architectures such as batch, streaming or event-driven. Data Pipeline Tool Key Features Apache Airflow Flexible, customizable, and supports complex business logic. Relational database connectors are available. Talend Free to use.

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

AWS Machine Learning Blog

APRIL 19, 2023

Streaming ingestion – An Amazon Kinesis Data Analytics for Apache Flink application backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK) calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store.
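To make "aggregated features from a transaction stream" concrete, here is a minimal sliding-window sketch in plain Python. It is not the Flink application from the post: the `WindowedAggregator` class, the card identifier, the feature names, and the 10-minute window are assumptions for illustration, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class WindowedAggregator:
    """Per-card sliding-window aggregates over an in-order transaction stream."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # card_id -> deque of (timestamp, amount), oldest first
        self._events = defaultdict(deque)

    def add(self, card_id: str, timestamp: float, amount: float) -> dict:
        """Record one transaction and return the card's current window features."""
        window = self._events[card_id]
        window.append((timestamp, amount))
        # Evict transactions that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while window and window[0][0] <= cutoff:
            window.popleft()
        count = len(window)
        total = sum(amount for _, amount in window)
        return {"count": count, "total": total, "avg": total / count}


agg = WindowedAggregator(window_seconds=600)  # hypothetical 10-minute window
agg.add("card-1", 0, 10.0)
agg.add("card-1", 300, 20.0)
features = agg.add("card-1", 700, 30.0)  # the transaction at t=0 has expired
```

In a streaming deployment, the dictionary returned by `add` corresponds to the feature record that would be written to an online feature store after each event.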

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

The exploration of common machine learning pipeline architecture and patterns starts with a pattern found in not just machine learning systems but also database systems, streaming platforms, web applications, and modern computing infrastructure. 1 Data Ingestion (e.g., Apache Kafka, Amazon Kinesis) 2 Data Preprocessing (e.g.,

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Best Big Data Tools Popular tools such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Storm enable businesses to store, process, and analyse data efficiently. Real-Time Data Analysis: Connects seamlessly with various databases for live analysis.

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

AWS Machine Learning Blog

FEBRUARY 7, 2025

However, it lacked essential services required for machine learning (ML) applications, such as frontend and backend infrastructure, DNS, load balancers, scaling, blob storage, and managed databases. At that time, the application was deployed as a single monolithic container, which included Kafka and a database.
