Apache Kafka and Blog - Data Science Current

Exploring Partitions and Consumer Groups in Apache Kafka

Analytics Vidhya

AUGUST 2, 2022

Introduction: Earlier, I introduced the basic concepts of Apache Kafka in my blog on Analytics Vidhya (link is available under references). This article introduced concepts involved in Apache Kafka and further built the understanding by using the Python API of Kafka to write some […].

Apache Kafka

Apache Kafka Data Science Python Analytics

Supernovas, Black Holes and Streaming Data

databricks

AUGUST 12, 2024

The blog explores data streams from NASA satellites using Apache Kafka and Databricks. It demonstrates ingestion and transformation with Delta Live Tables in SQL and AI/BI-powered analysis of supernova events.

Apache Kafka

Apache Kafka SQL AI AI

Maximizing your event-driven architecture investments: Unleashing the power of Apache Kafka with IBM Event Automation

IBM Journey to AI blog

FEBRUARY 12, 2024

At the forefront of this event-driven revolution is Apache Kafka, the widely recognized and dominant open-source technology for event streaming. While most enterprises have already recognized how Apache Kafka provides a strong foundation for EDA, they often fall behind in unlocking its true potential.

Apache Kafka

Apache Kafka EDA SQL Database

Apache Kafka and Apache Flink: An open-source match made in heaven

IBM Journey to AI blog

NOVEMBER 3, 2023

Apache Kafka and Apache Flink working together Anyone who is familiar with the stream processing ecosystem is familiar with Apache Kafka: the de-facto enterprise standard for open-source event streaming. With Apache Kafka, you get a raw stream of events from everything that is happening within your business.

Apache Kafka

Apache Kafka Data Warehouse Data Pipeline Big Data

Apache Kafka use cases: Driving innovation across diverse industries

IBM Journey to AI blog

SEPTEMBER 4, 2024

Apache Kafka is an open-source, distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users. How does Apache Kafka work?

Apache Kafka

Apache Kafka Internet of Things Data Pipeline Clustering

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Machine Learning Apache Kafka

The winning combination for real-time insights: Messaging and event-driven architecture

IBM Journey to AI blog

APRIL 2, 2024

However, IBM MQ and Apache Kafka can sometimes be viewed as competitors, taking each other on in terms of speed, availability, cost and skills. MQ and Apache Kafka: Teammates Simply put, they are different technologies with different strengths, albeit often perceived to be quite similar. Interested in learning more?

Apache Kafka

Apache Kafka Clustering SQL AI

Accelerate your speed of business with IBM Event Automation

IBM Journey to AI blog

MAY 9, 2023

IBM Event Automation provides an intuitive and integrated experience for distributing, discovering and processing business events across the organization: Event distribution: Collect raw streams of real-time business events with enterprise-grade Apache Kafka.

Apache Kafka

Apache Kafka Business Intelligence Business Intelligence

Level up your Kafka applications with schemas

IBM Journey to AI blog

NOVEMBER 21, 2023

Apache Kafka is a well-known open-source event store and stream processing platform and has grown to become the de facto standard for data streaming. Apache Kafka transfers data without validating the information in the messages. Learn more about Kafka and its use cases here.

Apache Kafka

Apache Kafka Clustering Data Quality Data Governance

Event-driven architecture (EDA) enables a business to become more aware of everything that’s happening, as it’s happening

IBM Journey to AI blog

JANUARY 8, 2024

They often use Apache Kafka as an open technology and the de facto standard for accessing events from various core systems and applications. IBM provides an Event Streams capability built on Apache Kafka that makes events manageable across an entire enterprise.

EDA

EDA Apache Kafka Clustering Data Governance

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

In the next sections of this blog, we will delve deeper into the technical aspects of Distributed Systems in Big Data Engineering, showcasing code snippets to illustrate how these systems work in practice.

Big Data

Big Data Big Data Data Engineering Data Engineering

How to Unlock Real-Time Analytics with Snowflake?

phData

MAY 3, 2024

If you have the Snowflake Data Cloud (or are considering migrating to Snowflake), you’re a blog away from taking a step closer to real-time analytics. In this blog, we’ll show you step-by-step how to achieve real-time analytics with Snowflake via the Kafka Connector and Snowpipe. Looking for additional help?

Apache Kafka

Apache Kafka Analytics Analytics ETL

Apache Flink for all: Making Flink consumable across all areas of your business

IBM Journey to AI blog

AUGUST 29, 2024

The unique advantages of Apache Flink Apache Flink augments event streaming technologies like Apache Kafka to enable businesses to respond to events more effectively in real time. Integration: Integrates seamlessly with other data systems and platforms, including Apache Kafka, Spark, Hadoop and various databases.

Apache Kafka

Apache Kafka Hadoop ETL Data Pipeline

Big Data – Lambda or Kappa Architecture?

Data Science Blog

JUNE 27, 2023

In practical implementation, the Kappa architecture is commonly deployed using Apache Kafka or Kafka-based tools. Applications can directly read from and write to Kafka or an alternative message queue tool.

Big Data

Big Data Big Data Apache Kafka Database

Getting started with Kafka client metrics

IBM Journey to AI blog

MARCH 14, 2024

Apache Kafka stands as a widely recognized open source event store and stream processing platform. One key advantage of opting for managed Kafka services is the delegation of responsibility for broker and operational metrics, allowing users to focus solely on metrics specific to applications.

Apache Kafka

Apache Kafka Data Pipeline

Five scalability pitfalls to avoid with your Kafka application

IBM Journey to AI blog

NOVEMBER 9, 2023

Apache Kafka is a high-performance, highly scalable event streaming platform. To unlock Kafka’s full potential, you need to carefully consider the design of your application. It’s all too easy to write Kafka applications that perform poorly or eventually hit a scalability brick wall.

Apache Kafka

Apache Kafka Algorithm Clustering

Real-time artificial intelligence and event processing

IBM Journey to AI blog

NOVEMBER 29, 2023

IBM Event Automation is a fully composable solution, built on open technologies, with capabilities for: Event streaming: Collect and distribute raw streams of real-time business events with enterprise-grade Apache Kafka. Event endpoint management: Describe and document events easily according to the AsyncAPI specification.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Apache Kafka AI

Transitioning off Amazon Lookout for Metrics

AWS Machine Learning Blog

OCTOBER 9, 2024

To learn more, see the blog post, watch the introductory video, or see the documentation. To learn more about the beta offering, see Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink.

AWS

AWS ML ML Data Quality

Real-time fraud detection using AWS serverless and machine learning services

AWS Machine Learning Blog

MARCH 10, 2023

The same architecture applies if you use Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a data streaming service. This pattern can be useful for real-time fraud detection, notification, and potential prevention. Example use cases for this could be payment processing or high-volume account creation.

Machine Learning

Machine Learning Machine Learning AWS Apache Kafka

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

AWS Machine Learning Blog

APRIL 19, 2023

Streaming ingestion: An Amazon Kinesis Data Analytics for Apache Flink application backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK) calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store.

ML

ML ML Apache Kafka SQL

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

It initially sources input time series data from Amazon Managed Streaming for Apache Kafka (Amazon MSK) using this live stream for model training. The application, once deployed, constructs an ML model using the Random Cut Forest (RCF) algorithm. Post-training, the model continues to process incoming data points from the stream.

AWS

AWS ML ML Apache Kafka

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

How it’s implemented: In our quest to accurately determine shot speed during live matches, we’ve implemented a cutting-edge solution using Amazon Managed Streaming for Apache Kafka (Amazon MSK). Example 1: Measured with top shot speed 118.43 km/h with a distance to goal of 20.61 m. Example 2: Measured with top shot speed 123.32

AWS

AWS Apache Kafka Data Scientist Data Science

Know Before You Go: Precisely at Confluent’s Current 2023

Precisely

SEPTEMBER 12, 2023

Precisely data integrity solutions fuel your Confluent and Apache Kafka streaming data pipelines with trusted data that has maximum accuracy, consistency, and context, and we’re ready to share more with you at the upcoming Current 2023. Let’s cover some additional information to know before attending.

Data Silos

Data Silos Apache Kafka Data Pipeline Data Quality

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Pickl AI

SEPTEMBER 18, 2024

This blog explores how Netflix applies Big Data across its business operations, focusing on its infrastructure, content strategies, customer engagement, operational efficiency, marketing insights, security measures, and future challenges. Data at Rest This includes storage solutions such as S3 Data Warehouse and Cassandra.

Big Data

Big Data Big Data Apache Kafka Big Data Analytics

Unlock the knowledge in your Slack workspace with Slack connector for Amazon Q Business

AWS Machine Learning Blog

OCTOBER 9, 2024

I am currently using Apache Kafka. Learn more about this feature in the AWS Machine Learning blog. Conclusion This blog post provides a step-by-step guide on setting up the Slack connector for Amazon Q Business, enabling you to seamlessly integrate data from your Slack workspace. My connector is unable to sync.

AWS

AWS Apache Kafka Data Scientist Database Administration

Why your event-driven architecture needs advanced event governance

IBM Journey to AI blog

AUGUST 22, 2024

In recognizing the benefits of event-driven architectures, many companies have turned to Apache Kafka for their event streaming needs. Apache Kafka enables scalable, fault-tolerant and real-time processing of streams of data—but how do you manage and properly utilize the sheer amount of data your business ingests every second?

EDA

EDA Apache Kafka Clustering

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

Spark, TensorFlow, Apache Kafka, et cetera, are all out found in cloud databases,” points out Jones. But with the cloud, you can take a small project and test it out on new platforms with a smaller budget to start. [You can] see that it works before going all-in.”

Big Data

Big Data Big Data Apache Kafka Data Lakes

IBM continues to support OpenSource AsyncAPI in breaking the boundaries of event driven architectures

IBM Journey to AI blog

JULY 12, 2024

With its intuitive UI, it makes it easy to produce a valid AsyncAPI document for any Kafka cluster or system that adheres to the Apache Kafka protocol. One of the key benefits of event endpoint management is that it allows you to describe events in a standardized way according to the AsyncAPI specification.

Apache Kafka

Apache Kafka Clustering

What is Data Ingestion? Understanding the Basics

Pickl AI

JULY 25, 2024

In this blog, we’ll delve into the intricacies of data ingestion, exploring its challenges, best practices, and the tools that can help you harness the full potential of your data. Apache Kafka An open-source platform designed for real-time data streaming. What are Some Popular Data Ingestion Tools?

Apache Kafka

Apache Kafka Data Lakes Data Warehouse Data Quality

Bundesliga Match Fact Ball Recovery Time: Quantifying teams’ success in pressing opponents on AWS

AWS Machine Learning Blog

MARCH 30, 2023

To ensure real-time updates of ball recovery times, we have implemented Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a central solution for data streaming and messaging. This allows for seamless communication of positional data and various outputs of Bundesliga Match Facts between containers in real time.

AWS

AWS Machine Learning Machine Learning Apache Kafka

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

Then the events are ingested into TR’s centralized streaming platform, which is built on top of Amazon Managed Streaming for Kafka (Amazon MSK). Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka. About the Authors.

AWS

AWS Data Warehouse ML ML

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

For every xSaves prediction, it produces a message with the prediction as a payload, which then gets distributed by a central message broker running on Amazon Managed Streaming for Apache Kafka (Amazon MSK). The information also gets stored in a data lake for future auditing and model improvements.

Machine Learning

Machine Learning Machine Learning AWS ML

Predicting the Future of Data Science

Pickl AI

DECEMBER 4, 2024

This blog explores the current state of Data Science, emerging trends, the role of generative AI, decision-making enhancements, ethical challenges, essential skills for future Data Scientists, and predictions for the next decade. Apache Kafka), organisations can now analyse vast amounts of data as it is generated.

Data Science

Data Science Data Scientist Machine Learning Machine Learning

What to Expect from Open-Source Data Infrastructure in 2023

Dataversity

JANUARY 12, 2023

Open-source technologies will become even more prominent within enterprises’ data architecture over the coming year, driven by the stark budgetary advantages combined with some of the newest enterprise-friendly capabilities added to several solutions. Here are three predictions for the open-source data infrastructure space in 2023: 1.

Apache Kafka

Apache Kafka Database

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. This blog explains how to build data pipelines and provides clear steps and best practices. Must Read Blogs: Elevate Your Data Quality: Unleashing the Power of AI and ML for Scaling Operations.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

Introduction to Apache NiFi and Its Architecture

Pickl AI

JULY 30, 2024

This blog delves into the fundamentals of Apache NiFi, its architecture, and how it can be leveraged for effective data flow management. What is Apache NiFi? Apache NiFi is a robust data integration tool that facilitates the automation of data flows between different systems.

ETL

ETL Data Lakes Big Data Big Data

Training Models on Streaming Data [Practical Guide]

The MLOps Blog

FEBRUARY 5, 2023

There are a number of tools that can help with streaming data collection and processing. Some popular ones include: Apache Kafka: An open-source, distributed event streaming platform that can handle millions of events per second. It can be used to collect, store, and process streaming data in real time.

Machine Learning

Machine Learning Machine Learning Data Pipeline Apache Kafka

Big Data Syllabus: A Comprehensive Overview

Pickl AI

AUGUST 9, 2024

This blog aims to provide a comprehensive overview of a typical Big Data syllabus, covering essential topics that aspiring data professionals should master. Data Streaming Learning about real-time data collection methods using tools like Apache Kafka and Amazon Kinesis.

Big Data

Big Data Big Data Big Data Analytics Big Data Analytics

Developing & Processing Real-Time Data Stream Applications

The Data Administration Newsletter

FEBRUARY 15, 2022

Most large technology businesses collect data from their consumers in a variety of methods, and the majority of the time, this data is in its raw form. However, when data is presented in an understandable and accessible style, it may assist and drive business requirements. The task is to process the data and, if required, […].

Apache Kafka

Apache Kafka Big Data Big Data Business Intelligence

Architecting Real-Time Analytics for Speed and Scale

Dataversity

JUNE 30, 2023

In today’s fast-paced world, the concept of patience as a virtue seems to be fading away, as people no longer want to wait for anything. If Netflix takes too long to load or the nearest Lyft is too far, users are quick to switch to alternative options.

Analytics

Analytics Analytics Apache Kafka Database

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Typical examples include Airbyte, Talend, Apache Kafka, Apache Beam, and Apache NiFi. While getting control over the process is an ideal position an organization wants to be in, the time and effort needed to build such systems are immense and frequently exceed the license fee of a commercial offering.

Data Pipeline

Data Pipeline ETL SQL Data Quality

Mastering Duplicate Data Management in Machine Learning for Optimal Model Performance

DagsHub

JANUARY 14, 2025

This data proliferates across websites, blogs, and social media primarily via automated content creation, SEO-optimized spun text, chatbot interactions, and similar systems. For in-depth knowledge, please refer to this blog post. Tools like Apache Kafka and Apache Flink can be configured for this purpose.

Machine Learning

Machine Learning Machine Learning Clustering Algorithm

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

This blog will answer these questions by exploring the following: 1 What is pipeline architecture and design consideration, and what are the advantages of understanding it? Apache Kafka, Amazon Kinesis) 2 Data Preprocessing (e.g., References Netflix Tech Blog: Meson Workflow Orchestration for Netflix Recommendations Netflix.

ML

ML ML Machine Learning Machine Learning

Major Differences: Kafka vs RabbitMQ

Pickl AI

MARCH 13, 2025

Two of the most popular message brokers are RabbitMQ and Apache Kafka. In this blog, we will explore RabbitMQ vs Kafka, their key differences, and when to use each. Understanding Apache Kafka Apache Kafka is an open-source system designed to handle real-time data streaming.

Apache Kafka

Apache Kafka Big Data Big Data Data Pipeline
