Apache Kafka and Article - Data Science Current

Apache Kafka Architecture and Use Cases Explained

Analytics Vidhya

JULY 22, 2022

This article was published as a part of the Data Science Blogathon. Apache Kafka is a publish-subscribe messaging system you can use to build distributed applications. The post Apache Kafka Architecture and Use Cases Explained appeared first on Analytics Vidhya.

Apache Kafka

Apache Kafka Big Data Data Science
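To make the publish-subscribe idea concrete, here is a minimal in-memory sketch. It is not Kafka's API: the `Broker` class and its `publish`/`poll` methods are hypothetical names chosen for illustration. It borrows one Kafka idea, an append-only log per topic where each subscriber tracks its own read offset, so several subscribers can read the same messages independently.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker: one append-only log per topic."""

    def __init__(self) -> None:
        self.logs: dict[str, list[str]] = defaultdict(list)
        # (subscriber, topic) -> index of the next message that subscriber will read
        self.offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        """Append a message to the topic's log and return its offset."""
        log = self.logs[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, subscriber: str, topic: str) -> list[str]:
        """Return messages this subscriber has not read yet, then advance its offset."""
        start = self.offsets[(subscriber, topic)]
        new_messages = self.logs[topic][start:]
        self.offsets[(subscriber, topic)] = start + len(new_messages)
        return new_messages


broker = Broker()
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")
print(broker.poll("billing", "orders"))   # ['order-1', 'order-2']
print(broker.poll("shipping", "orders"))  # ['order-1', 'order-2']
print(broker.poll("billing", "orders"))   # []
```

Because publishing never removes messages, a new subscriber can still read the full history; a real broker adds persistence, partitioning, and retention on top of this idea.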

Handling Streaming Data with Apache Kafka – A First Look

Analytics Vidhya

JUNE 21, 2022

This article was published as a part of the Data Science Blogathon.

Apache Kafka

Apache Kafka Data Science Analytics

Apache Kafka Use Cases and Installation Guide

Analytics Vidhya

OCTOBER 3, 2022

This article was published as a part of the Data Science Blogathon. Today, we expect web applications to respond to user queries quickly, if not immediately. Caching is used to solve […].

Apache Kafka

Apache Kafka Data Science Analytics

Exploring Partitions and Consumer Groups in Apache Kafka

Analytics Vidhya

AUGUST 2, 2022

This article was published as a part of the Data Science Blogathon. Earlier, I introduced the basic concepts of Apache Kafka in my blog on Analytics Vidhya (the link is available under references).

Apache Kafka

Apache Kafka Data Science Python Analytics
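As a rough illustration of how partitions and consumer groups fit together, the sketch below routes keyed records to partitions and then spreads partitions across the consumers of one group. Assumptions: `partition_for` and `assign_round_robin` are hypothetical helpers, CRC32 stands in for the murmur2 hash that Kafka's default partitioner uses, and the simple round-robin split is only one of several assignment strategies Kafka offers.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 here as a stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


num_partitions = 4
for key in ["user-1", "user-2", "user-1"]:
    # The same key always lands in the same partition, which preserves per-key ordering.
    print(key, "->", partition_for(key, num_partitions))

print(assign_round_robin(list(range(num_partitions)), ["consumer-a", "consumer-b", "consumer-c"]))
# {'consumer-a': [0, 3], 'consumer-b': [1], 'consumer-c': [2]}
```

Within a group, each partition is read by only one consumer, so adding consumers beyond the number of partitions leaves the extra consumers idle.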

Introduction to Apache Kafka: Fundamentals and Working

Analytics Vidhya

DECEMBER 30, 2022

This article was published as a part of the Data Science Blogathon. All these sites use some event streaming tool to monitor user activities. […]

Apache Kafka

Apache Kafka Data Science Analytics

Creating a Data Science Pipeline for Real-Time Analytics Using Apache Kafka and Spark

KDnuggets

APRIL 1, 2025

This article explains how to create a system that processes data in real time using Apache Kafka and Spark.

Apache Kafka

Apache Kafka Data Science Analytics

Build a Simple Realtime Data Pipeline

Analytics Vidhya

SEPTEMBER 22, 2022

This article was published as a part of the Data Science Blogathon. “Learning is an active process. We learn by doing. Only knowledge that is used sticks in your mind.” (Dale Carnegie) Apache Kafka is a software framework for storing, reading, and analyzing streaming data.

Data Pipeline

Data Pipeline Apache Kafka Internet of Things Data Science

Behind AWS S3's Scale

Hacker News

AUGUST 30, 2024

This is a guest article by Stanislav Kozlovski, an Apache Kafka committer. If you would like to connect with Stanislav, you can do so on Twitter and LinkedIn. AWS S3 is a service every engineer is familiar with. It’s the service that popularized the notion of cold storage to the […]

Apache Kafka

Apache Kafka AWS

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Apache Kafka

Real-Time Sentiment Analysis with Kafka and PySpark

Towards AI

FEBRUARY 29, 2024

Within this article, we will explore the significance of these pipelines and utilise robust tools such as Apache Kafka and Spark to manage vast streams of data efficiently. Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.

Apache Kafka

Apache Kafka SQL Clustering Data Pipeline

The Rise of Streaming Data and Its Cost Efficiency – How Did We Get Here?

insideBIGDATA

JUNE 25, 2024

In this contributed article, Sijie Guo, Founder and CEO of StreamNative, argues that with remote work entrenched in the post-pandemic enterprise, organizations are restructuring their technology stack and software strategy for a new, distributed workforce.

Apache Kafka

Apache Kafka Big Data Analytics

Level up your Kafka applications with schemas

IBM Journey to AI blog

NOVEMBER 21, 2023

Apache Kafka is a well-known open-source event store and stream processing platform and has grown to become the de facto standard for data streaming. Apache Kafka transfers data without validating the information in the messages.

Apache Kafka

Apache Kafka Clustering Data Quality Data Governance
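Because the broker passes message bytes through without checking their contents, validation has to happen in the client (in practice this is commonly handled with a schema registry and a format such as Avro, JSON Schema, or Protobuf). The sketch below shows the idea with only the standard library: a producer-side check that rejects malformed records before they are serialized. `ORDER_SCHEMA`, `validate`, and `serialize_if_valid` are hypothetical names, and the schema is a toy mapping of field names to Python types, not a real schema format.

```python
import json

# Hypothetical toy schema: required field name -> expected Python type.
ORDER_SCHEMA: dict[str, type] = {"order_id": str, "quantity": int, "price": float}


def validate(record: dict, schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        value = record.get(field)
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    errors.extend(f"unexpected field: {name}" for name in sorted(set(record) - set(schema)))
    return errors


def serialize_if_valid(record: dict, schema: dict[str, type]) -> bytes:
    """Validate before serializing, since the broker will not inspect the payload."""
    errors = validate(record, schema)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(record).encode("utf-8")


print(serialize_if_valid({"order_id": "A-1", "quantity": 2, "price": 9.5}, ORDER_SCHEMA))
# b'{"order_id": "A-1", "quantity": 2, "price": 9.5}'
try:
    serialize_if_valid({"order_id": "A-2", "quantity": "two"}, ORDER_SCHEMA)
except ValueError as exc:
    print(exc)  # quantity: expected int, got str; missing field: price
```

Rejecting bad records at the producer keeps every consumer from having to defend against them separately.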

Event-driven architecture, Kafka and CDPs: Joining internal infrastructure with your tech stack

Twilio Segment

SEPTEMBER 7, 2021

Event streaming platforms such as Apache Kafka are gaining in importance across all industries. In this article, we'll discuss the benefits Apache Kafka implementations can gain from pairing them with a customer data platform (CDP).

Apache Kafka

Navigating the Big Data Frontier: A Guide to Efficient Handling

Women in Big Data

OCTOBER 9, 2024

Refer to the “Unlocking the Power of Big Data” article to understand the use cases for the data collected from various sources. Data Ingestion: Data is collected and funneled into the pipeline using batch or real-time methods, leveraging tools like Apache Kafka, AWS Kinesis, or custom ETL scripts.

Big Data

Big Data Apache Kafka Data Pipeline

Five scalability pitfalls to avoid with your Kafka application

IBM Journey to AI blog

NOVEMBER 9, 2023

Apache Kafka is a high-performance, highly scalable event streaming platform. To unlock Kafka’s full potential, you need to carefully consider the design of your application. It’s all too easy to write Kafka applications that perform poorly or eventually hit a scalability brick wall.

Apache Kafka

Apache Kafka Algorithm Clustering

Streaming Data Pipelines: What Are They and How to Build One

Precisely

DECEMBER 28, 2023

This article explores what streaming data pipelines are, how they work, and how to build this data pipeline architecture. One very popular platform is Apache Kafka, a powerful open-source tool used by thousands of companies. But in all likelihood, Kafka doesn’t natively connect with the applications that contain your data.

Data Pipeline

Data Pipeline Apache Kafka Big Data

Getting started with Kafka client metrics

IBM Journey to AI blog

MARCH 14, 2024

Apache Kafka stands as a widely recognized open source event store and stream processing platform. One key advantage of opting for managed Kafka services is the delegation of responsibility for broker and operational metrics, allowing users to focus solely on metrics specific to applications.

Apache Kafka

Apache Kafka Data Pipeline

Building a Pizza Delivery Service with a Real-Time Analytics Stack

ODSC - Open Data Science

JUNE 1, 2023

We’re going to assume that the pizza service already captures orders in Apache Kafka and also keeps a record of its customers and the products it sells in MySQL. Apache Pinot is a real-time OLAP database built at LinkedIn to deliver scalable real-time analytics with low latency. He tweets at @markhneedham.

Analytics

Analytics Apache Kafka Data Science

11 Open-Source Data Engineering Tools Every Pro Should Use

ODSC - Open Data Science

FEBRUARY 6, 2024

For data engineers dealing with real-time data, Apache Kafka is a game-changer. Originally posted on OpenDataScience.com. Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels!

Data Engineer

Data Engineer Data Engineering

Pictures and Highlights from ODSC Europe 2023

ODSC - Open Data Science

JULY 22, 2023

Apache Kafka

Apache Kafka Machine Learning Data Science

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

In this article, we’ll take stock of what big data has achieved from a C-suite perspective (with special attention to business transformation and customer experience). “[…] Spark, TensorFlow, Apache Kafka, et cetera, are all found in cloud databases,” points out Jones.

Big Data

Big Data Apache Kafka Data Lakes

Watch the Top ODSC Europe 2023 Virtual Sessions Here

ODSC - Open Data Science

JULY 14, 2023

The session participants will learn the theory behind compound sparsity, state-of-the-art techniques, and how to apply it in practice using the Neural Magic platform.

Machine Learning

Machine Learning Apache Kafka Data Science

Unveiling Developers’ Technologies and Tools Usage in Large and Small and Medium-sized Enterprises with ChatGPT

Mlearning.ai

AUGUST 4, 2023

In this article, I explore and analyze the 2023 Stack Overflow Survey data to uncover the technologies and tools developers use, showcasing an interesting application of ChatGPT to programming tasks.

Database

Database Apache Kafka SQL AI

Exploring Database Management Systems in Social Media Giants

Pickl AI

OCTOBER 21, 2024

Summary: This article highlights the significance of Database Management Systems in social media giants, focusing on their functionality, types, challenges, and future trends that impact user experience and data management. One significant challenge Twitter faces is scaling its DBMS to accommodate its growing user base.

Database

Database Apache Kafka Machine Learning

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

This article explores the key fundamentals of Data Engineering, highlighting its significance and providing a roadmap for professionals seeking to excel in this vital field. Among these tools, Apache Hadoop, Apache Spark, and Apache Kafka stand out for their unique capabilities and widespread usage.

Data Engineer

Data Engineer Data Engineering

Training Models on Streaming Data [Practical Guide]

The MLOps Blog

FEBRUARY 5, 2023

In this article, we will go through the basics of streaming data, what it is, and how it differs from traditional data. Before working with streaming data, it is important to understand how it differs from batch data processing; this contrast also shows why streaming data matters.

Machine Learning

Machine Learning Data Pipeline Apache Kafka

What to Expect from Open-Source Data Infrastructure in 2023

Dataversity

JANUARY 12, 2023

Open-source technologies will become even more prominent within enterprises’ data architecture over the coming year, driven by the stark budgetary advantages combined with some of the newest enterprise-friendly capabilities added to several solutions. Here are three predictions for the open-source data infrastructure space in 2023: 1.

Apache Kafka

Apache Kafka Database

7 Best Machine Learning Workflow and Pipeline Orchestration Tools 2024

DagsHub

APRIL 7, 2024

In this article, we will first briefly explain what ML workflows and pipelines are. By the end of this article, you will be able to identify the key characteristics of each of the selected orchestration tools and pick the one that is best suited for your use case! Programming language: Airflow is very versatile.

Machine Learning

Machine Learning ML

How data engineers tame Big Data?

Dataconomy

FEBRUARY 23, 2023

Some of these solutions include: Stream processing: Stream processing systems, such as Apache Kafka and Apache Flink, can help process high-speed data streams in real time. If you want to learn more about data engineers, check out the article “Data is the new gold and the industry demands goldsmiths.”

Big Data

Big Data Data Engineer Data Engineering

Mastering Duplicate Data Management in Machine Learning for Optimal Model Performance

DagsHub

JANUARY 14, 2025

This article explores how duplicate data affects machine learning models, including their accuracy and other performance metrics. We hope you find this article thought-provoking! If you're interested in learning more about image augmentation, you might want to check out this article.

Machine Learning

Machine Learning Clustering Algorithm

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Mlearning.ai

MAY 16, 2023

This article discusses five commonly used architectural design patterns in data engineering and their use cases. The events can be published to a message broker such as Apache Kafka or Google Cloud Pub/Sub. There are various architectural design patterns in data engineering that are used to solve different data-related problems.

Data Engineer

Data Engineer Data Engineering

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

This article will discuss managing unstructured data for AI and ML projects. Apache Kafka is a distributed event streaming platform for real-time data pipelines and stream processing. Managing unstructured data is essential for the success of machine learning (ML) projects.

Machine Learning

Machine Learning Data Lakes AI

Top 15 Data Analytics Projects in 2023 for beginners to Experienced

Pickl AI

JULY 20, 2023

Text Analytics and Natural Language Processing (NLP) Projects: These projects involve analyzing unstructured text data, such as customer reviews, social media posts, emails, and news articles. NLP techniques such as sentiment analysis and topic modeling help extract insights from text data.

Analytics

Analytics Big Data

Developing & Processing Real-Time Data Stream Applications

The Data Administration Newsletter

FEBRUARY 15, 2022

Most large technology businesses collect data from their consumers in a variety of ways, and most of the time, this data is in its raw form. However, when data is presented in an understandable and accessible style, it can help drive business decisions. The task is to process the data and, if required, […].

Apache Kafka

Apache Kafka Big Data Business Intelligence

Architecting Real-Time Analytics for Speed and Scale

Dataversity

JUNE 30, 2023

In today’s fast-paced world, the concept of patience as a virtue seems to be fading away, as people no longer want to wait for anything. If Netflix takes too long to load or the nearest Lyft is too far, users are quick to switch to alternative options.

Analytics

Analytics Apache Kafka Database

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Typical examples include Airbyte, Talend, Apache Kafka, Apache Beam, and Apache NiFi. While getting control over the process is an ideal position for an organization, the time and effort needed to build such systems are immense and frequently exceed the license fee of a commercial offering.

Data Pipeline

Data Pipeline ETL SQL Data Quality

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

Conclusion: This article covered various aspects, including pipeline architecture, design considerations, standard practices in leading tech corporations, common patterns, and typical components of ML pipelines, such as 1) data ingestion (e.g., Apache Kafka, Amazon Kinesis) and 2) data preprocessing (e.g., […]).

ML

ML Machine Learning