Overview: Learn about viewing data as streams of immutable events, in contrast to mutable containers, and understand how Apache Kafka captures real-time data through event streams. The post Apache Kafka: A Metaphorical Introduction to Event Streaming for Data Scientists and Data Engineers appeared first on Analytics Vidhya.
Introduction The big data industry is growing daily and needs tools to process vast volumes of data. That’s why you need to know about Apache Kafka, a publish-subscribe messaging system you can use to build distributed applications. It is scalable and fault-tolerant, making […].
The post Apache Kafka Use Cases and Installation Guide appeared first on Analytics Vidhya. As applications cover more aspects of our daily lives, it is increasingly difficult to provide users with a quick response. Source: kafka.apache.org Caching is used to solve […].
Introduction Earlier, I introduced the basic concepts of Apache Kafka in my blog on Analytics Vidhya (link available under references). That article introduced the concepts involved in Apache Kafka and built on that understanding by using the Python API of Kafka to write some […].
The post Introduction to Apache Kafka: Fundamentals and Working appeared first on Analytics Vidhya. Introduction Have you ever wondered how Instagram recommends similar reels while you are scrolling through your feed, or how Amazon recommends products similar to the ones you were browsing?
Introduction Apache Kafka is an open-source publish-subscribe messaging application initially developed by LinkedIn in early 2011. It is a well-known data processing tool written in Scala that offers low latency, high throughput, and a unified platform to handle data in real time.
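To make the publish-subscribe model concrete, here is a minimal sketch using the kafka-python client; the broker address, topic name, and payload are illustrative assumptions, not details from the article.

```python
# pip install kafka-python -- assumes a broker running at localhost:9092
import json
from kafka import KafkaProducer, KafkaConsumer

# Publisher: serialize dicts as JSON and send them to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("page-views", {"user": "u42", "url": "/home"})
producer.flush()  # block until the message is actually delivered

# Subscriber: read the topic from the beginning and deserialize.
consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # {'user': 'u42', 'url': '/home'}
```

Because producer and consumer only share the topic name, either side can be scaled or replaced without the other noticing, which is the property that makes Kafka useful for distributed applications.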
Overview: Know the top 9 skills required to be a data engineer, and find suitable resources to learn these tools. The post 9 Must-Have Skills to Become a Data Engineer! appeared first on Analytics Vidhya.
This article was published as a part of the Data Science Blogathon. Introduction “Learning is an active process.” – Dale Carnegie. Apache Kafka is a software framework for storing, reading, and analyzing streaming data. The post Build a Simple Realtime Data Pipeline appeared first on Analytics Vidhya.
Distributed systems allow data processing tasks to be spread across multiple machines, enabling parallel processing and scalability. Big data engineering involves various technologies and techniques that enable efficient data processing and retrieval. Stay tuned for an insightful exploration of Big Data Engineering with Distributed Systems!
It allows your business to ingest continuous data streams as they happen and bring them to the forefront for analysis, enabling you to keep up with constant change. Apache Kafka boasts many strong capabilities, such as delivering high throughput and maintaining high fault tolerance in the event of application failure.
Data engineers play a crucial role in managing and processing big data. They are responsible for designing, building, and maintaining the infrastructure and tools needed to manage and process large volumes of data effectively. What is data engineering?
It requires minimal operational maintenance and allows for rapid development, resulting in significant cost savings and reduced development time for data-focused developers and engineers. Handling too many data sources can become overwhelming, especially with complex schemas.
Data engineering has become an integral part of the modern tech landscape, driving advancements and efficiencies across industries. So let’s explore the world of open-source tools for data engineers, shedding light on how these resources are shaping the future of data handling, processing, and visualization.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
Data engineering is a rapidly growing field that designs and develops systems that process and manage large amounts of data. There are various architectural design patterns in data engineering that are used to solve different data-related problems.
Refer to the Unlocking the Power of Big Data article to understand the use cases for the data collected from various sources. Data Ingestion: Data is collected and funneled into the pipeline using batch or real-time methods, leveraging tools like Apache Kafka, AWS Kinesis, or custom ETL scripts.
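As a rough illustration of the real-time ingestion path, here is a sketch that tails a log file and publishes each new line to Kafka; the file path, topic name, and record shape are hypothetical stand-ins for a real source system.

```python
# Minimal streaming-ingestion sketch with kafka-python.
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def tail(path):
    """Yield new lines appended to a file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # wait for new data
                continue
            yield line.rstrip("\n")

# Each appended log line becomes one event on the raw-events topic.
for line in tail("/var/log/app/events.log"):
    producer.send("raw-events", {"raw": line})
```

A batch alternative would read the whole file on a schedule instead of tailing it; the downstream pipeline looks the same either way, which is why the ingestion method is usually a swappable detail.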
This architectural concept relies on event streaming as the core element of data delivery. In practical implementation, the Kappa architecture is commonly deployed using Apache Kafka or Kafka-based tools. Applications can directly read from and write to Kafka or an alternative message queue tool.
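In a Kappa architecture the Kafka log is the system of record, so historical views are rebuilt by replaying the log rather than by querying a separate batch layer. A minimal sketch with kafka-python, reusing the hypothetical page-views topic from the earlier example:

```python
# Kappa-style reprocessing: rebuild a materialized view by replaying the log.
import json
from collections import Counter
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
tp = TopicPartition("page-views", 0)
consumer.assign([tp])
consumer.seek_to_beginning(tp)  # replay the full log, not just new events

views_per_user = Counter()
end = consumer.end_offsets([tp])[tp]  # offset of the last event right now
while consumer.position(tp) < end:
    records = consumer.poll(timeout_ms=1000)
    for batch in records.values():
        for record in batch:
            views_per_user[record.value["user"]] += 1

print(views_per_user)  # the rebuilt view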
With ML-powered anomaly detection, customers can find outliers in their data without the need for manual analysis, custom development, or ML domain expertise. Using AWS Glue Data Quality for anomaly detection: Data engineers and analysts can use AWS Glue Data Quality to measure and monitor their data.
Efficient Incremental Processing with Apache Iceberg and Netflix Maestro
Dimensional Data Modeling in the Modern Era
Building Big Data Workflows: NiFi, Hive, Trino, & Zeppelin
An Introduction to Data Contracts
From Data Mess to Data Mesh — Data Management in the Age of Big Data and Gen AI
Introduction to Containers for Data Science / Data Engineering (..)
General Purpose Tools: These tools help manage the unstructured data pipeline to varying degrees, with some encompassing data collection, storage, processing, analysis, and visualization. DagsHub's Data Engine: DagsHub's Data Engine is a centralized platform for teams to manage and use their datasets effectively.
The machine learning model is part of the stream processing engine, and it provides the logic that helps the streaming data pipeline expose features within the stream and potentially within a historical data store. It can be used to collect, store, and process streaming data in real time.
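To illustrate a model embedded in a stream processor, here is a hedged sketch that scores each event with a scikit-learn model and routes outliers to a second topic; the model, topics, and feature names are invented for the example.

```python
# Sketch: score events as they arrive and fork anomalies to another topic.
import json
import numpy as np
from kafka import KafkaConsumer, KafkaProducer
from sklearn.ensemble import IsolationForest

# In practice the model is trained offline on a historical data store;
# here we fit on random data purely so the sketch runs end to end.
model = IsolationForest().fit(np.random.randn(1000, 2))

consumer = KafkaConsumer("transactions", bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))

for msg in consumer:
    event = msg.value
    features = [[event["amount"], event["latency_ms"]]]
    if model.predict(features)[0] == -1:  # -1 marks an outlier in scikit-learn
        producer.send("suspicious-transactions", event)
```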
Technologies like Apache Kafka, often used in modern CDPs, use log-based approaches to stream customer events between systems in real time. Activity Schema Processing: To capture and process customer activities, you might use a stream processing technology like Apache Kafka or Apache Flink.
Uncertain examples are chosen for expert labelling and then fed back into the training dataset for additional active-learning iterations, while the trained model generates duplicate/non-duplicate predictions on unlabeled data. Tools like Apache Kafka and Apache Flink can be configured for this purpose.
Streaming ingestion – An Amazon Kinesis Data Analytics for Apache Flink application, backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK), calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store.
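The excerpt describes a managed Flink job; as a language-neutral illustration, here is a plain-Python sketch of the kind of tumbling-window aggregation such a job performs. The topic, window size, and field names are assumptions, not the article's actual configuration.

```python
# Tumbling-window aggregation over a Kafka stream, in plain Python.
import json
from collections import defaultdict
from kafka import KafkaConsumer

WINDOW_SECONDS = 60

consumer = KafkaConsumer("transactions", bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))

window_start = None
totals = defaultdict(float)

for msg in consumer:
    event = msg.value  # e.g. {"card_id": "c1", "amount": 12.5, "ts": 1700000000}
    bucket = event["ts"] - event["ts"] % WINDOW_SECONDS
    if window_start is None:
        window_start = bucket
    if bucket != window_start:
        # Window closed: emit the aggregated feature (spend per card).
        # In the article's setup, this is where the online feature store
        # would be updated.
        print(window_start, dict(totals))
        totals.clear()
        window_start = bucket
    totals[event["card_id"]] += event["amount"]
```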
Summary: Data engineering tools streamline data collection, storage, and processing. Tools like Python, SQL, Apache Spark, and Snowflake help engineers automate workflows and improve efficiency. Learning these tools is crucial for building scalable data pipelines. That's where data engineering tools come in!
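Since the excerpt names Apache Spark among the core tools, here is a minimal PySpark sketch of a typical batch transform; the bucket paths and column names are hypothetical.

```python
# Batch aggregation with PySpark: raw CSV in, daily Parquet summary out.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-orders").getOrCreate()

# Read raw order records (path is a placeholder for illustration).
orders = spark.read.csv("s3://example-bucket/orders/*.csv",
                        header=True, inferSchema=True)

# Aggregate to one row per day: order count and total revenue.
daily = (orders
         .groupBy(F.to_date("order_ts").alias("order_date"))
         .agg(F.count("*").alias("n_orders"),
              F.sum("amount").alias("revenue")))

daily.write.mode("overwrite").parquet("s3://example-bucket/daily_orders/")
```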
That's where message brokers come in. Two of the most popular message brokers are RabbitMQ and Apache Kafka. In this blog, we will explore RabbitMQ vs Kafka, their key differences, and when to use each. IoT applications: collecting and distributing sensor data from connected devices.
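For contrast with the Kafka snippets above, here is a minimal RabbitMQ publish/consume sketch using the pika client; the broker location, queue name, and payload are assumptions for illustration.

```python
# pip install pika -- assumes a RabbitMQ broker on localhost
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="sensor-readings")

# Publish: RabbitMQ routes the message to the queue and, unlike Kafka,
# removes it once a consumer acknowledges it -- there is no replayable log.
channel.basic_publish(exchange="", routing_key="sensor-readings",
                      body=b'{"device": "d7", "temp_c": 21.4}')

def handle(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="sensor-readings", on_message_callback=handle)
channel.start_consuming()  # blocks; Ctrl-C to stop
```

The per-message acknowledgement model is the practical difference this comparison usually turns on: RabbitMQ deletes acknowledged messages, while Kafka retains the log so consumers can rewind.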