Apache Kafka, AWS and Events - Data Science Current

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster to move data seamlessly from on-premises hardware to a data lake built on cloud services such as Amazon S3. 5 Key Comparisons in Different Apache Kafka Architectures.

Apache Kafka

Apache Kafka ETL Data Lakes AWS

Real-time fraud detection using AWS serverless and machine learning services

AWS Machine Learning Blog

MARCH 10, 2023

We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and the actions to take to prevent fraud (such as alerting the user about the fraud or flagging the transaction for additional review). Example use cases include payment processing and high-volume account creation.

Machine Learning

Machine Learning AWS Apache Kafka

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Apache Kafka use cases: Driving innovation across diverse industries

IBM Journey to AI blog

SEPTEMBER 4, 2024

Apache Kafka is an open-source, distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users. How does Apache Kafka work?

Apache Kafka

Apache Kafka Internet of Things Data Pipeline Clustering

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

This process comprises two key components: event data and optical tracking data. Event data collection entails gathering the fundamental building blocks of the game. For the precision needed in shot speed calculations, we must ensure that the ball’s position aligns precisely with the moment of the event.

AWS

AWS Apache Kafka Data Scientist Data Science
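
The excerpt above stresses that the ball's position has to line up with the moment of the shot event. A minimal sketch of that idea follows, assuming tracking frames arrive as sorted timestamps with 2D ball positions in metres. The function names, the `frames_ahead` parameter, and the straight-line average are illustrative assumptions, not the method used for Bundesliga Match Facts.

```python
import bisect
import math


def nearest_frame_index(frame_times, event_time):
    """Return the index of the tracking frame closest in time to the event."""
    i = bisect.bisect_left(frame_times, event_time)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[i - 1], frame_times[i]
    # Prefer the earlier frame on ties.
    return i if (after - event_time) < (event_time - before) else i - 1


def shot_speed_kmh(frame_times, ball_positions, event_time, frames_ahead=5):
    """Average ball speed (km/h) over a few frames after the synchronized shot frame.

    frame_times: sorted timestamps in seconds (hypothetical input format)
    ball_positions: (x, y) positions in metres, one per frame
    """
    start = nearest_frame_index(frame_times, event_time)
    end = min(start + frames_ahead, len(frame_times) - 1)
    if end == start:
        raise ValueError("no tracking frames after the shot event")
    distance_m = math.dist(ball_positions[start], ball_positions[end])
    elapsed_s = frame_times[end] - frame_times[start]
    return distance_m / elapsed_s * 3.6  # m/s to km/h
```

If the event timestamp is off by even a frame or two, the window may include frames from before the ball was struck, which lowers the estimate. That is why the synchronization step comes first.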

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In this post, we demonstrate how to build a robust real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. It offers an AWS CloudFormation template for straightforward deployment in an AWS account.

AWS

AWS ML Apache Kafka

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting. TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations.

AWS

AWS Data Warehouse ML

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

Amazon S3: Amazon Simple Storage Service (S3) is a scalable object storage service provided by Amazon Web Services (AWS). Unlike traditional batch processing, where data is processed in fixed intervals, stream processing enables organizations to gain insights and respond to events as they happen in real-time.

Big Data

Big Data Data Engineering Data Engineer

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

AWS Machine Learning Blog

APRIL 19, 2023

Streaming ingestion – An Amazon Kinesis Data Analytics for Apache Flink application backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK) calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store.

ML Apache Kafka SQL
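
To make "aggregated features from a transaction stream" concrete, here is a minimal in-memory sketch of a per-card sliding-window aggregate. It is plain Python rather than Flink, and the `Transaction` fields, the feature names `num_trans` and `avg_amt`, and the 10-minute default window are assumptions chosen for illustration. It also assumes transactions for a given card arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    card_id: str      # hypothetical key field
    amount: float
    timestamp: float  # seconds since epoch


class WindowedFeatures:
    """Per-card transaction count and average amount over a sliding time window."""

    def __init__(self, window_seconds: float = 600.0):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)   # card_id -> deque of (timestamp, amount)
        self._totals = defaultdict(float)   # card_id -> running sum inside the window

    def update(self, tx: Transaction) -> dict:
        """Add one transaction and return the refreshed features for its card."""
        events = self._events[tx.card_id]
        events.append((tx.timestamp, tx.amount))
        self._totals[tx.card_id] += tx.amount

        # Evict transactions that have fallen out of the window.
        cutoff = tx.timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, old_amount = events.popleft()
            self._totals[tx.card_id] -= old_amount

        count = len(events)
        return {
            "card_id": tx.card_id,
            "num_trans": count,
            "avg_amt": self._totals[tx.card_id] / count,
        }
```

In the architecture the excerpt describes, the dictionary returned by `update` would correspond to the record that a downstream function writes to the online feature store. Here it is simply returned to the caller.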

11 Open-Source Data Engineering Tools Every Pro Should Use

ODSC - Open Data Science

FEBRUARY 6, 2024

Apache Kafka: For data engineers dealing with real-time data, Apache Kafka is a game-changer. Each platform offers unique features and benefits, making it vital for data engineers to understand their differences.

Data Engineering

Data Engineering Data Engineer

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Pickl AI

SEPTEMBER 18, 2024

It utilises Amazon Web Services (AWS) as its main data lake, processing over 550 billion events daily, equivalent to approximately 1.3 … Data in Motion: Technologies like Apache Kafka facilitate real-time processing of events and data, allowing Netflix to respond swiftly to user interactions and operational needs.

Big Data

Big Data Apache Kafka Big Data Analytics

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Among these tools, Apache Hadoop, Apache Spark, and Apache Kafka stand out for their unique capabilities and widespread usage. Apache Hadoop: Hadoop is a powerful framework that enables distributed storage and processing of large data sets across clusters of computers.

Data Engineering

Data Engineering Data Engineer

Top 15 Data Analytics Projects in 2023 for beginners to Experienced

Pickl AI

JULY 20, 2023

Diagnostic Analytics Projects: Diagnostic analytics seeks to determine the reasons behind specific events or patterns observed in the data. Root cause analysis is a typical diagnostic analytics task. Predictive Analytics Projects: Predictive analytics involves using historical data to predict future events or outcomes.

Analytics

Analytics Big Data

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

Apache Kafka: Apache Kafka is a distributed event streaming platform for real-time data pipelines and stream processing. Kafka is highly scalable and ideal for high-throughput and low-latency data pipeline applications. It allows unstructured data to be moved and processed easily between systems.

Machine Learning

Machine Learning Data Lakes AI

7 Best Machine Learning Workflow and Pipeline Orchestration Tools 2024

DagsHub

APRIL 7, 2024

Flexibility: Airflow was designed with batch workflows in mind; it was not meant for permanently running event-based workflows. Although it is not a streaming solution, it can still be used for that purpose when combined with systems such as Apache Kafka. Miscellaneous: Workflows are created as directed acyclic graphs (DAGs).

Machine Learning

Machine Learning ML

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data Ingestion: Involves raw data collection from origin and storage using architectures such as batch, streaming or event-driven. Pricing: Up to a million events/month on the free plan. Up to 100 million events/month and a 14-day trial for the starter plan. High CPU requirement at data destination for data loading operations.

Data Pipeline

Data Pipeline ETL SQL Data Quality

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Bundesliga and AWS have collaborated on an in-depth examination to quantify the achievements of Bundesliga's keepers. How Keeper Efficiency is implemented: This Bundesliga Match Fact consumes both event and positional data. This frame is used to synchronize the event data with the positional data.

Machine Learning

Machine Learning AWS ML

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

1 Data Ingestion (e.g., Apache Kafka, Amazon Kinesis) 2 Data Preprocessing (e.g., … As usage increased, the system had to be scaled vertically, approaching AWS instance-type limits. These include shared-nothing architecture, event-driven architecture, and directed acyclic graphs (DAGs).

ML Machine Learning

Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

AWS Machine Learning Blog

APRIL 18, 2025

Solution overview: Build a generative AI stock price analyzer with RAG. For this post, we implement a RAG architecture with Amazon Bedrock Knowledge Bases using a custom connector and topics built with Amazon Managed Streaming for Apache Kafka (Amazon MSK) for a user who may be interested in understanding stock price trends.

Apache Kafka

Apache Kafka AWS Clustering Database

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

AWS Machine Learning Blog

FEBRUARY 7, 2025

In this post, we dive deep into how CONXAI hosts the state-of-the-art OneFormer segmentation model on AWS using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Kubernetes Service (Amazon EKS), KServe, and NVIDIA Triton. Our journey to AWS: Initially, CONXAI started with a small cloud provider specializing in offering affordable GPUs.

Analytics

Analytics AWS Clustering

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Best Big Data Tools: Popular tools such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Storm enable businesses to store, process, and analyse data efficiently. Statistics: According to AWS reports, EMR reduces the time required for Big Data processing tasks by up to 90% compared to traditional methods.

Big Data

Big Data Apache Hadoop Apache Kafka

Best Data Engineering Tools Every Engineer Should Know

Pickl AI

MARCH 19, 2025

Python, SQL, and Apache Spark are essential for data engineering workflows. Real-time data processing with Apache Kafka enables faster decision-making. Apache Spark: Apache Spark is a powerful data processing framework that efficiently handles Big Data. Which cloud-based data engineering tools are most popular?

Data Engineering

Data Engineering Data Engineer
