Analytics, Apache Kafka and AWS - Data Science Current

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster for seamless data movement from on-premises hardware to a data lake using cloud services such as Amazon S3. 5 Key Comparisons in Different Apache Kafka Architectures.

Apache Kafka

Apache Kafka ETL Data Lakes AWS

Apache Kafka use cases: Driving innovation across diverse industries

IBM Journey to AI blog

SEPTEMBER 4, 2024

Apache Kafka is an open-source, distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users. How does Apache Kafka work?

Apache Kafka

Apache Kafka Internet of Things Data Pipeline Clustering

Webinars

Going Beyond Chatbots: Connecting AI to Your Tools, Systems, & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Smart Tech + Human Expertise = How to Modernize Manufacturing Without Losing Control

Trending Sources

Real-time fraud detection using AWS serverless and machine learning services

AWS Machine Learning Blog

MARCH 10, 2023

The same architecture applies if you use Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a data streaming service. You can use this metadata in your data analytics solutions, machine learning model training tasks, or visualizations and dashboards that consume transaction data. An example use case is claims processing.

Machine Learning

Machine Learning AWS Apache Kafka

What Are AI Credits and How Can Data Scientists Use Them?

ODSC - Open Data Science

APRIL 23, 2025

Confluent: Confluent provides a robust data streaming platform built around Apache Kafka. Amazon Web Services (AWS): AWS offers one of the most extensive AI and ML infrastructures in the world. Access Production-Grade Infrastructure: Cloud providers such as AWS and Azure allow you to simulate real-world deployment scenarios.

Data Scientist

Data Scientist Azure Apache Kafka ML

Machine Learning with MATLAB and Amazon SageMaker

Flipboard

NOVEMBER 21, 2023

In recent years, MathWorks has brought many product offerings into the cloud, especially on Amazon Web Services (AWS). Here is a quick guide on how to run MATLAB on AWS. Installation of AWS Command-Line Interface (AWS CLI), AWS Configure, and Python3. Set up AWS Configure to interact with AWS resources.

Machine Learning

Machine Learning AWS Decision Trees

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

Common examples of time series data include sales revenue, system performance data (such as CPU utilization and memory usage), credit card transactions, sensor readings, and user activity analytics. It offers an AWS CloudFormation template for straightforward deployment in an AWS account.

AWS

AWS ML Apache Kafka

Transitioning off Amazon Lookout for Metrics

AWS Machine Learning Blog

OCTOBER 9, 2024

The service, which was launched in March 2021, predates several popular AWS offerings that have anomaly detection, such as Amazon OpenSearch, Amazon CloudWatch, AWS Glue Data Quality, Amazon Redshift ML, and Amazon QuickSight. To use this feature, you can write rules or analyzers and then turn on anomaly detection in AWS Glue ETL.

AWS

AWS ML Data Quality

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

AWS Machine Learning Blog

APRIL 19, 2023

Streaming ingestion – An Amazon Kinesis Data Analytics for Apache Flink application backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK) calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store.

ML Apache Kafka SQL

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

How it’s implemented: In our quest to accurately determine shot speed during live matches, we’ve implemented a cutting-edge solution using Amazon Managed Streaming for Apache Kafka (Amazon MSK). We’ve implemented an AWS Lambda function with the specific task of retrieving the calculated shot speed from the relevant Kafka topic.

AWS

AWS Apache Kafka Data Scientist Data Science

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

Amazon S3: Amazon Simple Storage Service (S3) is a scalable object storage service provided by Amazon Web Services (AWS). It provides fault tolerance and high throughput for Big Data storage and processing. It allows organizations to store and retrieve any amount of data, making it popular for storing and managing Big Data in the cloud.

Big Data

Big Data Data Engineering

Navigating the Big Data Frontier: A Guide to Efficient Handling

Women in Big Data

OCTOBER 9, 2024

After this, the data is analyzed, business logic is applied, and it is processed for further analytical tasks like visualization or machine learning. Data Ingestion: Data is collected and funneled into the pipeline using batch or real-time methods, leveraging tools like Apache Kafka, AWS Kinesis, or custom ETL scripts.

Big Data

Big Data Apache Kafka Data Pipeline

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting. TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations.

AWS

AWS Data Warehouse ML

11 Open-Source Data Engineering Tools Every Pro Should Use

ODSC - Open Data Science

FEBRUARY 6, 2024

Apache Kafka: For data engineers dealing with real-time data, Apache Kafka is a game-changer. Data Orchestration and Workflow Management: Apache Airflow is renowned for its ability to build and schedule complex data pipelines.

Data Engineering

Data Engineering Data Engineer

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Pickl AI

SEPTEMBER 18, 2024

As a pioneer in the streaming industry, Netflix utilises advanced data analytics to enhance user experience, optimise operations, and drive strategic decisions. It utilises Amazon Web Services (AWS) as its main data lake, processing over 550 billion events daily—equivalent to approximately 1.3 petabytes of data.

Big Data

Big Data Apache Kafka Big Data Analytics

Top 15 Data Analytics Projects in 2023 for beginners to Experienced

Pickl AI

JULY 20, 2023

Top 15 Data Analytics Projects in 2023 for Beginners to Experienced Levels: Data Analytics Projects allow aspirants in the field to display their proficiency to employers and acquire job roles. However, you might be looking for a guide to help you understand the different types of Data Analytics projects you may undertake.

Analytics

Analytics Big Data

What is Data Ingestion? Understanding the Basics

Pickl AI

JULY 25, 2024

This is essential for applications that demand immediate insights, such as fraud detection or real-time analytics. By centralising data from disparate sources, organisations can ensure that they have a unified view of their information, which is vital for analytics, reporting, and decision-making.

Apache Kafka

Apache Kafka Data Lakes Data Warehouse Data Quality

Predicting the Future of Data Science

Pickl AI

DECEMBER 4, 2024

Summary: The future of Data Science is shaped by emerging trends such as advanced AI and Machine Learning, augmented analytics, and automated processes. Data privacy regulations will shape how organisations handle sensitive information in analytics. Continuous learning and adaptation will be essential for data professionals.

Data Science

Data Science Data Scientist Machine Learning

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

It involves developing data pipelines that efficiently transport data from various sources to storage solutions and analytical tools. OLAP (Online Analytical Processing): OLAP tools allow users to analyse data from multiple perspectives. ETL is vital for ensuring data quality and integrity.

Data Engineering

Data Engineering Data Engineer

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

A central repository for unstructured data is beneficial for tasks like analytics and data virtualization. Tools and Techniques to Manage Unstructured Data: Several tools are required to properly manage unstructured data, from storage to analytical tools. You also need the right technique to help manage unstructured data.

Machine Learning

Machine Learning Data Lakes AI

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data Consumption: You have reached a point where the data is ready for consumption by AI, BI, and other analytics. Data Storage: The processed data is stored so it can be retrieved over time, in either a data warehouse or a data lake. The origins of a data pipeline connect it to the need for reusability and efficiency.

Data Pipeline

Data Pipeline ETL SQL Data Quality

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

Today, different stages exist within ML pipelines built to meet technical, industrial, and business requirements. 1 Data Ingestion (e.g., Apache Kafka, Amazon Kinesis) 2 Data Preprocessing (e.g., As usage increased, the system had to be scaled vertically, approaching AWS instance-type limits.

ML Machine Learning

Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

AWS Machine Learning Blog

APRIL 18, 2025

Solution overview: Build a generative AI stock price analyzer with RAG. For this post, we implement a RAG architecture with Amazon Bedrock Knowledge Bases using a custom connector and topics built with Amazon Managed Streaming for Apache Kafka (Amazon MSK) for a user who may be interested in understanding stock price trends.

Apache Kafka

Apache Kafka AWS Clustering Database

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

AWS Machine Learning Blog

FEBRUARY 7, 2025

In this post, we dive deep into how CONXAI hosts the state-of-the-art OneFormer segmentation model on AWS using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Kubernetes Service (Amazon EKS), KServe, and NVIDIA Triton. Our journey to AWS: Initially, CONXAI started with a small cloud provider specializing in offering affordable GPUs.

Analytics

Analytics AWS Clustering

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Ultimately, leveraging Big Data analytics provides a competitive advantage and drives innovation across various industries. Competitive Advantage: Organisations that leverage Big Data Analytics can stay ahead of the competition by anticipating market trends and consumer preferences.

Big Data

Big Data Apache Hadoop Apache Kafka

Best Data Engineering Tools Every Engineer Should Know

Pickl AI

MARCH 19, 2025

Python, SQL, and Apache Spark are essential for data engineering workflows. Real-time data processing with Apache Kafka enables faster decision-making. Apache Spark: Apache Spark is a powerful data processing framework that efficiently handles Big Data.

Data Engineering

Data Engineering Data Engineer
