Apache Kafka, Data Lakes and Machine Learning

Apache Kafka

Data Lakes

Machine Learning

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “ Apache Kafka for Real-Time Machine Learning Without a Data Lake ,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Machine Learning Apache Kafka

Navigating the Big Data Frontier: A Guide to Efficient Handling

Women in Big Data

OCTOBER 9, 2024

These procedures are central to effective data management and crucial for deploying machine learning models and making data-driven decisions. The success of any data initiative hinges on the robustness and flexibility of its big data pipeline. What is a Data Pipeline?

Big Data

Big Data Big Data Apache Kafka Data Pipeline

Join 17,000+

professionals

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

Unstructured data makes up 80% of the world's data and is growing. Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging.

Machine Learning

Machine Learning Machine Learning Data Lakes AI

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

All of the Free Virtual Sessions Coming to ODSC Europe 2023

ODSC - Open Data Science

JUNE 7, 2023

Bilokon | Visiting Lecturer, CEO and Founder | Imperial College London, Thalesians Ltd Apache Kafka for Real-Time Machine Learning Without a Data Lake: Kai Waehner | Global Field CTO, Author, International Speaker Semantic Analysis and Procedural Language Understanding in the Era of Large Language Models: Dr. Gözde Gül Şahin | Assistant Professor, (..)

Apache Kafka

Apache Kafka Machine Learning Machine Learning Data Science

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In this post, we demonstrate how to build a robust real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. This solution employs machine learning (ML) for anomaly detection, and doesn’t require users to have prior AI expertise.

AWS

AWS ML ML Apache Kafka

Pictures and Highlights from ODSC Europe 2023

ODSC - Open Data Science

JULY 22, 2023

We had bigger sessions on getting started with machine learning or SQL, up to advanced topics in NLP, and how to make deepfakes.

Apache Kafka

Apache Kafka Machine Learning Machine Learning Data Science

Watch the Top ODSC Europe 2023 Virtual Sessions Here

ODSC - Open Data Science

JULY 14, 2023

AI and Bias: How to Detect It and How to Prevent It Sandra Wachter, PhD | Professor, Technology and Regulation | Oxford Internet Institute, University of Oxford In recognition of the extensive biases and inequality that are present in training data, there has been much work done to test for bias in machine learning and AI systems.

Machine Learning

Machine Learning Machine Learning Apache Kafka Data Science

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

Many CIOs argue the rise of big data pushed people to use data more proactively for business decision-making. Big data got“ more leaders and people in the organization to use data, analytics, and machine learning in their decision making,” says former CIO Isaac Sacolick. Big data can grow too big fast.

Big Data

Big Data Big Data Apache Kafka Data Lakes

A Comprehensive Guide to the main components of Big Data

Pickl AI

DECEMBER 2, 2024

As organisations grapple with this vast amount of information, understanding the main components of Big Data becomes essential for leveraging its potential effectively. Key Takeaways Big Data originates from diverse sources, including IoT and social media. Data lakes and cloud storage provide scalable solutions for large datasets.

Big Data

Big Data Big Data Data Lakes Apache Hadoop

What is Data Ingestion? Understanding the Basics

Pickl AI

JULY 25, 2024

Data Ingestion Meaning At its core, It refers to the act of absorbing data from multiple sources and transporting it to a destination, such as a database, data warehouse, or data lake. Batch Processing In this method, data is collected over a period and then processed in groups or batches.

Apache Kafka

Apache Kafka Data Lakes Data Warehouse Data Quality

A Comprehensive Guide to the Main Components of Big Data

Pickl AI

NOVEMBER 25, 2024

Big Data

Big Data Big Data Data Lakes Apache Hadoop

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Pickl AI

SEPTEMBER 18, 2024

This blog explores how Netflix applies Big Data across its business operations, focusing on its infrastructure, content strategies, customer engagement, operational efficiency, marketing insights, security measures, and future challenges. petabytes of data. What Technologies Does Netflix Use for Its Big Data Infrastructure?

Big Data

Big Data Big Data Apache Kafka Big Data Analytics

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Role of Data Engineers in the Data Ecosystem Data Engineers play a crucial role in the data ecosystem by bridging the gap between raw data and actionable insights. They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes.

Data Engineering

Data Engineering Data Engineering Data Engineer Data Engineering

Big Data Syllabus: A Comprehensive Overview

Pickl AI

AUGUST 9, 2024

APIs Understanding how to interact with Application Programming Interfaces (APIs) to gather data from external sources. Data Streaming Learning about real-time data collection methods using tools like Apache Kafka and Amazon Kinesis. Once data is collected, it needs to be stored efficiently.

Big Data

Big Data Big Data Big Data Analytics Big Data Analytics

Why Software Engineers Should Be Embracing AI: A Guide to Staying Ahead

ODSC - Open Data Science

OCTOBER 9, 2024

These tools use machine learning models trained on vast amounts of code to assist developers in writing cleaner, more efficient code. Tools like Testim and Applitools leverage machine learning to improve both unit testing and UI testing. How you might ask? What should you be looking for?

Apache Kafka

Apache Kafka AI AI Machine Learning

How data engineers tame Big Data?

Dataconomy

FEBRUARY 23, 2023

Data engineers are responsible for designing and building the systems that make it possible to store, process, and analyze large amounts of data. These systems include data pipelines, data warehouses, and data lakes, among others. However, building and maintaining these systems is not an easy task.

Big Data

Big Data Big Data Data Engineering Data Engineering

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data Processing : You need to save the processed data through computations such as aggregation, filtering and sorting. Data Storage : To store this processed data to retrieve it over time – be it a data warehouse or a data lake.

Data Pipeline

Data Pipeline ETL SQL Data Quality

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Therefore, it’s no surprise that determining the proficiency of goalkeepers in preventing the ball from entering the net is considered one of the most difficult tasks in football data analysis. The result is a machine learning (ML)-powered insight that allows fans to easily evaluate and compare the goalkeepers’ proficiencies.

Machine Learning

Machine Learning Machine Learning AWS ML

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and…

ODSC - Open Data Science

MAY 24, 2023

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and Google’s PaLM 2 Building a Pizza Delivery Service with a Real-Time Analytics Stack The best businesses react quickly and with informed decisions. Here’s a use case of how you can use a real-time analytics stack to build a pizza delivery service.

Data Lakes

Data Lakes ML ML Analytics

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

Looking to build a machine-learning model for churn prediction? The atomic data provides a perfect input, capturing the full richness of customer behavior over time. Technologies like Apache Kafka, often used in modern CDPs, use log-based approaches to stream customer events between systems in real-time.

Data Modeling

Data Modeling Data Models Apache Kafka Data Lakes

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Best Big Data Tools Popular tools such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Storm enable businesses to store, process, and analyse data efficiently. Key Features : Integration with Microsoft Services : Seamlessly integrates with other Azure services like Azure Data Lake Storage.

Big Data

Big Data Big Data Apache Hadoop Apache Kafka

Data Science Current

Streaming Machine Learning Without a Data Lake

Navigating the Big Data Frontier: A Guide to Efficient Handling

Webinars

Trending Sources

How to Manage Unstructured Data in AI and Machine Learning Projects

Webinars

All of the Free Virtual Sessions Coming to ODSC Europe 2023

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

Pictures and Highlights from ODSC Europe 2023

Watch the Top ODSC Europe 2023 Virtual Sessions Here

Did Big Data Deliver Business Transformation & Improved CX?

A Comprehensive Guide to the main components of Big Data

What is Data Ingestion? Understanding the Basics

A Comprehensive Guide to the Main Components of Big Data

How Netflix Applies Big Data Across Business Verticals: Insights and Strategies

Discover the Most Important Fundamentals of Data Engineering

Big Data Syllabus: A Comprehensive Overview

Why Software Engineers Should Be Embracing AI: A Guide to Staying Ahead

How data engineers tame Big Data?

Comparing Tools For Data Processing Pipelines

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and…

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

Top Big Data Tools Every Data Professional Should Know

Stay Connected