Apache Kafka, Data Lakes and ML - Data Science Current

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build a single scalable, reliable, yet simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.
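As a rough illustration of learning from a stream instead of from a data lake, here is a minimal Python sketch. It is not taken from the talk and does not use Kafka: each event from a simulated in-memory stream is scored first and then used for one stochastic-gradient update of a simple linear model, after which it is discarded. The names `OnlineLinearModel` and `score_and_learn` and the `(features, label)` event shape are hypothetical; in a real deployment the events would arrive from a Kafka consumer.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (feature vector, observed label).
Event = Tuple[List[float], float]


class OnlineLinearModel:
    """Linear regression updated one event at a time with plain SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x: List[float], y: float) -> None:
        # One gradient step on the squared error of this single example.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def score_and_learn(model: OnlineLinearModel, events: Iterable[Event]) -> Iterator[float]:
    """Serve a prediction, then train on the same event; nothing is stored."""
    for x, y in events:
        yield model.predict(x)
        model.learn_one(x, y)


if __name__ == "__main__":
    # Simulated stream standing in for a Kafka topic (assumption): y = 2x + 1.
    model = OnlineLinearModel(n_features=1)
    stream = (([x], 2 * x + 1) for x in (0.5 * (i % 5) for i in range(2000)))
    for _ in score_and_learn(model, stream):
        pass
    print(round(model.weights[0], 3), round(model.bias, 3))
```

Predicting before learning mirrors the "test-then-train" (prequential) evaluation often used for streaming models: every event serves once as a test case and once as a training example, and the history never needs to be retained.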

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and…

ODSC - Open Data Science

MAY 24, 2023

Building a Business with a Real-Time Analytics Stack, Streaming ML Without a Data Lake, and Google’s PaLM 2
Building a Pizza Delivery Service with a Real-Time Analytics Stack
The best businesses react quickly and with informed decisions.

Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In this post, we demonstrate how to build a robust real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. This solution employs machine learning (ML) for anomaly detection, and doesn’t require users to have prior AI expertise.
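The AWS post's own model is not reproduced here. As a much simpler stand-in for online anomaly detection on a time series, the Python sketch below flags a point whose z-score against the running mean and standard deviation exceeds a threshold, updating those statistics incrementally with Welford's algorithm so that no history is kept. The class name, the 3.0 threshold, and the 10-point warm-up are assumptions for illustration, not values from the post, and nothing here uses Apache Flink.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScoreDetector:
    """Flags points far from the running mean; keeps O(1) state (Welford's algorithm)."""

    threshold: float = 3.0  # assumed cutoff, in standard deviations
    warmup: int = 10        # assumed number of points to observe before flagging
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, value: float) -> bool:
        """Score `value` against the points seen so far, then fold it into the statistics."""
        is_anomaly = False
        if self.count >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and the sum of squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingZScoreDetector()
    readings = [10.0 + 0.1 * (i % 5) for i in range(50)] + [50.0]  # synthetic series
    flags = [detector.update(r) for r in readings]
    print([i for i, flagged in enumerate(flags) if flagged])
```

Because every point, including flagged ones, updates the statistics, a sustained level shift gradually becomes the new normal; skipping the update for flagged points is a common alternative when that drift is undesirable.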

All of the Free Virtual Sessions Coming to ODSC Europe 2023

ODSC - Open Data Science

JUNE 7, 2023

Wednesday, June 14th
Me, my health, and AI: applications in medical diagnostics and prognostics: Sara Khalid | Associate Professor, Senior Research Fellow, Biomedical Data Science and Health Informatics | University of Oxford
Iterated and Exponentially Weighted Moving Principal Component Analysis: Dr. Paul A.

Watch the Top ODSC Europe 2023 Virtual Sessions Here

ODSC - Open Data Science

JULY 14, 2023

Time Series Forecasting for Managers — All Forecasts Are Wrong but Some Are Useful: Tanvir Ahmed Shaikh | Data Strategist (Director) | Genentech, Inc.
Time series forecasting remains an under-appreciated technique in data science education, often overshadowed by more popular machine learning methods.

Pictures and Highlights from ODSC Europe 2023

ODSC - Open Data Science

JULY 22, 2023

Some of our most popular in-person sessions were:
Data Communication in the Age of AI: Alan Rutter | Founder | Fire Plus Algebra
Autoencoders — a Magical Approach to Unsupervised Machine Learning: Oliver Zeigermann | Blue Collar ML Architect | Freelancer
Fast Option Pricing Using Deep Learning Methods: Chakri Cherukuri | Senior Quantitative Researcher (..)

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

The result is a machine learning (ML)-powered insight that allows fans to easily evaluate and compare the goalkeepers’ proficiencies. An ML model is trained through Amazon SageMaker, using data from four seasons of the first and second Bundesliga, encompassing all shots that landed on target (either resulting in a goal or being saved).

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze, and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects. What is Unstructured Data?

Why Software Engineers Should Be Embracing AI: A Guide to Staying Ahead

ODSC - Open Data Science

OCTOBER 9, 2024

What should you be looking for?

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Common options include:
Relational Databases: Structured storage supporting ACID transactions, suitable for structured data.
NoSQL Databases: Flexible, scalable solutions for unstructured or semi-structured data.
Data Warehouses: Centralised repositories optimised for analytics and reporting.

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data pipeline stages
But before delving deeper into the technical aspects of these tools, let’s quickly understand the core components of a data pipeline, succinctly captured in the image below:
[Figure: Data pipeline stages | Source: Author]
What does a good data pipeline look like?

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

Technologies like Apache Kafka, often used in modern CDPs, use log-based approaches to stream customer events between systems in real time. Both persistent staging and data lakes involve storing large amounts of raw data. These changes are streamed into Iceberg tables in your data lake.
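To make the log-based idea concrete, the toy Python sketch below models a single append-only log in which each downstream system reads at its own offset, the basic pattern behind a Kafka topic partition. It is a hypothetical in-memory illustration (`AppendOnlyLog` and `Consumer` are made-up names), not Kafka's client API, and it omits partitioning, retention, and persistence.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Toy single-partition log: records are only appended, never modified."""

    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each downstream system tracks its own offset, so readers never interfere."""

    log: AppendOnlyLog
    offset: int = 0

    def poll(self, max_records: int = 100) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.append({"customer": "c1", "event": "signup"})
    log.append({"customer": "c1", "event": "purchase"})
    crm_sync, lake_loader = Consumer(log), Consumer(log)
    print(crm_sync.poll())                 # reads both events
    print(lake_loader.poll(max_records=1)) # reads only the first, at its own pace
```

Because records are never modified and each consumer keeps its own position, a CRM sync and a data-lake loader can consume the same customer events at different speeds without coordinating with each other.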

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

And where data was available, the ability to access and interpret it proved problematic. Big data can grow too big fast. Left unchecked, data lakes became data swamps. Some data lake implementations required expensive ‘cleansing pumps’ to make them navigable again.

Top Big Data Tools Every Data Professional Should Know

Pickl AI

FEBRUARY 23, 2025

Best Big Data Tools
Popular tools such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Storm enable businesses to store, process, and analyse data efficiently.
Key Features:
Integration with Microsoft Services: Seamlessly integrates with other Azure services like Azure Data Lake Storage.
