Complex Event Processing (CEP) is at the forefront of modern analytics, enabling organizations to extract valuable insights from vast streams of real-time data. As industries evolve, the ability to process and respond to events in the moment becomes mission-critical. What is Complex Event Processing (CEP)?
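To make the idea concrete, here is a minimal, engine-agnostic sketch of a classic CEP pattern: raise an alert when a user accumulates several failed logins inside a sliding time window. The event fields, window, and threshold are illustrative assumptions, not taken from any particular CEP product.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    user: str
    kind: str         # e.g. "login_failed"
    timestamp: float  # seconds since epoch

def detect_repeated_failures(events, window_s=60, threshold=3):
    """Yield an alert whenever a user has `threshold` failed logins within `window_s` seconds."""
    recent = {}  # user -> deque of failure timestamps inside the sliding window
    for ev in events:
        if ev.kind != "login_failed":
            continue
        q = recent.setdefault(ev.user, deque())
        q.append(ev.timestamp)
        # Drop failures that have fallen out of the sliding window.
        while q and ev.timestamp - q[0] > window_s:
            q.popleft()
        if len(q) >= threshold:
            yield f"ALERT: {ev.user} had {len(q)} failed logins in {window_s}s"

stream = [Event("alice", "login_failed", t) for t in (0, 20, 45)]
for alert in detect_repeated_failures(stream):
    print(alert)
```

Dedicated CEP engines (Flink CEP, Esper, and the like) express the same kind of pattern declaratively and add the state management, out-of-order handling, and scale that production streams need.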
Learn to build a recommendation system using Python. Real-Time Interaction: whether it's engaging with customers, analyzing live events, or responding to user queries, streaming enables more natural, responsive interactions. Install Langchain: ensure that Langchain is installed in your Python environment.
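The article's Langchain-based walkthrough is not reproduced here; as a rough illustration of the underlying idea, the sketch below is a generic user-based collaborative-filtering recommender in plain NumPy. The ratings matrix and scoring rule are hypothetical.

```python
import numpy as np

# Hypothetical user-item ratings matrix (rows: users, columns: items); 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
], dtype=float)

def recommend(user_idx, ratings, top_n=2):
    """Score unrated items for one user via user-user cosine similarity."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    sims = (ratings @ ratings.T) / (norms @ norms.T + 1e-9)   # user-user cosine similarity
    weights = sims[user_idx].copy()
    weights[user_idx] = 0.0                                    # ignore self-similarity
    scores = weights @ ratings / (weights.sum() + 1e-9)        # weighted average of others' ratings
    scores[ratings[user_idx] > 0] = -np.inf                    # hide items the user already rated
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0, ratings))  # item indices predicted to interest user 0
```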
"I can't think of anything that's been more powerful since the desktop computer." — Michael Carbin, Associate Professor, MIT, and Founding Advisor, MosaicML A.
Business success is based on how we use continuously changing data. That’s where streaming data pipelines come into play. This article explores what streaming data pipelines are, how they work, and how to build this data pipeline architecture. What is a streaming data pipeline?
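As a minimal sketch of the extract-transform-load stages a streaming data pipeline chains together, the generator-based example below processes records one at a time as they arrive. The JSON shape and enrichment step are assumptions; a production pipeline would read from a broker such as Kafka and write to a warehouse or topic rather than printing.

```python
import json
import time
from typing import Iterator

def extract(raw_lines: Iterator[str]) -> Iterator[dict]:
    """Extract: parse each raw JSON line as it arrives."""
    for line in raw_lines:
        yield json.loads(line)

def transform(events: Iterator[dict]) -> Iterator[dict]:
    """Transform: enrich records in flight instead of in a nightly batch."""
    for ev in events:
        ev["amount_usd"] = round(ev["amount_cents"] / 100, 2)
        ev["processed_at"] = time.time()
        yield ev

def load(events: Iterator[dict]) -> None:
    """Load: here we just print; a real pipeline would write to a warehouse or topic."""
    for ev in events:
        print(ev)

raw = iter(['{"order_id": 1, "amount_cents": 1299}', '{"order_id": 2, "amount_cents": 450}'])
load(transform(extract(raw)))
```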
Data Science Dojo is offering Airbyte for FREE on Azure Marketplace, packaged with a pre-configured web environment enabling you to quickly start the ELT process rather than spending time setting up the environment. Free to use. Conclusion: There are a ton of small services that aren’t supported on traditional data pipeline platforms.
Let’s explore each of these components and its application in the sales domain: Synapse Data Engineering: Synapse Data Engineering provides a powerful Spark platform designed for large-scale data transformations through Lakehouse. Here, we changed the data types of columns and dealt with missing values.
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.
How to consume a Linked Data Event Stream and store it in a TimescaleDB (PostgreSQL 14.4) database. Linked Data Event Streams represent and share fast and slow-moving data on the Web using the Resource Description Framework (RDF).
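A rough sketch of that flow might look like the following, assuming a JSON-LD page with a member array and a simple observations table; the endpoint URL, member fields, and connection details are placeholders rather than the article's actual stream.

```python
import requests
import psycopg2

# Hypothetical LDES endpoint; the real stream and its member shape will differ.
LDES_URL = "https://example.org/ldes/observations"

conn = psycopg2.connect("dbname=ldes user=postgres password=postgres host=localhost")
with conn, conn.cursor() as cur:
    # TimescaleDB: a plain table turned into a hypertable partitioned on time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS observations (
            time TIMESTAMPTZ NOT NULL,
            member_id TEXT,
            value DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_hypertable('observations', 'time', if_not_exists => TRUE);")

    page = requests.get(LDES_URL, headers={"Accept": "application/ld+json"}, timeout=30).json()
    for member in page.get("member", []):   # assumed JSON-LD member structure
        cur.execute(
            "INSERT INTO observations (time, member_id, value) VALUES (%s, %s, %s)",
            (member["generatedAt"], member["@id"], member["value"]),
        )
conn.close()
```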
Hosted at one of Mindspace’s coworking locations, the event was a convergence of insightful talks and professional networking. Mindspace , a global coworking and flexible office provider with over 45 locations worldwide, including 13 in Germany, offered a conducive environment for this knowledge-sharing event.
Automate and streamline our ML inference pipeline with SageMaker and Airflow. Building an inference data pipeline on large datasets is a challenge many companies face. Check Tweets Batch Inference Job Status: Create an SQS listener that reads a message from the queue when the event rule publishes it.
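A minimal version of such an SQS listener, using boto3 long polling, could look like this; the queue URL and the event-detail field are assumptions standing in for the article's actual EventBridge rule and SageMaker batch transform job.

```python
import json
import boto3

# Placeholder queue URL; an EventBridge rule is assumed to publish job-status events here.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/batch-inference-status"

sqs = boto3.client("sqs")

def poll_once():
    """Long-poll the queue and handle any batch-inference status messages."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])  # assumed EventBridge event shape
        print("Batch inference job status:", body.get("detail", {}).get("TransformJobStatus"))
        # Remove the message so it is not redelivered.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

poll_once()
```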
In this way, Azure increases the fault tolerance of data pipelines. Limited Availability: Traditional messaging brokers can be limited in terms of the platforms and environments they support, which can make it challenging to use them in certain scenarios, such as cloud-based systems.
In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You might be curious how a simple tool like Apache Airflow can be powerful for managing complex data pipelines.
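For readers who have not seen it, a deliberately tiny Airflow DAG shows why the tool scales to complex pipelines: each step is a task, and dependencies are declared explicitly. The task bodies below are placeholders, and the scheduling argument follows the Airflow 2.x style.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw records from the source system")

def transform():
    print("cleaning and reshaping the records")

def load():
    print("writing the results to the warehouse")

# Three tasks wired extract -> transform -> load; Airflow handles scheduling, retries, and backfills.
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # Airflow 2.x argument; newer releases also accept `schedule`
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```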
Solution overview: In brief, the solution involved building three pipelines: a data pipeline, which extracts the metadata of the images; a machine learning pipeline, which classifies and labels images; and a human-in-the-loop review pipeline, which uses a human team to review results. The following diagram illustrates the solution architecture.
“The amount of the data that we process every day and make available for researchers in a timely fashion makes it a very complex and really big data problem,” said Jay Nanduri , Truveta chief technology officer, in an interview with GeekWire. Last fall, Truveta also unveiled Truveta Studio , an interface into real-time patient data.
The goal of digital transformation remains the same as ever: to become more data-driven. We have learned how to gain a competitive advantage by capturing business events in data. Events are data snapshots of complex activity sourced from the web, customer systems, ERP transactions, social media, […].
You can see our photos from the event here, and be sure to follow our YouTube for virtual highlights from the conference as well. Over in San Francisco, we had a keynote for each day of the event. Other Events: Aside from networking events and all of our sessions, we had a few other special events. What’s next?
In this post we highlight how the AWS Generative AI Innovation Center collaborated with AWS Professional Services and the PGA TOUR to develop a prototype virtual assistant using Amazon Bedrock that could enable fans to extract information about any event, player, hole, or shot-level detail in a seamless, interactive manner.
The results of these events can be evaluated afterwards so that the team makes better decisions in the future. With this proactive approach, Kakao Games can launch the right events at the right time. Kakao Games can then create a promotional event that encourages players not to leave the game. However, this approach is reactive.
Apache Kafka stands as a widely recognized open source event store and stream processing platform. It has evolved into the de facto standard for data streaming, as over 80% of Fortune 500 companies use it. All major cloud providers offer managed data streaming services to meet this growing demand.
Brian Chesky, CEO of Airbnb, spoke at a Y Combinator event this summer. Co-founders: David de Matheu and Pinhas Kevin Cohen. Explain what your startup does in two sentences: Neum AI is the next generation of data pipelines built specifically for retrieval augmented generation (RAG).
The following diagram illustrates the data pipeline for indexing and query in the foundational search architecture. The listing writer microservice publishes listing change events to an Amazon Simple Notification Service (Amazon SNS) topic, which an Amazon Simple Queue Service (Amazon SQS) queue subscribes to.
Apache Kafka and Apache Flink working together: Anyone who is familiar with the stream processing ecosystem is familiar with Apache Kafka, the de facto enterprise standard for open-source event streaming. Apache Kafka streams get data to where it needs to go, but these capabilities are not maximized when Apache Kafka is deployed in isolation.
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
Fortunately, a modern data stack (MDS) using Fivetran, Snowflake, and Tableau makes it easier to pull data from new and various systems, combine it into a single source of truth, and derive fast, actionable insights. What is a modern data stack? Transparency.
If the question was “What’s the schedule for AWS events in December?”, AWS usually announces the dates for their upcoming re:Invent event around 6-9 months in advance. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
Data Engineering: Building and maintaining data pipelines, ETL (Extract, Transform, Load) processes, and data warehousing. Career Support: Some bootcamps include job placement services like resume assistance, mock interviews, networking events, and partnerships with employers to aid in job placement.
In these applications, time series data can have heavy-tailed distributions, where the tails represent extreme values. Accurate forecasting in these regions is important in determining how likely an extreme event is and whether to raise an alarm. However, the extreme event will have zero probability.
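A small illustration of why the tail matters: given sampled forecast trajectories, the exceedance probability can be read directly off the samples, whereas a Gaussian fit to the same samples drives it toward zero. The Student-t samples and threshold below are synthetic stand-ins, not data from the article.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a probabilistic forecast: 10,000 sampled values for one
# future time step, drawn from a heavy-tailed Student-t distribution.
samples = stats.t.rvs(df=3, loc=100, scale=10, size=10_000, random_state=0)

threshold = 150  # the "extreme event" level at which we would raise an alarm

# Tail probability read directly from the forecast samples.
p_empirical = np.mean(samples > threshold)

# What a Gaussian fit to the same samples would claim: the tail gets squashed.
mu, sigma = samples.mean(), samples.std()
p_gaussian = stats.norm.sf(threshold, loc=mu, scale=sigma)

print(f"P(extreme event) from samples:       {p_empirical:.4f}")
print(f"P(extreme event) under Gaussian fit: {p_gaussian:.6f}")
```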
Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines. Interested in attending an ODSC event?
Kafka and ETL Processing: You might be using Apache Kafka for high-performance data pipelines, to stream various analytics data, or to run company-critical assets using Kafka, but did you know that you can also use Kafka clusters to move data between multiple systems? 5 Key Comparisons in Different Apache Kafka Architectures.
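A bare-bones example of that pattern with the kafka-python client: consume from a source topic, apply a small transformation in flight, and produce to a destination topic. The broker address, topic names, and record shape are assumptions.

```python
import json
from kafka import KafkaConsumer, KafkaProducer

# Placeholder broker and topic names.
consumer = KafkaConsumer(
    "orders.raw",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    record = message.value
    # Transform in flight: convert cents to dollars before handing off to the next system.
    record["amount_usd"] = record.pop("amount_cents", 0) / 100
    producer.send("orders.enriched", record)
```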
Apache Kafka plays a crucial role in enabling data processing in real time by efficiently managing data streams and facilitating seamless communication between various components of the system. Apache Kafka: Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.
Historical financial big data helps businesses scrutinize evolving customer behaviors, allowing them to come up with invaluable products and services that streamline banking processes. However, to take full advantage of big data’s powerful capabilities, the importance of choosing the right BI and ETL solutions cannot be over-emphasized.
Apache Kafka is an open-source , distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users.
Not only does it involve the process of collecting, storing, and processing data so that it can be used for analysis and decision-making, but these professionals are also responsible for building and maintaining the infrastructure that makes this possible, and much more. Think of data engineers as the architects of the data ecosystem.
Event-driven businesses across all industries thrive on real-time data, enabling companies to act on events as they happen rather than after the fact. Flink jobs, designed to process continuous data streams, are key to making this possible. They are able to adapt to changing demands quickly to seize new opportunities.
We couldn’t be more excited to announce two events that will be co-located with ODSC East in Boston this April: The Data Engineering Summit and the Ai X Innovation Summit. These two co-located events represent an opportunity to dive even deeper into the topics and trends shaping these disciplines. Learn more about them below.
The 4 Gen AI Architecture Pipelines. The four pipelines are: 1. The Data Pipeline: The data pipeline is the foundation of any AI system. It's responsible for collecting and ingesting data from various external sources, processing it, and managing the data.
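As a sketch of what that ingestion step can look like in practice, the snippet below collects local text files, splits them into overlapping chunks, and attaches per-chunk metadata for the downstream embedding and retrieval pipelines. The folder name, chunk sizes, and record shape are illustrative choices rather than a prescribed design.

```python
from pathlib import Path

def chunk(text: str, size: int = 500, overlap: int = 50):
    """Split a document into overlapping chunks ready for embedding/indexing."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(source_dir: str):
    """Collect raw files, process them into chunks, and keep per-chunk metadata."""
    records = []
    for path in Path(source_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        for idx, piece in enumerate(chunk(text)):
            records.append({"source": path.name, "chunk_id": idx, "text": piece})
    return records

docs = ingest("./knowledge_base")   # hypothetical folder of exported documents
print(f"prepared {len(docs)} chunks for the downstream embedding/retrieval pipelines")
```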
Feature Computation Engine: Users can transform batch, streaming, and real-time data into features. To productionize a machine learning system, it is necessary to process new data continuously (Spark, Flink, etc.).
In the later part of this article, we will discuss its importance and how we can use machine learning for streaming data analysis with the help of a hands-on example. What is streaming data? This will also help us observe the importance of streaming data. It can be used to collect, store, and process streaming data in real time.
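One common way to apply machine learning to streaming data is incremental (online) learning, where the model is updated one mini-batch at a time instead of retraining on the full history. The sketch below uses scikit-learn's partial_fit with a synthetic stream standing in for a real source such as a Kafka consumer; the feature layout and labeling rule are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")   # logistic regression trained incrementally
classes = np.array([0, 1])

def fake_stream(n_batches=20, batch_size=50):
    """Stand-in for a real stream (e.g. a Kafka consumer yielding mini-batches)."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # hidden rule the model must learn
        yield X, y

for i, (X, y) in enumerate(fake_stream()):
    # partial_fit updates the model one mini-batch at a time, never seeing the full dataset.
    model.partial_fit(X, y, classes=classes)
    if i % 5 == 0:
        print(f"batch {i:02d}  accuracy on this batch: {model.score(X, y):.2f}")
```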
This unified schema streamlines downstream consumption and analytics because the data follows a standardized schema and new sources can be added with minimal datapipeline changes. After the security log data is stored in Amazon Security Lake, the question becomes how to analyze it.
MLOps aims to bridge the gap between data science and operational teams so they can reliably and efficiently transition ML models from development to production environments, all while maintaining high model performance and accuracy. AIOps integrates these models into existing IT systems to enhance their functions and performance.
Google Analytics 4 (GA4) is a powerful tool for collecting and analyzing website and app data that many businesses rely heavily on to make informed business decisions. However, there might be instances where you need to migrate the raw event data from GA4 to Snowflake for more in-depth analysis and business intelligence purposes.
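One way such a migration is often wired up is to land the raw GA4 events as files and bulk-load them with the Snowflake Python connector. The sketch below assumes a local Parquet extract, placeholder credentials, and an existing GA4_EVENTS table; the article's own pipeline may differ.

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Hypothetical extract: GA4 raw events already dumped to a local Parquet file
# (for example via the GA4 -> BigQuery export followed by an extract step).
events = pd.read_parquet("ga4_events_2024_06_01.parquet")

conn = snowflake.connector.connect(
    account="my_account",        # placeholder credentials
    user="loader",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

# write_pandas bulk-loads the DataFrame into an existing table via internal staging.
success, n_chunks, n_rows, _ = write_pandas(conn, events, table_name="GA4_EVENTS")
print(f"loaded {n_rows} rows in {n_chunks} chunks, success={success}")
conn.close()
```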
Recognizing these specific needs, Fivetran has developed a range of connectors, including dedicated applications, databases, files, and events, which can accommodate the diverse formats used by healthcare systems. Addressing these needs may pose challenges that lead to the implementation of custom solutions rather than a uniform approach.
As a proud member of the Connect with Confluent program, we help organizations going through digital transformation and IT infrastructure modernization break down data silos and power their streaming data pipelines with trusted data. Let’s cover some additional information to know before attending.
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. Interested in attending an ODSC event?