At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. Some of these tools included AWS Cloud-based solutions, such as AWS Lambda and AWS Step Functions.
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.
SageMaker Unified Studio combines various AWS services, including Amazon Bedrock, Amazon SageMaker, Amazon Redshift, AWS Glue, Amazon Athena, and Amazon Managed Workflows for Apache Airflow (MWAA), into a comprehensive data and AI development platform. Navigate to the AWS Secrets Manager console and find the secret -api-keys.
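For readers who prefer to fetch the secret programmatically rather than through the console, here is a minimal Boto3 sketch; the secret name `demo-api-keys` is a hypothetical stand-in for the stack-specific name shown in the console.

```python
import json
import boto3

# Hypothetical secret name; the console shows it with a stack-specific prefix.
SECRET_NAME = "demo-api-keys"

def get_api_keys(secret_name: str = SECRET_NAME) -> dict:
    """Fetch and parse a JSON secret from AWS Secrets Manager."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

if __name__ == "__main__":
    print(list(get_api_keys().keys()))
```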
Let's assume the question "What date will AWS re:Invent 2024 occur?" is input, and the corresponding answer is also input as "AWS re:Invent 2024 takes place on December 2–6, 2024." If the question were "What's the schedule for AWS events in December?" … This setup uses the AWS SDK for Python (Boto3) to interact with AWS services.
This post details how Purina used Amazon Rekognition Custom Labels, AWS Step Functions, and other AWS services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.
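The excerpt does not include Purina's actual code, but invoking a trained Custom Labels model against an image in S3 looks roughly like the sketch below; the project version ARN, bucket, and key are hypothetical placeholders.

```python
import boto3

rekognition = boto3.client("rekognition")

# Hypothetical project version ARN; substitute the ARN of your trained model.
MODEL_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/pet-breeds/version/1"

def detect_breed(bucket: str, key: str, min_confidence: float = 80.0) -> list[dict]:
    """Run a Rekognition Custom Labels model against an image stored in S3."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=MODEL_ARN,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [
        {"label": label["Name"], "confidence": label["Confidence"]}
        for label in response["CustomLabels"]
    ]

if __name__ == "__main__":
    print(detect_breed("my-pet-uploads", "images/dog-123.jpg"))
```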
The results of these events can be evaluated afterwards so that better decisions can be made in the future. With this proactive approach, Kakao Games can launch the right events at the right time. Alternatively, Kakao Games can create a promotional event to keep players from leaving the game; however, that approach is reactive.
The following diagram illustrates the data pipeline for indexing and query in the foundational search architecture. The listing writer microservice publishes listing change events to an Amazon Simple Notification Service (Amazon SNS) topic, which an Amazon Simple Queue Service (Amazon SQS) queue subscribes to.
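A minimal sketch of what the listing writer's publish step could look like with Boto3; the topic ARN and message shape are hypothetical, and subscribed SQS queues receive a copy of each published event.

```python
import json
import boto3

sns = boto3.client("sns")

# Hypothetical topic ARN for the listing-change events described above.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:listing-change-events"

def publish_listing_change(listing_id: str, change_type: str) -> str:
    """Publish a listing change event to SNS; subscribed SQS queues receive it."""
    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps({"listing_id": listing_id, "change": change_type}),
        MessageAttributes={
            "change": {"DataType": "String", "StringValue": change_type}
        },
    )
    return response["MessageId"]
```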
SnapLogic uses Amazon Bedrock to build its platform, capitalizing on the proximity to data already stored in Amazon Web Services (AWS). Control plane and data plane implementation: SnapLogic's Agent Creator platform follows a decoupled architecture, separating the control plane and data plane for enhanced security and scalability.
In this post, we highlight how the AWS Generative AI Innovation Center collaborated with AWS Professional Services and the PGA TOUR to develop a prototype virtual assistant using Amazon Bedrock that could enable fans to extract information about any event, player, hole, or shot-level details in a seamless, interactive manner.
AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and state-of-the-art geospatial machine learning (ML) models, reducing barriers for these types of use cases. In the following sections, we dive into each pipeline in more detail.
Automate and streamline our ML inference pipeline with SageMaker and Airflow: Building an inference data pipeline on large datasets is a challenge many companies face. Check Tweets Batch Inference Job Status: Create an SQS listener that reads a message from the queue when the event rule publishes it.
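A minimal sketch of such an SQS listener with Boto3; the queue URL is a hypothetical placeholder for the queue populated by the event rule mentioned above.

```python
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URL for batch inference job status messages.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/batch-inference-status"

def poll_job_status_messages(max_messages: int = 10) -> list[str]:
    """Long-poll the queue, return message bodies, and delete what was read."""
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=max_messages,
        WaitTimeSeconds=20,  # long polling reduces empty receives
    )
    bodies = []
    for message in response.get("Messages", []):
        bodies.append(message["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
    return bodies
```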
Whether logs are coming from Amazon Web Services (AWS), other cloud providers, on-premises, or edge devices, customers need to centralize and standardize security data. After the security log data is stored in Amazon Security Lake, the question becomes how to analyze it. Subscribe an AWS Lambda function to the SQS queue.
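The shape of the message body depends on how Security Lake notifications are configured, so the parsing below is purely illustrative; the sketch only shows the standard structure of an SQS-triggered AWS Lambda handler.

```python
import json

def lambda_handler(event, context):
    """Process notifications delivered through the SQS subscription described above."""
    records = event.get("Records", [])
    for record in records:
        body = json.loads(record["body"])
        # Hypothetical handling: log the S3 key of the newly written security log object.
        key = body.get("detail", {}).get("object", {}).get("key")
        print("New security log object:", key)
    return {"processed": len(records)}
```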
SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. With this launch, account owners can grant other accounts access to select feature groups using AWS Resource Access Manager (AWS RAM). Their task is to construct and oversee efficient data pipelines.
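The cross-account grant itself is an AWS RAM resource share. The sketch below only illustrates the generic RAM call; the exact shareable resource type and permission required for Feature Store follow the AWS documentation, and the ARN and account ID here are hypothetical.

```python
import boto3

ram = boto3.client("ram")

# Hypothetical feature group ARN and consumer account ID.
FEATURE_GROUP_ARN = "arn:aws:sagemaker:us-east-1:111122223333:feature-group/customers"
CONSUMER_ACCOUNT_ID = "444455556666"

def share_feature_group() -> str:
    """Create a RAM resource share granting another account access to the resource."""
    response = ram.create_resource_share(
        name="customers-feature-group-share",
        resourceArns=[FEATURE_GROUP_ARN],
        principals=[CONSUMER_ACCOUNT_ID],
        allowExternalPrincipals=True,
    )
    return response["resourceShare"]["resourceShareArn"]
```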
In these applications, time series data can have heavy-tailed distributions, where the tails represent extreme values. Accurate forecasting in these regions is important in determining how likely an extreme event is and whether to raise an alarm. The following diagram illustrates the inference pipeline.
Kafka and ETL Processing: You might be using Apache Kafka for high-performance data pipelines, to stream various analytics data, or to run company-critical assets, but did you know that you can also use Kafka clusters to move data between multiple systems? Step 2: Create a Data Catalog table.
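A rough Boto3 sketch of that catalog step; the database name, table schema, and S3 location are hypothetical placeholders for wherever the Kafka-delivered data lands.

```python
import boto3

glue = boto3.client("glue")

def create_catalog_table() -> None:
    """Register an external table in the AWS Glue Data Catalog (Step 2 above)."""
    glue.create_table(
        DatabaseName="streaming_db",
        TableInput={
            "Name": "kafka_events",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "event_id", "Type": "string"},
                    {"Name": "event_time", "Type": "timestamp"},
                    {"Name": "payload", "Type": "string"},
                ],
                "Location": "s3://my-bucket/kafka-events/",
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                "SerdeInfo": {
                    "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
                },
            },
            "TableType": "EXTERNAL_TABLE",
        },
    )

if __name__ == "__main__":
    create_catalog_table()
```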
Traditional maintenance activities rely on a sizable workforce distributed across key locations along the BHS dispatched by operators in the event of an operational fault. With this service, industrial sensors, smart meters, and OPC UA servers can be connected to an AWS data lake with just a few clicks.
In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you will be able to power ML models with your unstructured data stored in Amazon DocumentDB.
Examples of other PBAs now available include AWS Inferentia and AWS Trainium, Google TPU, and Graphcore IPU. Around this time, industry observers reported NVIDIA's strategy pivoting from its traditional gaming and graphics focus toward scientific computing and data analytics.
Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.
The full code can be found on the aws-samples-for-ray GitHub repository. Prepare the source data for the feature store by adding an event time and record ID for each row of data. Ingest the prepared data into the feature group by using the Boto3 SDK.
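The repository's own helpers differ, but a bare-bones, hypothetical illustration of those two steps with pandas and the Boto3 Feature Store runtime client might look like this; the feature group name and dataframe are placeholders.

```python
import time
import boto3
import pandas as pd

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Hypothetical feature group name; it must already exist and match the schema.
FEATURE_GROUP_NAME = "ray-demo-features"

def ingest(df: pd.DataFrame) -> None:
    """Add event_time and record_id columns, then put each row into the feature group."""
    df = df.copy()
    df["record_id"] = range(len(df))
    df["event_time"] = time.time()  # Feature Store accepts fractional epoch seconds

    for _, row in df.iterrows():
        record = [
            {"FeatureName": column, "ValueAsString": str(value)}
            for column, value in row.items()
        ]
        featurestore_runtime.put_record(
            FeatureGroupName=FEATURE_GROUP_NAME, Record=record
        )
```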
This analytical model provides accurate estimates of land surface temperature (LST) at a granular level, allowing Gramener to quantify changes in the UHI effect based on parameters (names of indexes and data used). Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. Outside work, he is a travel enthusiast.
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
Recognizing these specific needs, Fivetran has developed a range of connectors for applications, databases, files, and events that can accommodate the diverse formats used by healthcare systems. Addressing these needs may pose challenges that lead to the implementation of custom solutions rather than a uniform approach.
MLOps aims to bridge the gap between data science and operational teams so they can reliably and efficiently transition ML models from development to production environments, all while maintaining high model performance and accuracy. AIOps integrates these models into existing IT systems to enhance their functions and performance.
Apache Kafka is an open-source, distributed streaming platform that allows developers to build real-time, event-driven applications. With Apache Kafka, developers can build applications that continuously use streaming data records and deliver real-time experiences to users.
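As a rough illustration, here is a minimal produce-and-consume sketch using the third-party kafka-python client (one of several Kafka clients for Python); the broker address and topic name are hypothetical.

```python
# Minimal sketch using kafka-python (pip install kafka-python).
import json
from kafka import KafkaConsumer, KafkaProducer

# Hypothetical broker address and topic name.
BOOTSTRAP = "localhost:9092"
TOPIC = "user-activity"

# Produce one JSON-encoded event.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda value: json.dumps(value).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": 42, "action": "click"})
producer.flush()

# Consume events from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating when no new records arrive
)
for message in consumer:
    print(message.value)
```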
Apache Kafka For data engineers dealing with real-time data, Apache Kafka is a game-changer. This open-source streaming platform enables the handling of high-throughput data feeds, ensuring that data pipelines are efficient, reliable, and capable of handling massive volumes of data in real time.
AWS provides several tools to create and manage ML model deployments. If you are somewhat familiar with AWS's base ML tools, the first thing that comes to mind is SageMaker. Amazon SageMaker is in fact a great tool for machine learning operations (MLOps) to automate and standardize processes across the ML lifecycle.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is designing, constructing, and managing systems that enable data collection, storage, and analysis. These systems are crucial in ensuring data is readily available for analysis and reporting.
Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the 3-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. 2) When data becomes information, many (incremental) use cases surface.
A true enterprise-grade integration solution calls for source and target connectors that can accommodate VSAM files, COBOL copybooks, open standards like JSON, and modern platforms like Amazon Web Services (AWS), Confluent, Databricks, or Snowflake. Questions to ask each vendor: Which enterprise data sources and targets do you support?
Today, all leading CSPs, including Amazon Web Services (AWS Lambda), Microsoft Azure (Azure Functions), and IBM (IBM Cloud Code Engine), offer serverless platforms. In a serverless model, an event triggers app code to run. Automated serverless functions are stateless and designed to handle individual events.
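A minimal sketch of such an event-triggered function, written as an AWS Lambda handler and assuming an API Gateway proxy event; other serverless platforms use analogous entry-point signatures.

```python
import json

def handler(event, context):
    """Entry point the platform invokes when an event arrives (here, an HTTP request
    routed through API Gateway). No server is provisioned or managed by the app."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```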
Data engineers will also work with data scientists to design and implement data pipelines, ensuring steady flows and minimal issues for data teams. They'll also work with software engineers to ensure that the data infrastructure is scalable and reliable. Interested in attending an ODSC event?
DataRobot now delivers both visual and code-centric data preparation and data pipelines, along with automated machine learning that is composable, and can be driven by hosted notebooks or a graphical user experience.
What Are the Best Third-Party Data Ingestion Tools for Snowflake? Fivetran Fivetran is a tool dedicated to replicating applications, databases, events, and files into a high-performance data warehouse, such as Snowflake. Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.).
Knowing what needs to be done and in what order (the whole process and management side of data) is often overlooked , and we know sometimes keeping everyone up to date can be a bit tedious in its own way, but if you can orchestrate pipelines with dozens of steps in your sleep, you surely can take a moment to write what you’re up to, right?
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up to date.
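One common way to implement such a duplicate check for unstructured files is content hashing; the sketch below is a generic illustration, and the `./raw_documents` directory is a hypothetical placeholder.

```python
import hashlib
from pathlib import Path

def find_duplicate_files(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; any group with >1 entry is a duplicate."""
    groups: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups.setdefault(digest, []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_duplicate_files("./raw_documents").items():
        print(f"Duplicate content ({digest[:12]}): {[str(p) for p in paths]}")
```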
Image generated with Midjourney In today's fast-paced world of data science, building impactful machine learning models relies on much more than selecting the best algorithm for the job. Data scientists and machine learning engineers need to collaborate to make sure that together with the model, they develop robust data pipelines.
Environments: Data science environments encompass the tools and platforms where professionals perform their work. From development environments like Jupyter Notebooks to robust cloud-hosted solutions such as Amazon SageMaker, proficiency in these systems is critical.
This per-byte fee for data egress depends on the region where your Snowflake account is hosted. Operational Risks: Uncover operational risks such as data loss or failures in the event of an unforeseen outage or disaster. Luckily, there are several tools in place to monitor these costs in Snowflake.
Through this unified query capability, you can create comprehensive insights into customer transaction patterns and purchase behavior for active products without the traditional barriers of data silos or the need to copy data between systems. Environments are the actual data infrastructure behind a project.
The service will consume the features in real time, generate predictions in near real time, such as in an event processing pipeline, and write the outputs to a prediction queue. If your organization runs its workloads on AWS, it might be worth it to leverage Amazon SageMaker. Data engineers are mostly in charge of it.
However, if the tool offers an option to write our own custom code to implement features that cannot be achieved using the drag-and-drop components, it broadens the horizon of what we can do with our data pipelines. In this example, the secret is an API key, which will be used later on in the pipeline.
Internally within Netflix's engineering team, Meson was built to manage, orchestrate, schedule, and execute workflows within ML/data pipelines. Meson managed the lifecycle of ML pipelines, providing functionality such as recommendations and content analysis, and leveraged the Single Leader Architecture.