AWS, Data Pipeline and Information - Data Science Current

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

Data is one of the most critical assets of many organizations. Theyre constantly seeking ways to use their vast amounts of information to gain competitive advantages. This post explores how OMRON Europe is using Amazon Web Services (AWS) to build its advanced ODAP and its progress toward harnessing the power of generative AI.

AWS

AWS Data Governance Data Silos SQL

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. Create dbt models in dbt Cloud.

ETL

ETL Data Warehouse Analytics Analytics

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

While these models are trained on vast amounts of generic data, they often lack the organization-specific context and up-to-date information needed for accurate responses in business settings. Lets assume that the question What date will AWS re:invent 2024 occur? If the question was Whats the schedule for AWS events in December?,

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

AWS Machine Learning Blog

DECEMBER 4, 2024

SageMaker Unified Studio combines various AWS services, including Amazon Bedrock , Amazon SageMaker , Amazon Redshift , Amazon Glue , Amazon Athena , and Amazon Managed Workflows for Apache Airflow (MWAA) , into a comprehensive data and AI development platform. Navigate to the AWS Secrets Manager console and find the secret -api-keys.

AWS

AWS AI AI SQL

Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available

Flipboard

NOVEMBER 7, 2023

“Data is at the center of every application, process, and business decision,” wrote Swami Sivasubramanian, VP of Database, Analytics, and Machine Learning at AWS, and I couldn’t agree more. A common pattern customers use today is to build data pipelines to move data from Amazon Aurora to Amazon Redshift.

ETL

ETL Data Pipeline Machine Learning Machine Learning

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

It seems straightforward at first for batch data, but the engineering gets even more complicated when you need to go from batch data to incorporating real-time and streaming data sources, and from batch inference to real-time serving. You can also find Tecton at AWS re:Invent.

ML

ML ML AWS AI

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.

AWS

AWS ML ML Machine Learning

Modular functions design for Advanced Driver Assistance Systems (ADAS) on AWS

AWS Machine Learning Blog

FEBRUARY 23, 2023

At the higher levels of automation (Level 2 and above), the AD system performs multiple functions: Data collection – The AV system gathers information about the vehicle’s surroundings in real time with centimeter accuracy. AV systems fuse data from the devices that are integrated together to build a comprehensive perception.

AWS

AWS ML ML Machine Learning

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. For more information, see Zeta Global’s home page. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.

AWS

AWS Machine Learning Machine Learning ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. These types of data are historical raw data from an ML perspective.

AWS

AWS ML ML ETL

Designing generative AI workloads for resilience

AWS Machine Learning Blog

FEBRUARY 1, 2024

Consider the following picture, which is an AWS view of the a16z emerging application stack for large language models (LLMs). This pipeline could be a batch pipeline if you prepare contextual data in advance, or a low-latency pipeline if you’re incorporating new contextual data on the fly.

AWS

AWS AI AI Database

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 23, 2024

SnapLogic uses Amazon Bedrock to build its platform, capitalizing on the proximity to data already stored in Amazon Web Services (AWS). Control plane and data plane implementation SnapLogic’s Agent Creator platform follows a decoupled architecture, separating the control plane and data plane for enhanced security and scalability.

AI

AI AI Database AWS

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

We also discuss different types of ETL pipelines for ML use cases and provide real-world examples of their use to help data engineers choose the right one. What is an ETL data pipeline in ML? Xoriant It is common to use ETL data pipeline and data pipeline interchangeably.

ETL

ETL Data Pipeline ML ML

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineer Data Engineering

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

AWS Machine Learning Blog

MARCH 14, 2024

Recent improvements in Generative AI based large language models (LLMs) have enabled their use in a variety of applications surrounding information retrieval. Given the data sources, LLMs provided tools that would allow us to build a Q&A chatbot in weeks, rather than what may have taken years previously, and likely with worse performance.

SQL

SQL AWS AI AI

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.

ML

ML ML AWS AI

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

Through evaluations of sensors and informed decision-making support, Afri-SET empowers governments and civil society for effective air quality management. The attempt is disadvantaged by the current focus on data cleaning, diverting valuable skills away from building ML models for sensor calibration.

AWS

AWS AI Python AI

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications.

SQL

SQL Data Lakes Data Analyst AWS

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.

ML

ML ML AWS Data Pipeline

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 12, 2024

Furthermore, the democratization of AI and ML through AWS and AWS Partner solutions is accelerating its adoption across all industries. For example, a health-tech company may be looking to improve patient care by predicting the probability that an elderly patient may become hospitalized by analyzing both clinical and non-clinical data.

ML

ML ML AWS AI

Build a generative AI Slack chat assistant using Amazon Bedrock and Amazon Kendra

AWS Machine Learning Blog

OCTOBER 7, 2024

Despite the proliferation of information and data in business environments, employees and stakeholders often find themselves searching for information and struggling to get their questions answered quickly and efficiently. See Data source connectors for a list of supported data source connectors for Amazon Kendra.

AWS

AWS AI AI Natural Language Processing

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). Their task is to construct and oversee efficient data pipelines.

AWS

AWS ML ML Machine Learning

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 15, 2023

In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you will be able to power ML models with your unstructured data stored in Amazon DocumentDB.

Machine Learning

Machine Learning Machine Learning AWS ML

Foundational data protection for enterprise LLM acceleration with Protopia AI

AWS Machine Learning Blog

DECEMBER 5, 2023

AWS is especially well suited to provide enterprises the tools necessary for deploying LLMs at scale to enable critical decision-making. In their implementation of generative AI technology, enterprises have real concerns about data exposure and ownership of confidential information that may be sent to LLMs.

AI

AI AI AWS ML

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

SageMaker requires that the training data for an ML model be present either in Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) or Amazon FSx for Lustre (for more information, refer to Access Training Data). Store your Snowflake account credentials in AWS Secrets Manager.

ML

ML ML AWS Python

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.

Data Engineering

Data Engineering Data Engineer Data Engineering Data Engineering

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 21, 2024

By walking through this specific implementation, we aim to showcase how you can adapt batch inference to suit various data processing needs, regardless of the data source or nature. Prerequisites To use the batch inference feature, make sure you have satisfied the following requirements: An active AWS account.

AWS

AWS Data Preparation ML ML

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

Kafka And ETL Processing: You might be using Apache Kafka for high-performance data pipelines, stream various analytics data, or run company critical assets using Kafka, but did you know that you can also use Kafka clusters to move data between multiple systems. 5 Key Comparisons in Different Apache Kafka Architectures.

Apache Kafka

Apache Kafka ETL Data Lakes AWS

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

AWS Machine Learning Blog

AUGUST 22, 2024

This makes managing and deploying these updates across a large-scale deployment pipeline while providing consistency and minimizing downtime a significant undertaking. Generative AI applications require continuous ingestion, preprocessing, and formatting of vast amounts of data from various sources.

ML

ML ML Python AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources.

ML

ML ML AWS Data Warehouse

Deploy generative AI agents in your contact center for voice and chat using Amazon Connect, Amazon Lex, and Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

SEPTEMBER 24, 2024

Working with the AWS Generative AI Innovation Center , DoorDash built a solution to provide Dashers with a low-latency self-service voice experience to answer frequently asked questions, reducing the need for live agent assistance, in just 2 months. “We You can deploy the solution in your own AWS account and try the example solution.

AWS

AWS AI AI Analytics

How Reveal’s Logikcull used Amazon Comprehend to detect and redact PII from legal documents at scale

AWS Machine Learning Blog

NOVEMBER 1, 2023

Today, personally identifiable information (PII) is everywhere. It refers to any data or information that can be used to identify a specific individual. It’s a critical component of modern data management and cybersecurity practices. PII is in emails, slack messages, videos, PDFs, and so on.

AWS

AWS Machine Learning Machine Learning ML

Deploy a predictive maintenance solution for airport baggage handling systems with Amazon Lookout for Equipment

AWS Machine Learning Blog

APRIL 12, 2023

In this post, we describe how AWS Partner Airis Solutions used Amazon Lookout for Equipment , AWS Internet of Things (IoT) services, and CloudRail sensor technologies to provide a state-of-the-art solution to address these challenges. It’s an easy way to run analytics on IoT data to gain accurate insights.

AWS

AWS ML ML Machine Learning

Robust time series forecasting with MLOps on Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 28, 2023

This empowers you to make well-informed and responsive decisions based on the most recent data. By following these steps, businesses can harness the power of robust time series forecasting to make informed decisions and stay ahead in a rapidly changing environment. The following diagram illustrates the inference pipeline.

AWS

AWS Machine Learning Machine Learning ML

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

AWS Machine Learning Blog

DECEMBER 6, 2023

This method of enriching the LLM generation context with information retrieved from your internal data sources is called Retrieval Augmented Generation (RAG), and produces assistants that are domain specific and more trustworthy, as shown by Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.

SQL

SQL AWS Analytics Analytics

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

Flipboard

NOVEMBER 24, 2023

In this post, we show you how SnapLogic , an AWS customer, used Amazon Bedrock to power their SnapGPT product through automated creation of these complex DSL artifacts from human language. SnapLogic background SnapLogic is an AWS customer on a mission to bring enterprise automation to the world.

Database

Database AWS ETL SQL

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

phData

JUNE 14, 2023

which play a crucial role in building end-to-end data pipelines, to be included in your CI/CD pipelines. These practices also establish a unified and reliable source of information for all changes, ensuring that the history of changes is readily accessible for auditing purposes.

Data Pipeline

Data Pipeline Database SQL Data Engineer

Navigating the Big Data Frontier: A Guide to Efficient Handling

Women in Big Data

OCTOBER 9, 2024

With the explosive growth of big data over the past decade and the daily surge in data volumes, it’s essential to have a resilient system to manage the vast influx of information without failures. The success of any data initiative hinges on the robustness and flexibility of its big data pipeline.

Big Data

Big Data Big Data Apache Kafka Data Pipeline

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

APRIL 5, 2024

By providing authorities with the tools and insights they need to make informed decisions about environmental and social impact, Gramener is playing a vital role in building a more sustainable future. Discover the potential of integrating Earth observation data in your sustainability projects with SageMaker.

Clustering

Clustering ML ML AWS

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

Data contains information, and information can be used to predict future behaviors, from the buying habits of customers to securities returns. The financial services industry (FSI) is no exception to this, and is a well-established producer and consumer of data and analytics.

AWS

AWS ML ML Clustering

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

The full code can be found on the aws-samples-for-ray GitHub repository. It integrates smoothly with other data processing libraries like Spark, Pandas, NumPy, and more, as well as ML frameworks like TensorFlow and PyTorch. This allows building end-to-end data pipelines and ML workflows on top of Ray.

Machine Learning

Machine Learning Machine Learning ML ML

Announcing DagsHub Support for DVC 3.0

DagsHub

SEPTEMBER 28, 2023

DagsHub integration with DVC includes a fully configured remote object storage managed by DVC, showing and diffing DVC tracked files hosted on DagsHub Storage or S3 compatible, and Data Pipeline visualization. This means you can see your data and diff it, next to your code file, and create a single source of truth for your project.

Data Pipeline

Data Pipeline AWS

Big Data vs. Data Science: Demystifying the Buzzwords

Pickl AI

APRIL 21, 2025

Summary: Big Data refers to the vast volumes of structured and unstructured data generated at high speed, requiring specialized tools for storage and processing. Data Science, on the other hand, uses scientific methods and algorithms to analyses this data, extract insights, and inform decisions.

Big Data

Big Data Big Data Data Science Machine Learning

2021 Data/AI Salary Survey

O'Reilly Media

SEPTEMBER 15, 2021

Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries. Certified Information Systems Security Professional a.k.a.

AI

AI AI Azure AWS

Shaping the future: OMRON’s data-driven journey with AWS

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Webinars

Trending Sources

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

Webinars

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available

Real value, real time: Production AI with Amazon SageMaker and Tecton

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Modular functions design for Advanced Driver Assistance Systems (ADAS) on AWS

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

Designing generative AI workloads for resilience

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

How to Build ETL Data Pipeline in ML

How to Build Effective Data Pipelines in Snowpark

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Improving air quality with generative AI

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

Build a generative AI Slack chat assistant using Amazon Bedrock and Amazon Kendra

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

Foundational data protection for enterprise LLM acceleration with Protopia AI

Use Snowflake as a data source to train ML models with Amazon SageMaker

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Deploy generative AI agents in your contact center for voice and chat using Amazon Connect, Amazon Lex, and Amazon Bedrock Knowledge Bases

How Reveal’s Logikcull used Amazon Comprehend to detect and redact PII from legal documents at scale

Deploy a predictive maintenance solution for airport baggage handling systems with Amazon Lookout for Equipment

Robust time series forecasting with MLOps on Amazon SageMaker

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

Navigating the Big Data Frontier: A Guide to Efficient Handling

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

A review of purpose-built accelerators for financial services

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

Announcing DagsHub Support for DVC 3.0

Big Data vs. Data Science: Demystifying the Buzzwords

2021 Data/AI Salary Survey

Stay Connected