Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. To deploy the solution, we provide an AWS CloudFormation template that creates a stack containing the required resources.
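As a rough sketch of launching such a stack programmatically (the stack name and template URL below are placeholders, not the article's actual template), boto3 can create the stack and wait for it to finish:

```python
import boto3

# Placeholder stack name and template URL; substitute the article's
# actual CloudFormation template location.
cf = boto3.client("cloudformation", region_name="us-east-1")

response = cf.create_stack(
    StackName="redshift-analytics-stack",
    TemplateURL="https://example-bucket.s3.amazonaws.com/template.yaml",
    Capabilities=["CAPABILITY_NAMED_IAM"],  # needed if the template creates IAM roles
)
print(response["StackId"])

# Block until stack creation completes.
waiter = cf.get_waiter("stack_create_complete")
waiter.wait(StackName="redshift-analytics-stack")
```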
Generative AI applications require continuous ingestion, preprocessing, and formatting of vast amounts of data from various sources. This makes managing and deploying these updates across a large-scale deployment pipeline, while providing consistency and minimizing downtime, a significant undertaking.
Ray integrates smoothly with other data processing libraries like Spark, Pandas, and NumPy, as well as ML frameworks like TensorFlow and PyTorch, which allows building end-to-end data pipelines and ML workflows on top of Ray. The full code can be found in the aws-samples-for-ray GitHub repository.
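To make that interoperability concrete, here is a minimal Ray Data sketch (the toy DataFrame and the `add_length` transform are invented for illustration) showing a pandas DataFrame flowing through a parallel preprocessing step:

```python
import ray
import pandas as pd

ray.init()

# Build a Ray Dataset from an in-memory pandas DataFrame.
df = pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]})
ds = ray.data.from_pandas(df)

# Preprocess in parallel; each batch arrives as a pandas DataFrame.
def add_length(batch: pd.DataFrame) -> pd.DataFrame:
    batch["length"] = batch["text"].str.len()
    return batch

ds = ds.map_batches(add_length, batch_format="pandas")
print(ds.take_all())
```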
To get a better grip on those changes, we reviewed over 25,000 data scientist job descriptions from the past year to find out what employers are looking for in 2023. Much of what we found was to be expected, though there were definitely a few surprises. You'll see specific tools in the next section.
Build a Stock Price Prediction App powered by Snowflake, AWS, Python, and Streamlit (Part 2 of 3): a comprehensive guide to developing machine learning applications from start to finish. Introduction: Welcome back! Let's continue our data science journey to create the stock price prediction web application.
Some projects manage this folder like the data folder and sync it to a canonical store (e.g., AWS S3) separately from source code. The second approach is to provide a directed acyclic graph (DAG) for data pipelining and model building.
AWS provides several tools to create and manage ML model deployments. If you are somewhat familiar with AWS ML base tools, the first thing that comes to mind is SageMaker. AWS SageMaker is in fact a great tool for machine learning operations (MLOps), automating and standardizing processes across the ML lifecycle.
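As a minimal sketch of that SageMaker workflow (the role ARN, training script, and S3 paths below are placeholders, not values from the article), the SageMaker Python SDK can train and deploy a scikit-learn model:

```python
from sagemaker.sklearn.estimator import SKLearn

# Placeholder role ARN, script name, and S3 paths; substitute your own.
estimator = SKLearn(
    entry_point="train.py",  # your training script
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="1.2-1",
)

# Launch a managed training job against data staged in S3.
estimator.fit({"train": "s3://my-bucket/train/"})

# Deploy the trained model behind a real-time HTTPS endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```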
Summary: Choosing the right ETL tool is crucial for seamless data integration. Top contenders like Apache Airflow and AWS Glue offer unique features, empowering businesses with efficient workflows, high data quality, and informed decision-making capabilities.
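To make the Airflow side concrete, here is a minimal sketch of a daily ETL DAG (the task bodies are hypothetical stand-ins for real extract/load logic, and the `schedule` argument assumes Airflow 2.4 or later):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical extract/load callables standing in for real ETL logic.
def extract():
    print("pulling records from the source system")

def load():
    print("writing transformed records to the warehouse")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load
```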
Salesforce Sync Out is a crucial tool that enables businesses to transfer data from their Salesforce platform to external systems like Snowflake, AWS S3, and Azure ADLS. The Salesforce Sync Out connector moves Salesforce data directly into Snowflake, simplifying the data pipeline and reducing latency.
Hybrid tables are now available in public preview in specific AWS regions, excluding trial accounts. Their real benefit is that they bring transactional and analytical data together in a single platform: hybrid tables can streamline data pipelines, reduce costs, and unlock deeper insights from data.
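As a rough sketch of creating one from Python (the connection parameters and table definition are invented, and the feature must be enabled in your account/region), the Snowflake connector can issue the DDL:

```python
import snowflake.connector

# Placeholder credentials; hybrid tables must be created in an
# account/region where the public preview is enabled.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)

conn.cursor().execute("""
    CREATE OR REPLACE HYBRID TABLE orders (
        order_id INT PRIMARY KEY,  -- hybrid tables require a primary key
        customer_id INT,
        amount NUMBER(10, 2)
    )
""")
```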
The generative AI solutions from GCP Vertex AI, AWS Bedrock, Azure AI, and Snowflake Cortex all provide access to a variety of industry-leading foundational models. It is definitely an exciting time as the open-source community enhances and builds out these frameworks, but they are still being refined with best practices and new features.
(Image generated with Midjourney.) In today's fast-paced world of data science, building impactful machine learning models relies on much more than selecting the best algorithm for the job. Data scientists and machine learning engineers need to collaborate to make sure that, together with the model, they develop robust data pipelines.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: in a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.
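One such validation check might compare content hashes recorded at ingestion time; a minimal sketch (the catalog DataFrame and its column names are invented for illustration):

```python
import pandas as pd

# Hypothetical metadata catalog for an unstructured store, with a
# content hash computed for each file at ingestion time.
catalog = pd.DataFrame({
    "file_name": ["a.png", "b.png", "c.png"],
    "sha256":    ["111...", "222...", "111..."],
})

# Validation check: flag entries whose content hash already exists.
dupes = catalog[catalog.duplicated(subset="sha256", keep="first")]
if not dupes.empty:
    raise ValueError(f"Duplicate entries detected:\n{dupes}")
```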
However, in scenarios where dataset versioning solutions are leveraged, there can still be various challenges experienced by ML/AI/data teams. Data aggregation: data sources could increase as more data points are required to train ML models. Existing data pipelines will have to be modified to accommodate new data sources.
However, Snowflake runs better on Azure than it does on AWS – so even though it’s not the ideal situation, Microsoft still sees Azure consumption when organizations host Snowflake on Azure. The most commonly used functions that you lose when using DirectQuery are Time Intelligence functions such as TOTALYTD, DATESYTD, and EOMONTH.
Working with the AWS Generative AI Innovation Center, DoorDash built a solution in just two months to provide Dashers with a low-latency self-service voice experience that answers frequently asked questions, reducing the need for live agent assistance. You can deploy the solution in your own AWS account and try the example solution.
Examples of other PBAs now available include AWS Inferentia and AWS Trainium, Google TPU, and Graphcore IPU. Around this time, industry observers reported NVIDIA's strategy pivoting from its traditional gaming and graphics focus to scientific computing and data analytics.
Internally, Netflix's engineering team built Meson to manage, orchestrate, schedule, and execute workflows within ML/data pipelines. Meson managed the lifecycle of ML pipelines powering functionality such as recommendations and content analysis, and leveraged the Single Leader Architecture.
In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines.
Your data scientists develop models on this component, which stores all parameters, feature definitions, artifacts, and other experiment-related information they care about for every experiment they run (see Machine Learning Operations (MLOps): Overview, Definition, and Architecture by Kreuzberger et al.).
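As an illustration of what such an experiment-tracking component records (MLflow is one common choice here, not necessarily the one the article describes; the experiment name, parameters, metric, and artifact file are invented):

```python
import mlflow

# Log one experiment run: hyperparameters, a validation metric,
# and a feature-definitions file as an artifact.
mlflow.set_experiment("churn-model")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_auc", 0.91)
    # Assumes this file exists locally before logging it.
    mlflow.log_artifact("feature_definitions.json")
```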
GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream's public Bitcoin API exposes data that would be interesting to analyze. From Data Engineering to Prompt Engineering: in the BI/data analysis world, people usually need to query data (small or large), and prompts can now drive BI report generation and data analysis.
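A minimal sketch of the JSON-to-SQL-schema idea (the sample record and prompt wording are invented; assumes an `OPENAI_API_KEY` environment variable and the OpenAI Python SDK v1):

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

sample = {"user_id": 42, "email": "a@b.com", "signup_ts": "2024-01-01T00:00:00Z"}

# Ask the model to infer a CREATE TABLE statement from a sample record.
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Generate a PostgreSQL CREATE TABLE statement for this "
                   f"JSON record:\n{json.dumps(sample, indent=2)}",
    }],
)
print(resp.choices[0].message.content)
```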
Key advantages of governance include simplified change management: the complexity of the underlying systems is abstracted away from the user, allowing them to simply and declaratively build and change data pipelines. This reduces risk, enables automation, and allows less technical users to assist in the development process.
Amazon Bedrock Agents is instrumental in customizing and tailoring apps to meet specific project requirements while protecting private data and securing applications. These agents work with AWS managed infrastructure capabilities and Amazon Bedrock, reducing infrastructure management overhead.
Designing the prompt: before starting any scaled use of generative AI, you should have a clear definition of the problem you are trying to solve, along with the end goal. If prompted, set up a user profile for SageMaker Studio by providing a user name and specifying AWS Identity and Access Management (IAM) permissions.
Through this unified query capability, you can create comprehensive insights into customer transaction patterns and purchase behavior for active products without the traditional barriers of data silos or the need to copy data between systems. Environments are the actual data infrastructure behind a project.