AWS, Data Preparation and Database - Data Science Current

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

To simplify infrastructure setup and accelerate distributed training, AWS introduced Amazon SageMaker HyperPod in late 2023. In this blog post, we showcase how you can perform efficient supervised fine tuning for a Meta Llama 3 model using PEFT on AWS Trainium with SageMaker HyperPod. architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/

AWS

AWS Clustering Deep Learning Deep Learning

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.

Data Preparation

Data Preparation ML ML Data Quality

Fine-tuning large language models (LLMs) for 2025

Dataconomy

NOVEMBER 11, 2024

RAG helps models access a specific library or database, making it suitable for tasks that require factual accuracy. What is Retrieval-Augmented Generation (RAG) and when to use it Retrieval-Augmented Generation (RAG) is a method that integrates the capabilities of a language model with a specific library or database.

Data Preparation

Data Preparation Database Data Quality Machine Learning

Webinars

Maximizing Profit and Productivity: The New Era of AI-Powered Accounting

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage. Also, traditional database management tasks, including backups, upgrades and routine maintenance drain valuable time and resources, hindering innovation.

AWS

AWS Database ETL AI

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and requires sparse data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of data engineering and data science team’s bandwidth and data preparation activities.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

Multimodal Retrieval Augmented Generation (MM-RAG) is emerging as a powerful evolution of traditional RAG systems, addressing limitations and expanding capabilities across diverse data types. Traditionally, RAG systems were text-centric, retrieving information from large text databases to provide relevant context for language models.

AWS

AWS Computer Science Computer Science Database

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog

FEBRUARY 27, 2025

Lets examine the key components of this architecture in the following figure, following the data flow from left to right. The workflow consists of the following phases: Data preparation Our evaluation process begins with a prompt dataset containing paired radiology findings and impressions.

AWS

AWS AI AI ML

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Flipboard

NOVEMBER 20, 2024

Prerequisites Before proceeding with this tutorial, make sure you have the following in place: AWS account – You should have an AWS account with access to Amazon Bedrock. Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata. model in Amazon Bedrock.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Yes, the AWS re:Invent season is upon us and as always, the place to be is Las Vegas! are the sessions dedicated to AWS DeepRacer ! Generative AI is at the heart of the AWS Village this year. You marked your calendars, you booked your hotel, and you even purchased the airfare. And last but not least (and always fun!)

AWS

AWS ML ML AI

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.

AWS

AWS ML ML Machine Learning

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

SageMaker Unied Studio is an integrated development environment (IDE) for data, analytics, and AI. Discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, generative AI app building, and more, in a single governed environment.

SQL

SQL AWS Data Lakes AI

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Solution overview With SageMaker Studio JupyterLab notebook’s SQL integration, you can now connect to popular data sources like Snowflake, Athena, Amazon Redshift, and Amazon DataZone. For example, you can visually explore data sources like databases, tables, and schemas directly from your JupyterLab ecosystem.

SQL

SQL AWS Database Data Scientist

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms missed an easy migration of their AI/ML prototypes to the industrialization of the solution running on AWS.

ML

ML ML AWS AI

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

AWS Machine Learning Blog

APRIL 26, 2024

At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG).

AWS

AWS Database Python AI

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. An Amazon DataZone domain and an associated Amazon DataZone project configured in your AWS account. Choose Data Wrangler in the navigation pane.

Machine Learning

Machine Learning Machine Learning Data Governance ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account.

AWS

AWS ML ML ETL

GenASL: Generative AI-powered American Sign Language avatars

AWS Machine Learning Blog

AUGUST 26, 2024

AWS makes it possible for organizations of all sizes and developers of all skill levels to build and scale generative AI applications with security, privacy, and responsible AI. In this post, we dive into the architecture and implementation details of GenASL, which uses AWS generative AI capabilities to create human-like ASL avatar videos.

AWS

AWS AI AI ML

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure. The following diagram depicts a high-level RAG architecture. Choose Next.

AWS

AWS Machine Learning Machine Learning Database

Implement real-time personalized recommendations using Amazon Personalize

AWS Machine Learning Blog

NOVEMBER 13, 2023

The post Architecting near real-time personalized recommendations with Amazon Personalize shows how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services. For this particular use case, you will be uploading interactions data and items data.

AWS

AWS Data Preparation ML ML

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

On December 6 th -8 th 2023, the non-profit organization, Tech to the Rescue , in collaboration with AWS, organized the world’s largest Air Quality Hackathon – aimed at tackling one of the world’s most pressing health and environmental challenges, air pollution. This allows for data to be aggregated for further manufacturer-agnostic analysis.

AWS

AWS AI Python AI

Build well-architected IDP solutions with a custom lens – Part 2: Security

AWS Machine Learning Blog

NOVEMBER 22, 2023

Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS.

AWS

AWS ML ML Machine Learning

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.

AWS

AWS Data Preparation Azure Data Scientist

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import , a feature that you can use to import customized models created in other environments—such as Amazon SageMaker , Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.

SQL

SQL AWS ML ML

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints , SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.

AWS

AWS Clustering Big Data Big Data

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

AWS Machine Learning Blog

JULY 11, 2024

RAG provides additional knowledge to the LLM through its input prompt space and its architecture typically consists of the following components: Indexing : Prepare a corpus of unstructured text, parse and chunk it, and then, embed each chunk and store it in a vector database. client('s3') # Get the region name session = boto3.Session()

AWS

AWS ML ML Machine Learning

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

ML

ML ML Database AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.

ML

ML ML AWS Data Warehouse

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

AWS Machine Learning Blog

MARCH 10, 2023

This blog post will go through how data professionals may use SageMaker Data Wrangler’s visual interface to locate and connect to existing Amazon EMR clusters with Hive endpoints. To get ready for modeling or reporting, they can visually analyze the database, tables, schema, and author Hive queries to create the ML dataset.

Clustering

Clustering AWS ML ML

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

Data preparation is important at multiple stages in Retrieval Augmented Generation ( RAG ) models. Below, we show how you can do all these main preprocessing steps from Amazon SageMaker Data Wrangler : Extracting text from a PDF document (powered by Textract) Remove sensitive information (powered by Comprehend) Chunk text into pieces.

Data Preparation

Data Preparation AI AI Python

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Here, we predict whether an order is a high_value_order or a low_value_order based on the orderpriority as given from the TPC-H data.

ML

ML ML AWS Machine Learning

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

AWS Machine Learning Blog

AUGUST 14, 2023

The final retrieval augmentation workflow covers the following high-level steps: The user query is used for a retriever component, which does a vector search, to retrieve the most relevant context from our database. A vector database provides efficient vector similarity search by providing specialized indexes like k-NN indexes.

AWS

AWS Database AI AI

Migrating From AWS Redshift to Snowflake: 2 Methods to Explore

phData

FEBRUARY 9, 2023

Welcome to our AWS Redshift to the Snowflake Data Cloud migration blog! In this blog, we’ll walk you through the process of migrating your data from AWS Redshift to the Snowflake Data Cloud. As an experienced data engineering consulting company, phData has helped with numerous migrations to Snowflake.

AWS

AWS ETL Data Preparation SQL

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

AWS Machine Learning Blog

MAY 25, 2023

This allows you to create unique views and filters, and grants management teams access to a streamlined, one-click dashboard without needing to log in to the AWS Management Console and search for the appropriate dashboard. On the AWS CloudFormation console, create a new stack. amazonaws.com docker build -t. docker tag :latest.dkr.ecr.us-east-1.amazonaws.com/

ML

ML ML AWS Database

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

AWS Machine Learning Blog

MAY 23, 2023

We finish with a case study highlighting the benefits realize by a large AWS and PwC customer who implemented this solution. Solution overview AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. The following diagram illustrates the workflow.

Machine Learning

Machine Learning Machine Learning AWS ML

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

Examples of other PBAs now available include AWS Inferentia and AWS Trainium , Google TPU, and Graphcore IPU. Around this time, industry observers reported NVIDIA’s strategy pivoting from its traditional gaming and graphics focus to moving into scientific computing and data analytics.

AWS

AWS ML ML Clustering

Roadmap to Learn Data Science for Beginners and Freshers in 2023

Becoming Human

MAY 15, 2023

One is a scripting language such as Python, and the other is a Query language like SQL (Structured Query Language) for SQL Databases. Python is a High-level, Procedural, and object-oriented language; it is also a vast language itself, and covering the whole of Python is one the worst mistakes we can make in the data science journey.

Data Science

Data Science Machine Learning Machine Learning Database

Build a machine learning model to predict student performance using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 22, 2023

Amazon SageMaker Canvas is a low-code/no-code ML service that enables business analysts to perform data preparation and transformation, build ML models, and deploy these models into a governed workflow. Prerequisites For this post, you should complete the following prerequisites: Have an AWS account. Set up SageMaker Canvas.

Machine Learning

Machine Learning Machine Learning Data Scientist ML

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

The following is an example of notable proprietary FMs available in AWS (July 2023). The following is an example of notable open-source FM available in AWS (July 2023). Additions are required in historical data preparation, model evaluation, and monitoring. The following figure illustrates their journey.

AI

AI AI ML ML

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations and a variety of modeling methods. foundation models to help users discover, augment, and enrich data with natural language. .”

AI

AI AI Machine Learning Machine Learning

Build a multimodal social media content generator using Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 25, 2024

Solution overview In this solution, we start with data preparation, where the raw datasets can be stored in an Amazon Simple Storage Service (Amazon S3) bucket. We provide a Jupyter notebook to preprocess the raw data and use the Amazon Titan Multimodal Embeddings model to convert the image and text into embedding vectors.

AWS

AWS K-nearest Neighbors ML ML

Popular Data Transformation Tools: Importance and Best Practices

Pickl AI

OCTOBER 10, 2024

Talend Talend is a leading data integration platform known for its extensive tools for transforming, cleansing, and integrating data across multiple sources. It integrates well with cloud services, databases, and big data platforms like Hadoop, making it suitable for various data environments.

Data Quality

Data Quality AWS Machine Learning Machine Learning

Top Ten Power BI Alternatives For Your Data Needs

Pickl AI

NOVEMBER 18, 2024

Advanced tools like AWS QuickSight support large datasets and growing businesses. Microsoft Power BI is a comprehensive business intelligence (BI) tool designed to help organisations turn raw data into meaningful insights. SQL Integration : Easily integrates with most SQL databases via SQL Alchemy. What is Power BI?

Power BI

Power BI Tableau Data Analysis Data Analysis

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.

Machine Learning

Machine Learning Machine Learning ML ML

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

Accelerate data preparation for ML in Amazon SageMaker Canvas

Webinars

Trending Sources

Fine-tuning large language models (LLMs) for 2025

Webinars

Tackling AI’s data challenges with IBM databases on AWS

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Your guide to generative AI and ML at AWS re:Invent 2023

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

GenASL: Generative AI-powered American Sign Language avatars

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

Implement real-time personalized recommendations using Amazon Personalize

Improving air quality with generative AI

Build well-architected IDP solutions with a custom lens – Part 2: Security

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

Migrating From AWS Redshift to Snowflake: 2 Methods to Explore

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

A review of purpose-built accelerators for financial services

Roadmap to Learn Data Science for Beginners and Freshers in 2023

Build a machine learning model to predict student performance using Amazon SageMaker Canvas

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Exploring the AI and data capabilities of watsonx

Build a multimodal social media content generator using Amazon Bedrock

Popular Data Transformation Tools: Importance and Best Practices

Top Ten Power BI Alternatives For Your Data Needs

MLOps Landscape in 2023: Top Tools and Platforms

Stay Connected