However, companies are discovering that performing full fine-tuning of these models with their data isn't cost-effective. To reduce costs while continuing to use the power of AI, many companies have shifted to fine-tuning LLMs on their domain-specific data using Parameter-Efficient Fine-Tuning (PEFT).
Generative artificial intelligence (AI) is transforming the customer experience in industries across the globe. These models are often used with highly sensitive business data, such as personal data, compliance data, operational data, and financial information, to optimize the model's output.
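To make the cost argument behind PEFT concrete, here is a toy, back-of-the-envelope sketch (not from the article) of a LoRA-style parameter count: instead of updating every entry of a full weight matrix, only two small low-rank factors are trained. The dimensions below are illustrative placeholders.

```python
# Toy illustration of why LoRA-style PEFT trains far fewer parameters.
# A full fine-tune of a d_out x d_in weight updates every entry; LoRA
# instead trains B (d_out x r) and A (r x d_in) with r << min(d_out, d_in).

def full_finetune_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

d_out, d_in, r = 4096, 4096, 8      # illustrative layer size and rank
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, r)
print(f"full: {full}, lora: {lora}, ratio: {full // lora}x fewer")
```

For a single 4096x4096 layer this works out to 256 times fewer trainable weights, which is the kind of saving that makes domain-specific fine-tuning affordable.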
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.
In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) using Amazon Bedrock to generate impressions from the findings section in radiology reports using generative AI. Part 1 focused on model fine-tuning.
Retrieval Augmented Generation (RAG) has become a crucial technique for improving the accuracy and relevance of AI-generated responses. Prerequisites Before proceeding with this tutorial, make sure you have the following in place: AWS account – You should have an AWS account with access to Amazon Bedrock.
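As a minimal sketch of the RAG idea (my illustration, not the tutorial's code): retrieve the most relevant passage for a query, then splice it into the prompt the model would receive. A real system would use a vector store and a Bedrock model for generation; the keyword-overlap retriever and tiny corpus here are stand-ins.

```python
# Minimal RAG sketch: pick the passage with the most query-word overlap,
# then build the augmented prompt a generation model would receive.

def retrieve(query: str, corpus: list[str]) -> str:
    q = set(query.lower().split())
    # score each document by shared lowercase tokens with the query
    return max(corpus, key=lambda doc: len(q & set(doc.lower().split())))

def build_prompt(query: str, corpus: list[str]) -> str:
    context = retrieve(query, corpus)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Amazon Bedrock offers foundation models through a single API.",
    "SageMaker Data Wrangler accelerates data preparation for ML.",
]
print(build_prompt("Which service offers foundation models?", corpus))
```

The point is the shape of the pipeline: retrieval first, then generation grounded in the retrieved context, which is what improves accuracy and relevance.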
Data is the foundation to capturing the maximum value from AI technology and solving business problems quickly. To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. Prerequisites Before starting, you need an AWS account.
Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. About the Authors Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS).
Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts' ideas to use ML models often sit in prolonged backlogs because of the data engineering and data science teams' bandwidth and data preparation activities.
Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes.
Granite 3.0: IBM launched open-source LLMs for enterprise AI. Fine-tuning large language models allows businesses to adapt AI to industry-specific needs. Data preparation for LLM fine-tuning: proper data preparation is key to achieving high-quality results when fine-tuning LLMs for specific purposes.
This trend toward multimodality enhances the capabilities of AI systems in tasks like cross-modal retrieval, where a query in one modality (such as text) retrieves data in another modality (such as images or design files). All businesses, across industry and size, can benefit from multimodal AI search.
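Cross-modal retrieval works because text and images are mapped into one shared embedding space, so nearest-neighbor search runs across modalities. A toy sketch (the two-dimensional vectors and file names below are made-up placeholders; a real system would use a multimodal embedding model):

```python
# Toy cross-modal retrieval: a text query and image entries live in one
# shared embedding space, so cosine similarity ranks images for a text query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

images = {
    "red_sofa.png":   [0.9, 0.1],   # placeholder image embeddings
    "blue_chair.png": [0.2, 0.8],
}
text_query_embedding = [0.85, 0.15]  # pretend embedding of "red couch"

best = max(images, key=lambda name: cosine(text_query_embedding, images[name]))
print(best)
```

The design choice that matters is the shared space: once both modalities embed into it, retrieval is ordinary nearest-neighbor search.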
Traditionally, developers have had two options when working with SageMaker: the AWS SDK for Python, also known as boto3, or the SageMaker Python SDK. For this walkthrough, we use a straightforward generative AI lifecycle involving data preparation, fine-tuning, and deployment of Meta's Llama-3-8B LLM.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. Data silos and duplication, along with concerns about data quality, create a complex environment for organizations to manage.
Author(s): Youssef Hosni. Originally published on Towards AI. Master LLMs & Generative AI Through These Five Books: this article reviews five key books that explore the rapidly evolving fields of large language models (LLMs) and generative AI, providing essential insights into these transformative technologies.
We made this process much easier through Snorkel Flow's integration with Amazon SageMaker and other tools and services from Amazon Web Services (AWS). Snorkel Flow, the AI data development platform, accelerates AI development by focusing on data development. Here's what that looks like in practice.
SageMaker Unified Studio is an integrated development environment (IDE) for data, analytics, and AI. Discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, generative AI app building, and more, in a single governed environment.
To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management. These platforms were too limited regarding CPU, GPU, and memory to allow the scalability of AI at BMW Group.
In this blog post and open source project, we show you how you can pre-train a genomics language model, HyenaDNA, using your genomic data in the AWS Cloud. Amazon SageMaker is a fully managed ML service offered by AWS, designed to reduce the time and cost associated with training and tuning ML models at scale.
The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. This was the first and only APEJ-specific analyst evaluation focused on AI life-cycle software from IDC. AWS position.
GenASL is a generative artificial intelligence (AI) -powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken and written language and sign language. Users can input audio, video, or text into GenASL, which generates an ASL avatar video that interprets the provided data.
Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
On December 6th–8th, 2023, the non-profit organization Tech to the Rescue, in collaboration with AWS, organized the world's largest Air Quality Hackathon, aimed at tackling one of the world's most pressing health and environmental challenges: air pollution. As always, AWS welcomes your feedback.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface. Choose Create stack.
Fine-tuning is a powerful approach in natural language processing (NLP) and generative AI , allowing businesses to tailor pre-trained large language models (LLMs) for specific tasks. By fine-tuning, the LLM can adapt its knowledge base to specific data and tasks, resulting in enhanced task-specific capabilities.
Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally, once LnW Connect reaches its full potential.
This simplifies access to generative artificial intelligence (AI) capabilities for business analysts and data scientists without the need for technical knowledge or writing code, thereby accelerating productivity. Provide the AWS Region, account, and model IDs appropriate for your environment.
AWS published Guidance for Optimizing MLOps for Sustainability on AWS to help customers maximize utilization and minimize waste in their ML workloads. The process begins with data preparation, followed by model training and tuning, and then model deployment and management. This leads to substantial resource consumption.
In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account.
This is where the AWS suite of low-code and no-code ML services becomes an essential tool. As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets.
Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. An Amazon DataZone domain and an associated Amazon DataZone project configured in your AWS account. Choose Data Wrangler in the navigation pane.
In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.
This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. This copy will remain until the custom model is deleted.
SageMaker Data Wrangler has also been integrated into SageMaker Canvas, reducing the time it takes to import, prepare, transform, featurize, and analyze data. In a single visual interface, you can complete each step of a data preparation workflow: data selection, cleansing, exploration, visualization, and processing.
In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the PubMed dataset and performs the best of those tried. We implemented the solution using the AWS Cloud Development Kit (AWS CDK). BioBERT with HPO: 0.89.
The explosion of data creation and utilization, paired with the increasing need for rapid decision-making, has intensified competition and unlocked opportunities within the industry. As of September 2024, the AI solution supports three core applications: the Clearwater Intelligent Console (CWIC), Clearwater's customer-facing AI application.
Data is therefore essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that the models generate reliable and accurate predictions and drive business value for the organization. Why do you need Data Preparation for Machine Learning?
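A tiny sketch of what data preparation means in practice (my illustration; the field names and rules are invented): drop records that lack a label, and impute a missing numeric feature with the column median so the model sees complete, consistent rows.

```python
# Small data-preparation sketch: drop rows missing the label, then impute
# a missing numeric feature with the column median. Fields are illustrative.
from statistics import median

rows = [
    {"age": 34,   "income": 72000, "churned": 0},
    {"age": None, "income": 51000, "churned": 1},
    {"age": 45,   "income": 88000, "churned": None},  # no label: drop
]

# 1) keep only labeled rows
labeled = [r for r in rows if r["churned"] is not None]

# 2) impute missing ages with the median of the observed ages
ages = [r["age"] for r in labeled if r["age"] is not None]
fill = median(ages)
for r in labeled:
    if r["age"] is None:
        r["age"] = fill

print(labeled)
```

Real pipelines add many more steps (outlier handling, encoding, scaling), but the principle is the same: predictable, documented transformations before any training run.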
These generative AI applications are not only used to automate existing business processes, but also have the ability to transform the experience for customers using these applications. For more information on Mixtral-8x7B Instruct on AWS, refer to Mixtral-8x7B is now available in Amazon SageMaker JumpStart.
Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption across enterprise and government organizations. Processing unstructured data has become easier with the advancements in natural language processing (NLP) and user-friendly AI/ML services like Amazon Textract, Amazon Transcribe, and Amazon Comprehend.
In recent years, MathWorks has brought many product offerings into the cloud, especially on Amazon Web Services (AWS). Here is a quick guide on how to run MATLAB on AWS. Installation of the AWS Command Line Interface (AWS CLI), AWS Configure, and Python3. Set up AWS Configure to interact with AWS resources.
Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS.
Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. From deriving insights to powering generative artificial intelligence (AI) -driven applications, the ability to efficiently process and analyze large datasets is a vital capability.
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. We’re excited to announce Amazon SageMaker and Salesforce Data Cloud integration. The inference endpoints are connected with Data Cloud to drive predictions in real time.