To reduce costs while continuing to use the power of AI, many companies have shifted to fine-tuning LLMs on their domain-specific data using Parameter-Efficient Fine-Tuning (PEFT). Manually managing such complexity can often be counterproductive and take away valuable resources from your business's AI development.
In 2018, I sat in the audience at AWS re:Invent as Andy Jassy announced AWS DeepRacer, a fully autonomous 1/18th-scale race car driven by reinforcement learning. At the time, I knew little about AI or machine learning (ML). […] seconds, securing the 2018 AWS DeepRacer grand champion title!
AWS Trainium and AWS Inferentia based instances, combined with Amazon Elastic Kubernetes Service (Amazon EKS), provide a performant, low-cost framework to run LLMs efficiently in a containerized environment. Solution overview: the first step to implement the solution is to create the EKS cluster, as sketched below.
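For illustration only (the post's own walkthrough likely uses eksctl or the console), a minimal boto3 sketch of creating an EKS cluster with a Trainium-backed node group; the cluster name, role ARNs, and subnet IDs are placeholders:

    import boto3

    eks = boto3.client("eks", region_name="us-west-2")

    # Create the control plane (role ARN and subnet IDs are placeholders).
    eks.create_cluster(
        name="trainium-demo",
        roleArn="arn:aws:iam::111122223333:role/eksClusterRole",
        resourcesVpcConfig={"subnetIds": ["subnet-aaaa", "subnet-bbbb"]},
    )

    # Wait until the control plane is active before adding nodes.
    eks.get_waiter("cluster_active").wait(name="trainium-demo")

    # Add a managed node group backed by Trainium (trn1) instances.
    eks.create_nodegroup(
        clusterName="trainium-demo",
        nodegroupName="trn1-nodes",
        scalingConfig={"minSize": 1, "maxSize": 2, "desiredSize": 1},
        subnets=["subnet-aaaa", "subnet-bbbb"],
        instanceTypes=["trn1.32xlarge"],
        nodeRole="arn:aws:iam::111122223333:role/eksNodeRole",
    )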
In the context of generative AI, significant progress has been made in developing multimodal embedding models that can embed various data modalities, such as text, image, video, and audio data, into a shared vector space. Among the prerequisites: the AWS Command Line Interface (AWS CLI) installed on your machine to upload the dataset to Amazon S3.
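The post's prerequisite is the AWS CLI for uploading the dataset to Amazon S3; as an equivalent illustration in Python, a boto3 upload sketch, where the bucket and file names are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Upload a local dataset file to S3 (bucket and key are placeholders).
    s3.upload_file(
        Filename="dataset/embeddings-input.jsonl",
        Bucket="my-multimodal-dataset-bucket",
        Key="input/embeddings-input.jsonl",
    )

The CLI equivalent is a single aws s3 cp command; either route lands the dataset where the rest of the pipeline expects it.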
Prerequisites: before you begin, make sure you have an AWS account and a role with AWS Identity and Access Management (IAM) privileges to deploy the required resources, including IAM roles; a provisioned Amazon Redshift cluster (which we use for this post); and a SageMaker domain. For Database name, enter dev.
At ByteDance, we collaborated with Amazon Web Services (AWS) to deploy multimodal large language models (LLMs) for video understanding using AWS Inferentia2 across multiple AWS Regions around the world. The need for AI systems capable of processing various content forms has become increasingly apparent.
In this post, we introduce the AWS Neuron node problem detector and recovery DaemonSet for AWS Trainium and AWS Inferentia on Amazon Elastic Kubernetes Service (Amazon EKS). Setup involves installing the required AWS Identity and Access Management (IAM) role for the service account and the node problem detector plugin; a quick verification sketch follows below.
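As a small illustration of checking that the DaemonSet is healthy after installation (a sketch only; the namespace and DaemonSet name below are assumptions, not the plugin's documented values):

    from kubernetes import client, config

    # Load credentials from the local kubeconfig (e.g., as written by eksctl).
    config.load_kube_config()

    apps = client.AppsV1Api()

    # Confirm the node problem detector DaemonSet is scheduled and ready
    # on every node; the namespace and name match here are assumptions.
    for ds in apps.list_namespaced_daemon_set("kube-system").items:
        if "node-problem-detector" in ds.metadata.name:
            print(
                ds.metadata.name,
                f"desired={ds.status.desired_number_scheduled}",
                f"ready={ds.status.number_ready}",
            )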
Increasingly, organizations across industries are turning to generative AI foundation models (FMs) to enhance their applications. The launcher interfaces with underlying cluster management systems such as SageMaker HyperPod (Slurm or Kubernetes) or training jobs, which handle resource allocation and scheduling; the training recipe is selected by name (for example, recipes=recipe-name).
Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases do. This post is co-written with Isaac Cameron and Alex Gnibus from Tecton.
AWS was delighted to present to and connect with over 18,000 in-person and 267,000 virtual attendees at NVIDIA GTC, a global artificial intelligence (AI) conference that took place in March 2024 in San Jose, California, returning to a hybrid, in-person experience for the first time since 2019.
Although setting up a processing cluster is an alternative, it introduces its own set of complexities, from data distribution to infrastructure management. We use the purpose-built geospatial container with SageMaker Processing jobs for a simplified, managed experience to create and run a cluster; a sketch of such a job follows below.
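A hedged sketch of such a Processing job with the SageMaker Python SDK, reassembling an S3 tile prefix the way the excerpt's stray code fragment does; the container image URI, role ARN, bucket, script name, and prefix parts are placeholders:

    from sagemaker.processing import (
        ProcessingInput,
        ProcessingOutput,
        ScriptProcessor,
    )

    # Reassemble the S3 tile prefix as in the excerpt's code fragment.
    tile_prefix = ["processed", "tiles", "utm-zone-10"]  # hypothetical parts
    s3_prefix = "s3://my-geospatial-bucket/{}".format("/".join(tile_prefix))

    # Image URI, role, and instance choices below are placeholders.
    processor = ScriptProcessor(
        image_uri="<geospatial-container-image-uri>",
        command=["python3"],
        role="arn:aws:iam::111122223333:role/SageMakerProcessingRole",
        instance_count=4,  # SageMaker manages the cluster for you
        instance_type="ml.m5.4xlarge",
    )

    # Run a (hypothetical) tile-processing script against the prefix.
    processor.run(
        code="process_tiles.py",
        inputs=[ProcessingInput(source=s3_prefix,
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output")],
    )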
With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive and AI-driven experience. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.
Starting with the AWS Neuron 2.18 release, you can now launch Neuron DLAMIs (AWS Deep Learning AMIs) and Neuron DLCs (AWS Deep Learning Containers) with the latest released Neuron packages on the same day as the Neuron SDK release. AWS DLCs provide a set of Docker images that are pre-installed with deep learning frameworks.
The intersection of AI and financial analysis presents a compelling opportunity to transform how investment professionals access and use credit intelligence, leading to more efficient decision-making processes and better risk management outcomes. It became apparent that a cost-effective solution for our generative AI needs was required.
The compute clusters used in these scenarios are composed of thousands of AI accelerators such as GPUs or AWS Trainium and AWS Inferentia, custom machine learning (ML) chips designed by Amazon Web Services (AWS) to accelerate deep learning workloads in the cloud.
Amazon Web Services (AWS) is committed to supporting the development of cutting-edge generative artificial intelligence (AI) technologies by companies and organizations across the globe. Let’s dive in and explore how these organizations are transforming what’s possible with generative AI on AWS.
Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. In this solution, HyperPod cluster instances use the LDAPS protocol to connect to AWS Managed Microsoft AD via a Network Load Balancer (NLB).
Generative AI solutions have the potential to transform businesses by boosting productivity and improving customer experiences, and using large language models (LLMs) with these solutions has become increasingly popular. Despite their wealth of general knowledge, state-of-the-art LLMs only have access to the information they were trained on.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. The API is a Fastify application written in TypeScript, hosted on AWS Lambda.
You can use these techniques together to train complex models that are orders of magnitude faster and rapidly iterate and deploy innovative AI solutions that drive business value. After they’re initiated, SageMaker training jobs spin up the cluster, provisioning the specified number and type of compute instances.
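The excerpt above describes SageMaker training jobs provisioning a cluster on demand; as a generic illustration with the SageMaker Python SDK (the image URI, role ARN, instance choices, and S3 paths are placeholders, not any post's actual values):

    from sagemaker.estimator import Estimator

    # Define the job; SageMaker provisions the requested cluster for you.
    estimator = Estimator(
        image_uri="<training-container-image-uri>",
        role="arn:aws:iam::111122223333:role/SageMakerTrainingRole",
        instance_count=4,
        instance_type="ml.p4d.24xlarge",
        hyperparameters={"epochs": 3},
    )

    # fit() starts the job: SageMaker spins up the instances, runs the
    # training container, then tears the cluster down when the job ends.
    estimator.fit({"train": "s3://my-bucket/train/"})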
AWS (Amazon Web Services), the comprehensive and evolving cloud computing platform provided by Amazon, comprises infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS). With its wide array of tools and convenience, AWS has already become a popular choice for many SaaS companies.
Generative artificial intelligence (AI) has unlocked fresh opportunities for these use cases. In this post, we introduce the Media Analysis and Policy Evaluation solution, which uses AWS AI and generative AI services to provide a framework to streamline video extraction and evaluation processes.
Large language models (LLMs) are making a significant impact in the realm of artificial intelligence (AI). Llama 2 by Meta is an example of an LLM offered by AWS. To learn more about Llama 2 on AWS, refer to Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart.
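For illustration, deploying a Llama 2 variant through the SageMaker JumpStart Python SDK might look like the sketch below; the model ID follows JumpStart's naming for Meta's text-generation models, but verify it (and the payload format) against the JumpStart catalog before relying on it:

    from sagemaker.jumpstart.model import JumpStartModel

    # Deploy a Llama 2 7B text-generation model from JumpStart.
    # Llama models require accepting Meta's EULA at deploy time.
    model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b")
    predictor = model.deploy(accept_eula=True)

    # Query the endpoint; the payload schema may vary by model version.
    response = predictor.predict(
        {"inputs": "Explain what an LLM is in one sentence."}
    )
    print(response)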
For reference, GPT-3, an earlier-generation LLM, has 175 billion parameters and required months of non-stop training on a cluster of thousands of accelerated processors. The Carbontracker study estimates that training GPT-3 from scratch may emit up to 85 metric tons of CO2 equivalent, using clusters of specialized hardware accelerators.
CBRE is unlocking the potential of artificial intelligence (AI) to realize value across the entire commercial real estate lifecycle, from guiding investment decisions to managing buildings. The opportunities to unlock value using AI across the commercial real estate lifecycle start with data at scale.
A challenge for DevOps engineers is the additional complexity that comes from using Kubernetes to manage the deployment stage while resorting to other tools (such as the AWS SDK or AWS CloudFormation) to manage the model-building pipeline. Prerequisites include kubectl for working with Kubernetes clusters and eksctl for working with EKS clusters.
In this post, we explore how you can use Amazon Q Business , the AWS generative AI-powered assistant, to build a centralized knowledge base for your organization, unifying structured and unstructured datasets from different sources to accelerate decision-making and drive productivity. Choose Create database.
Just recently, generative AI applications have captured everyone’s attention and imagination. We are truly at an exciting inflection point in the widespread adoption of ML, and we believe every customer experience and application will be reinvented with generative AI.
Today, we are announcing that DeepSeek AI's first-generation frontier model, DeepSeek-R1, is available through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace to deploy for inference. You can now use DeepSeek-R1 to build, experiment, and responsibly scale your generative AI ideas on AWS. […] 48xlarge for endpoint usage.
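A hedged sketch of the JumpStart route; the model ID and the endpoint's default instance sizing are assumptions to confirm in the JumpStart catalog (the excerpt's truncated "…48xlarge" hints at a large endpoint instance):

    from sagemaker.jumpstart.model import JumpStartModel

    # Deploy DeepSeek-R1 from SageMaker JumpStart. The model ID below is
    # an assumption; check the JumpStart catalog for the exact identifier
    # and the instance types your account can use for the endpoint.
    model = JumpStartModel(model_id="deepseek-llm-r1")
    predictor = model.deploy()

    # Query the endpoint; the payload schema may differ by model version.
    print(predictor.predict(
        {"inputs": "What is 17 * 24? Think step by step."}
    ))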
In this post, we walk through how to fine-tune Llama 2 on AWS Trainium , a purpose-built accelerator for LLM training, to reduce training times and costs. We review the fine-tuning scripts provided by the AWS Neuron SDK (using NeMo Megatron-LM), the various configurations we used, and the throughput results we saw.
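The post's actual fine-tuning goes through the Neuron SDK's NeMo Megatron-LM scripts; purely to illustrate the device mechanics, here is a toy PyTorch/XLA loop as it would run on a Trainium instance with the Neuron stack installed (the model and data are stand-ins, not Llama 2):

    import torch
    import torch_xla.core.xla_model as xm

    # With the Neuron SDK installed, torch-xla exposes Trainium as an
    # XLA device. This loop shows only device/step mechanics, not the
    # post's NeMo Megatron-LM fine-tuning of Llama 2.
    device = xm.xla_device()

    model = torch.nn.Linear(512, 512).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 512, device=device)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        # xm.optimizer_step() applies the update and marks the XLA step.
        xm.optimizer_step(optimizer)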
Close collaboration with AWS Trainium has also played a major role in making the Arcee platform extremely performant, not only accelerating model training but also reducing overall costs and enforcing compliance and data integrity in the secure AWS environment. Our cluster consisted of 16 nodes, each a trn1n.32xlarge instance.
Amazon Web Services is excited to announce the launch of the AWS Neuron Monitor container , an innovative tool designed to enhance the monitoring capabilities of AWS Inferentia and AWS Trainium chips on Amazon Elastic Kubernetes Service (Amazon EKS). The Container Insights dashboard also shows cluster status and alarms.
AWS provides various services catered to time series data that are low-code/no-code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. In this post, we seek to separate a time series dataset into individual clusters that exhibit a higher degree of similarity among their data points and reduce noise; one common clustering approach is sketched below.
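The excerpt doesn't say which algorithm the post uses, so this sketch shows one common choice, k-means with dynamic time warping via the tslearn library, on synthetic stand-in data:

    import numpy as np
    from tslearn.clustering import TimeSeriesKMeans

    # Synthetic stand-in data: 20 noisy series of length 50 drawn from
    # two shapes (the post's dataset and algorithm choice may differ).
    rng = np.random.default_rng(0)
    series = np.concatenate([
        np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.standard_normal((10, 50)),
        np.cos(np.linspace(0, 6, 50)) + 0.1 * rng.standard_normal((10, 50)),
    ])

    # Dynamic time warping lets similarly shaped but time-shifted series
    # land in the same cluster, reducing noise in downstream modeling.
    km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
    labels = km.fit_predict(series)
    print(labels)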
In this post, we’ll summarize the training procedure of GPT NeoX on AWS Trainium, a purpose-built machine learning (ML) accelerator optimized for deep learning training. We’ll outline how we cost-effectively (3.2 M tokens/$) trained such models, up to […] billion parameters in Pythia, with AWS Trainium without losing any model quality.
Maximo maintains high-value assets with AI and analytics to optimize performance, extend asset lifecycles and reduce downtime and costs. In this journey, we are seeing an increased interest in migrating and deploying MAS on AWS Cloud. Therefore, Maximo migration and modernization has become even more important for clients.
Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. The data scientist is responsible for moving the code into SageMaker, either manually or by cloning it from a code repository such as AWS CodeCommit.
The growing need for cost-effective AI models The landscape of generative AI is rapidly evolving. OpenAI launched GPT-4o in May 2024, and Amazon introduced Amazon Nova models at AWS re:Invent in December 2024. The implementation included a provisioned three-node sharded OpenSearch Service cluster.
It is used by businesses across industries for a wide range of applications, including fraud prevention, marketing automation, customer service, artificial intelligence (AI), chatbots, virtual assistants, and recommendations. It can provide a large cluster, or even a cluster of clusters, on a single machine.
With containers, scaling on a cluster becomes much easier, and what you get is an ML development environment that is consistent and portable. In late 2022, AWS announced the general availability of Amazon EC2 Trn1 instances powered by AWS Trainium accelerators, which are purpose-built for high-performance deep learning training.
Each of these products is infused with artificial intelligence (AI) capabilities to deliver an exceptional customer experience. Sprinklr's specialized AI models streamline data processing, gather valuable insights, and enable workflows and analytics at scale to drive better decision-making and productivity.
This post details how we used Amazon Bedrock to create an AI assistant (Untold Assistant), providing artists with a straightforward way to access our internal resources through a natural language interface integrated directly into their existing Slack workflow. Key AWS services used include Amazon Bedrock, including Anthropic's Claude 3.5.
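Behind a Slack interface like the Untold Assistant's sits a model call to Amazon Bedrock; a minimal sketch using boto3's Converse API, where the model ID and Region are assumptions to check against your account's available Bedrock models:

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Ask a Claude model on Amazon Bedrock a question, roughly what an
    # assistant does behind its Slack interface. The model ID below is
    # an assumption; list your account's Bedrock models to confirm it.
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": "Where are the character rigging guidelines?"}],
        }],
    )
    print(response["output"]["message"]["content"][0]["text"])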
Iambic Therapeutics is a drug discovery startup with a mission to create innovative AI-driven technologies to bring better medicines to cancer patients, faster. Our advanced generative and predictive artificial intelligence (AI) tools enable us to search the vast space of possible drug molecules faster and more effectively.
Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Customers such as Stability AI use SageMaker HyperPod to train their foundation models, including Stable Diffusion.
Therefore, ML creates challenges for AWS customers who need to ensure privacy and security across distributed entities without compromising patient outcomes. Solution overview We deploy FedML into multiple EKS clusters integrated with SageMaker for experiment tracking. As always, AWS welcomes your feedback.