AWS, Clustering and Events - Data Science Current

Racing into the future: How AWS DeepRacer fueled my AI and ML journey

AWS Machine Learning Blog

NOVEMBER 19, 2024

In 2018, I sat in the audience at AWS re:Invent as Andy Jassy announced AWS DeepRacer —a fully autonomous 1/18th scale race car driven by reinforcement learning. But AWS DeepRacer instantly captured my interest with its promise that even inexperienced developers could get involved in AI and ML.

AWS

AWS ML ML AI

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. Third, we’ll explore the robust infrastructure services from AWS powering AI innovation, featuring Amazon SageMaker , AWS Trainium , and AWS Inferentia under AI/ML, as well as Compute topics.

AWS

AWS ML ML AI

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

By using dbt Cloud for data transformation, data teams can focus on writing business rules to drive insights from their transaction data to respond effectively to critical, time sensitive events. Solution overview Let’s consider TICKIT , a fictional website where users buy and sell tickets online for sporting events, shows, and concerts.

ETL

ETL Data Warehouse Analytics Analytics

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Node problem detection and recovery for AWS Neuron nodes within Amazon EKS clusters

AWS Machine Learning Blog

JULY 25, 2024

In the post, we introduce the AWS Neuron node problem detector and recovery DaemonSet for AWS Trainium and AWS Inferentia on Amazon Elastic Kubernetes Service (Amazon EKS). Additionally, the node recovery agent will publish Amazon CloudWatch metrics for users to monitor and alert on these events. install.sh install.sh

Clustering

Clustering AWS ML ML

Transforming financial analysis with CreditAI on Amazon Bedrock: Octus’s journey with AWS

AWS Machine Learning Blog

MARCH 10, 2025

We walk through the journey Octus took from managing multiple cloud providers and costly GPU instances to implementing a streamlined, cost-effective solution using AWS services including Amazon Bedrock, AWS Fargate , and Amazon OpenSearch Service. Along the way, it also simplified operations as Octus is an AWS shop more generally.

AWS

AWS Database AI AI

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink. Responsibility for maintenance and troubleshooting: Rockets DevOps/Technology team was responsible for all upgrades, scaling, and troubleshooting of the Hadoop cluster, which was installed on bare EC2 instances.

Data Science

Data Science AWS Hadoop Data Scientist

Use language embeddings for zero-shot classification and semantic search with Amazon Bedrock

AWS Machine Learning Blog

FEBRUARY 13, 2025

Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using Amazon Web Services (AWS) services without having to manage infrastructure. AWS Lambda The API is a Fastify application written in TypeScript.

AWS

AWS K-nearest Neighbors Clustering Algorithm

CBRE and AWS perform natural language queries of structured data using Amazon Bedrock

AWS Machine Learning Blog

MAY 30, 2024

Because Amazon Bedrock is serverless, you don’t have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with. AWS Prototyping developed an AWS Cloud Development Kit (AWS CDK) stack for deployment following AWS best practices.

AWS

AWS SQL Database AI

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

AWS Machine Learning Blog

MARCH 15, 2024

Therefore, ML creates challenges for AWS customers who need to ensure privacy and security across distributed entities without compromising patient outcomes. Solution overview We deploy FedML into multiple EKS clusters integrated with SageMaker for experiment tracking. As always, AWS welcomes your feedback.

AWS

AWS ML ML Machine Learning

Introducing Amazon SageMaker HyperPod to train foundation models at scale

AWS Machine Learning Blog

NOVEMBER 30, 2023

Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. SageMaker HyperPod integrates the Slurm Workload Manager for cluster and training job orchestration.

Clustering

Clustering AWS Machine Learning Machine Learning

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

AWS Machine Learning Blog

NOVEMBER 30, 2023

The number of companies launching generative AI applications on AWS is substantial and building quickly, including adidas, Booking.com, Bridgewater Associates, Clariant, Cox Automotive, GoDaddy, and LexisNexis Legal & Professional, to name just a few. Innovative startups like Perplexity AI are going all in on AWS for generative AI.

AWS

AWS AI AI ML

How Untold Studios empowers artists with an AI assistant built on Amazon Bedrock

AWS Machine Learning Blog

FEBRUARY 7, 2025

The implementation uses Slacks event subscription API to process incoming messages and Slacks Web API to send responses. Key AWS services used include: Amazon Bedrock Including Anthropics Claude 3.5 By keeping all data within the AWS ecosystem, weve eliminated dependencies on third-party AI tools and mitigated associated risks.

AWS

AWS AI AI Python

Accelerating sustainable modernization with Green IT Analyzer on AWS

IBM Journey to AI blog

JANUARY 16, 2024

To aid in building more sustainable IT estates, IBM has partnered up with Amazon Web Services (AWS) to facilitate sustainable cloud modernization journeys. To read about other key scenarios and entry points of IBM Consulting® Custom Lens for Sustainability, check out the blog post: Sustainable App Modernization Using AWS Cloud.

AWS

AWS Clustering Database Artificial Intelligence

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

AWS Machine Learning Blog

JULY 17, 2023

As described in the AWS Well-Architected Framework , separating workloads across accounts enables your organization to set common guardrails while isolating environments. Organizations with a multi-account architecture typically have Amazon Redshift and SageMaker Studio in two separate AWS accounts.

Clustering

Clustering AWS ML ML

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 2, 2024

In this post, we share AWS guidance that we have learned and developed as part of real-world projects into practical guides oriented towards the AWS Well-Architected Framework , which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.

AWS

AWS AI AI ML

Introducing Amazon EKS support in Amazon SageMaker HyperPod

AWS Machine Learning Blog

SEPTEMBER 11, 2024

This capability allows for the seamless addition of SageMaker HyperPod managed compute to EKS clusters, using automated node and job resiliency features for foundation model (FM) development. FMs are typically trained on large-scale compute clusters with hundreds or thousands of accelerators.

Clustering

Clustering AWS ML ML

Migrate and modernize enterprise integration using IBM Cloud Pak for Integration with Red Hat OpenShift Service on AWS (ROSA)

IBM Journey to AI blog

MARCH 20, 2024

Perhaps you need to discover what’s happening in your business and respond quickly to events. In the rest of this paper, we will explore how Cloud Pak for Integration, deployed on Red Hat OpenShift is the best way to provide integrations deploying in AWS. ROSA is jointly engineered and supported by AWS and Red Hat.

AWS

AWS Clustering AI AI

Host the Spark UI on Amazon SageMaker Studio

AWS Machine Learning Blog

AUGUST 8, 2023

You can run Spark applications interactively from Amazon SageMaker Studio by connecting SageMaker Studio notebooks and AWS Glue Interactive Sessions to run Spark jobs with a serverless cluster. With interactive sessions, you can choose Apache Spark or Ray to easily process large datasets, without worrying about cluster management.

AWS

AWS Clustering Machine Learning Machine Learning

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

JANUARY 17, 2024

It can represent a geographical area as a whole or it can represent an event associated with a geographical area. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights.

Clustering

Clustering AWS ML ML

OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service

AWS Machine Learning Blog

FEBRUARY 5, 2025

The listing writer microservice publishes listing change events to an Amazon Simple Notification Service (Amazon SNS) topic, which an Amazon Simple Queue Service (Amazon SQS) queue subscribes to. The listing indexer AWS Lambda function continuously polls the queue and processes incoming listing updates.

K-nearest Neighbors

K-nearest Neighbors Machine Learning Machine Learning Database

Scale AI training and inference for drug discovery through Amazon EKS and Karpenter

AWS Machine Learning Blog

APRIL 19, 2024

The architecture deploys a simple service in a Kubernetes pod within an EKS cluster. The Kubernetes Event Driven Autoscaler ( KEDA ) is configured to automatically scale the number of service pods, based on the custom metrics available in Prometheus. The service is exposed behind a reverse-proxy using Traefik.

Clustering

Clustering AI AI AWS

How LotteON built a personalized recommendation system using Amazon SageMaker and MLOps

AWS Machine Learning Blog

MAY 16, 2024

The main AWS services used are SageMaker, Amazon EMR , AWS CodeBuild , Amazon Simple Storage Service (Amazon S3), Amazon EventBridge , AWS Lambda , and Amazon API Gateway. The subsequent role of EventBridge was to dispatch events, instigated by the alteration of the buildspec.yml file, leading to running CodeBuild.

AWS

AWS ML ML Deep Learning

How Cisco accelerated the use of generative AI with Amazon SageMaker Inference

AWS Machine Learning Blog

AUGUST 8, 2024

Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex works with the world’s leading business and productivity apps—including AWS.

AWS

AWS AI AI Clustering

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 23, 2024

SnapLogic uses Amazon Bedrock to build its platform, capitalizing on the proximity to data already stored in Amazon Web Services (AWS). To address customers’ requirements about data privacy and sovereignty, SnapLogic deploys the data plane within the customer’s VPC on AWS.

AI

AI AI AWS Database

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2

AWS Machine Learning Blog

JANUARY 13, 2023

Analyzing real-world healthcare and life sciences (HCLS) data poses several practical challenges, such as distributed data silos, lack of sufficient data at a single site for rare events, regulatory guidelines that prohibit data sharing, infrastructure requirement, and cost incurred in creating a centralized data repository.

AWS

AWS Analytics Analytics Machine Learning

Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock

AWS Machine Learning Blog

APRIL 18, 2024

It’s straightforward to deploy in your AWS account. Prerequisites You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. Everything you need is provided as open source in our GitHub repo.

AWS

AWS Analytics Analytics AI

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster for seamless data movement from the on-premise hardware solution to the data lake using various cloud services like Amazon’s S3 and others. Amazon’s AWS Glue is one such tool that allows you to consume data from Apache Kafka and Amazon-managed streaming for Apache Kafka (MSK).

Apache Kafka

Apache Kafka ETL Data Lakes AWS

Automate chatbot for document and data retrieval using Agents and Knowledge Bases for Amazon Bedrock

AWS Machine Learning Blog

MAY 1, 2024

Virginia) AWS Region. The diagram details a comprehensive AWS Cloud-based setup within a specific Region, using multiple AWS services. The primary interface for the chatbot is a Streamlit application hosted on an Amazon Elastic Container Service (Amazon ECS) cluster, with accessibility managed by an Application Load Balancer.

AWS

AWS Machine Learning Machine Learning SQL

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

Make sure you have the following prerequisites: Create an S3 bucket Configure MongoDB Atlas cluster Create a free MongoDB Atlas cluster by following the instructions in Create a Cluster. In his role Igor is working with strategic partners helping them build complex, AWS-optimized architectures. Note we have two folders.

Clustering

Clustering AWS Database ML

Scaling Thomson Reuters’ language model research with Amazon SageMaker HyperPod

AWS Machine Learning Blog

SEPTEMBER 12, 2024

In this post, we explore the journey that Thomson Reuters took to enable cutting-edge research in training domain-adapted large language models (LLMs) using Amazon SageMaker HyperPod , an Amazon Web Services (AWS) feature focused on providing purpose-built infrastructure for distributed training at scale. So, for example, a 6.6B

Clustering

Clustering AWS ML ML

Training large language models on Amazon SageMaker: Best practices

AWS Machine Learning Blog

MARCH 6, 2023

These factors require training an LLM over large clusters of accelerated machine learning (ML) instances. In the past few years, numerous customers have been using the AWS Cloud for LLM training. We recommend working with your AWS account team or contacting AWS Sales to determine the appropriate Region for your LLM workload.

AWS

AWS Clustering ML ML

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

This process comprises two key components: event data and optical tracking data. Event data collection entails gathering the fundamental building blocks of the game. For the precision needed in shot speed calculations, we must ensure that the ball’s position aligns precisely with the moment of the event.

AWS

AWS Apache Kafka Data Scientist Data Science

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

APRIL 5, 2024

Geobox enables city departments to do the following: Improved climate adaptation planning – Informed decisions reduce the impact of extreme heat events. SageMaker Processing enables the flexible scaling of compute clusters to accommodate tasks of varying sizes, from processing a single city block to managing planetary-scale workloads.

Clustering

Clustering ML ML AWS

How Vericast optimized feature engineering using Amazon SageMaker Processing

AWS Machine Learning Blog

MAY 3, 2023

AWS customer Vericast is a marketing solutions company that makes data-driven decisions to boost marketing ROIs for its clients. Dynamic scaling of feature engineering jobs – A combination of various AWS services is used for this, but most notably SageMaker Processing.

AWS

AWS Machine Learning Machine Learning ML

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

Clusters : Clusters are groups of interconnected nodes that work together to process and store data. Clustering allows for improved performance and fault tolerance as tasks can be distributed across nodes. Amazon S3: Amazon Simple Storage Service (S3) is a scalable object storage service provided by Amazon Web Services (AWS).

Big Data

Big Data Big Data Data Engineer Data Engineering

Solving the Image Promotion Challenge Across Multi-Environment with ArgoCD

Towards AI

AUGUST 17, 2023

Each environment has a dedicated AWS account with its own cluster and ArgoCD installation. Potential Solutions At the outset, two potential solutions were proposed: ECR Cross Account Replication: AWS’s ECR natively supports replicating images between two accounts. Source: Image by the author. hook: PreSync argocd.argoproj.io/hook-delete-policy:

AWS

AWS Clustering AI AI

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 15, 2023

Prerequisites To implement this solution, complete the following prerequisites: Have AWS Cloud admin access with an AWS Identity and Access Management (IAM) user with permissions required to complete the integration. Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster. Choose Add connection.

Machine Learning

Machine Learning Machine Learning AWS ML

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

The following figure illustrates the idea of a large cluster of GPUs being used for learning, followed by a smaller number for inference. Examples of other PBAs now available include AWS Inferentia and AWS Trainium , Google TPU, and Graphcore IPU. Suppliers of data center GPUs include NVIDIA, AMD, Intel, and others.

AWS

AWS ML ML Clustering

How to achieve Kubernetes observability: Principles and best practices

IBM Journey to AI blog

FEBRUARY 15, 2024

Logs Logs include discrete events recorded every time something occurs in the system, such as status or error messages, or transaction details. Autoscaling When traffic spikes, Kubernetes can automatically spin up new clusters to handle the additional workload. Kubernetes logs can be written in both structured and unstructured text.

Clustering

Clustering Azure Data Visualization AWS

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. Given the highly parallel needs, we chose Lambda to process our images.

ML

ML ML AWS Data Pipeline

Deploy a Hugging Face (PyAnnote) speaker diarization model on Amazon SageMaker as an asynchronous endpoint

AWS Machine Learning Blog

APRIL 25, 2024

We provide a comprehensive guide on how to deploy speaker segmentation and clustering solutions using SageMaker on the AWS Cloud. Solution overview Amazon Transcribe is the go-to service for speaker diarization in AWS. Make sure the AWS account has a service quota for hosting a SageMaker endpoint for an ml.g5.2xlarge instance.

AWS

AWS ML ML Python

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.

ML

ML ML AWS AI

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

With Ray and AIR, the same Python code can scale seamlessly from a laptop to a large cluster. The full code can be found on the aws-samples-for-ray GitHub repository. Prepare the source data for the feature store by adding an event time and record ID for each row of data. You can specify resource requirements in actors too.

Machine Learning

Machine Learning Machine Learning ML ML

Power recommendations and search using an IMDb knowledge graph – Part 3

AWS Machine Learning Blog

JANUARY 6, 2023

Many AWS media and entertainment customers license IMDb data through AWS Data Exchange to improve content discovery and increase customer engagement and retention. We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. Background. That’s an out-of-catalog search experience!

AWS

AWS ML ML Machine Learning

Racing into the future: How AWS DeepRacer fueled my AI and ML journey

Your guide to generative AI and ML at AWS re:Invent 2024

Webinars

Trending Sources

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Webinars

Node problem detection and recovery for AWS Neuron nodes within Amazon EKS clusters

Transforming financial analysis with CreditAI on Amazon Bedrock: Octus’s journey with AWS

How Rocket Companies modernized their data science solution on AWS

Use language embeddings for zero-shot classification and semantic search with Amazon Bedrock

CBRE and AWS perform natural language queries of structured data using Amazon Bedrock

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

Introducing Amazon SageMaker HyperPod to train foundation models at scale

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

How Untold Studios empowers artists with an AI assistant built on Amazon Bedrock

Accelerating sustainable modernization with Green IT Analyzer on AWS

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Introducing Amazon EKS support in Amazon SageMaker HyperPod

Migrate and modernize enterprise integration using IBM Cloud Pak for Integration with Red Hat OpenShift Service on AWS (ROSA)

Host the Spark UI on Amazon SageMaker Studio

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service

Scale AI training and inference for drug discovery through Amazon EKS and Karpenter

How LotteON built a personalized recommendation system using Amazon SageMaker and MLOps

How Cisco accelerated the use of generative AI with Amazon SageMaker Inference

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2

Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Automate chatbot for document and data retrieval using Agents and Knowledge Bases for Amazon Bedrock

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

Scaling Thomson Reuters’ language model research with Amazon SageMaker HyperPod

Training large language models on Amazon SageMaker: Best practices

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

How Vericast optimized feature engineering using Amazon SageMaker Processing

Big data engineering simplified: Exploring roles of distributed systems

Solving the Image Promotion Challenge Across Multi-Environment with ArgoCD

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

A review of purpose-built accelerators for financial services

How to achieve Kubernetes observability: Principles and best practices

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

Deploy a Hugging Face (PyAnnote) speaker diarization model on Amazon SageMaker as an asynchronous endpoint

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

Power recommendations and search using an IMDb knowledge graph – Part 3

Stay Connected