Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. You can view and create EMR clusters directly through the SageMaker notebook. This post is cowritten with Isaac Cameron and Alex Gnibus from Tecton.
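As a rough illustration of working with EMR from a notebook, the sketch below uses boto3 to list running clusters and launch a minimal one. The post refers to the SageMaker Studio integration; this boto3 approach is a generic alternative, and the cluster name, instance types, and log URI are placeholder assumptions, not values from the post.

```python
import boto3

# Assumes AWS credentials and a default region are already configured
# in the notebook environment (true for SageMaker notebooks).
emr = boto3.client("emr")

# View clusters that are currently usable from the notebook.
for cluster in emr.list_clusters(ClusterStates=["RUNNING", "WAITING"])["Clusters"]:
    print(cluster["Id"], cluster["Name"])

# Launch a minimal cluster; names, sizes, and the log URI are
# illustrative placeholders.
response = emr.run_job_flow(
    Name="demo-cluster",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Applications=[{"Name": "Spark"}],
    LogUri="s3://my-bucket/emr-logs/",  # placeholder bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```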
The compute clusters used in these scenarios are composed of thousands of AI accelerators such as GPUs or AWS Trainium and AWS Inferentia, custom machine learning (ML) chips designed by Amazon Web Services (AWS) to accelerate deep learning workloads in the cloud.
To empower our enterprise customers to adopt foundation models and large language models, we completely redesigned the machine learning systems behind Snorkel Flow to make sure we were meeting customer needs. In this article, we share our journey and hope that it helps you design better machine learning systems.
It leverages recent developments in on-device machine learning to transcribe speech, recognize audio events, suggest tags for titles, and help users navigate transcripts. This feature is powered by Google's new speaker diarization system named Turn-to-Diarize, which was first presented at ICASSP 2022.
AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and state-of-the-art geospatial machine learning (ML) models, reducing barriers for these types of use cases. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives.
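These capabilities are exposed through the `sagemaker-geospatial` boto3 client. The minimal sketch below lists the available satellite raster data collections; the region choice and IAM permissions are assumptions, and the response field names reflect the boto3 docs as best recalled, so treat this as a sketch rather than a definitive example.

```python
import boto3

# Assumes a region where SageMaker geospatial is available (e.g. us-west-2)
# and appropriate IAM permissions; both are assumptions, not from the post.
geo = boto3.client("sagemaker-geospatial", region_name="us-west-2")

# Enumerate the satellite raster data collections the service exposes.
collections = geo.list_raster_data_collections()
for summary in collections["RasterDataCollectionSummaries"]:
    print(summary["Name"], summary["Arn"])
```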
Solution overview The following figure illustrates our system architecture for CreditAI on AWS, with two key paths: the document ingestion and content extraction workflow, and the Q&A workflow for live user query response. He specializes in generative AI, machine learning, and system design.
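The post itself does not include code for these two paths; the sketch below is a generic, minimal illustration of the same shape: an ingestion path that chunks and indexes document text, and a Q&A path that retrieves context for a live query. Every name here (`embed`, `VectorIndex`, the toy similarity function) is hypothetical, not part of CreditAI.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an embedding model; a real system would call
# an embedding endpoint (e.g. on Amazon Bedrock or SageMaker).
def embed(text: str) -> list[float]:
    return [float(ord(c) % 7) for c in text[:16]]

def similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

@dataclass
class VectorIndex:
    chunks: list[tuple[list[float], str]] = field(default_factory=list)

    # Ingestion path: extract text chunks from documents and index them.
    def ingest(self, document: str, chunk_size: int = 200) -> None:
        for i in range(0, len(document), chunk_size):
            chunk = document[i : i + chunk_size]
            self.chunks.append((embed(chunk), chunk))

    # Q&A path: retrieve the most relevant chunks for a live user query.
    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(self.chunks, key=lambda c: similarity(c[0], q), reverse=True)
        return [chunk for _, chunk in scored[:k]]

index = VectorIndex()
index.ingest("Example credit memo text describing a borrower's revenue history...")
context = index.retrieve("What is the borrower's revenue trend?")
# In the real workflow, the retrieved context would be passed to an LLM prompt.
print(context)
```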
Computing is being dominated by major revolutions in artificial intelligence (AI) and machine learning (ML). Tight coupling: tightly coupled components demand such a high level of synchronization and parallelism that a process called "clustering" uses redundant components to ensure ongoing system viability.
Machine Learning Operations (MLOps) vs Large Language Model Operations (LLMOps): LLMOps falls under MLOps (Machine Learning Operations). The following table provides a more detailed comparison:

Task | MLOps | LLMOps
Primary focus | Developing and deploying machine-learning models. | Specifically focused on LLMs.
At its core, Ray offers a unified programming model that allows developers to seamlessly scale their applications from a single machine to a distributed cluster. Ray promotes the same coding patterns for both a simple machine learning (ML) experiment and a scalable, resilient production application.
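As a minimal sketch of that unified model, the snippet below uses Ray's core task API; the same code runs unchanged on a laptop or, when `ray.init()` attaches to a cluster, across many machines. The toy `square` function is illustrative only.

```python
import ray

# Starts a local Ray instance, or attaches to a running cluster if the
# environment is configured for one (e.g. via RAY_ADDRESS).
ray.init()

@ray.remote
def square(x: int) -> int:
    # A stand-in for real work; each call may run on a different worker.
    return x * x

# Launch tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```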
As cluster sizes grow, the likelihood of failure increases due to the number of hardware components involved. Each hardware failure can result in wasted GPU hours and requires valuable engineering time to identify and resolve the issue, making the system prone to downtime that can disrupt progress and delay completion.
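To make that scaling effect concrete: if each accelerator independently fails on a given day with probability p, a cluster of N devices sees at least one failure with probability 1 - (1 - p)^N. The back-of-the-envelope calculation below uses an assumed illustrative failure rate, not a figure from the article.

```python
# Illustrative only: the 0.01% daily per-device failure rate is an assumption.
p = 0.0001  # daily failure probability of a single accelerator

for n in (8, 512, 16_384):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>6} devices -> {at_least_one:.1%} chance of a failure per day")
```

At 8 devices the daily failure chance is negligible (~0.1%), but at 16,384 devices it exceeds 80%, which is why failures become routine rather than exceptional at training-cluster scale.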