At the time, I knew little about AI or machine learning (ML). But AWS DeepRacer instantly captured my interest with its promise that even inexperienced developers could get involved in AI and ML. Panic set in as we realized we would be competing on stage in front of thousands of people while knowing little about ML.
In close collaboration with the UN and local NGOs, we co-develop an interpretable predictive tool for landmine contamination to identify hazardous clusters under geographic and budget constraints, experimentally reducing false alarms and clearance time by half. RELand consistently outperforms the benchmark models on all relevant metrics.
The excitement is building for the fourteenth edition of AWS re:Invent, and as always, Las Vegas is set to host this spectacular event. We'll also explore the robust infrastructure services from AWS powering AI innovation, featuring Amazon SageMaker, AWS Trainium, and AWS Inferentia under the AI/ML and Compute topics.
By accelerating issue detection and remediation, it increases the reliability of your ML training and reduces the time and cost wasted due to hardware failures. Additionally, the node recovery agent will publish Amazon CloudWatch metrics for users to monitor and alert on these events.
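As a minimal sketch of what alerting on such metrics could look like, the snippet below creates a CloudWatch alarm with boto3. The namespace, metric name, and SNS topic ARN are illustrative assumptions, not the agent's actual published metric names.

```python
import boto3

# Alarm on a hypothetical node-recovery metric so operators are notified
# whenever a faulty node is detected and replaced.
cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="gpu-node-recovery-events",
    Namespace="NodeRecoveryAgent",       # assumed namespace
    MetricName="FaultyNodeReplaced",     # assumed metric name
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-ops-alerts"],  # placeholder topic
)
```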
Unsupervised ML: The Basics. Unlike supervised ML, we do not guide the unsupervised model with labeled outcomes. Unsupervised ML uses algorithms that draw conclusions from unlabeled datasets. As a result, unsupervised ML algorithms are more elaborate than supervised ones, since we have little to no information about the predicted outcomes.
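A minimal sketch of an unsupervised algorithm is shown below: PCA finds structure (directions of maximum variance) in unlabeled data, with no target column provided. The synthetic data and dimensions are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 5))          # 200 unlabeled samples, 5 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)       # learned purely from the data itself, no labels

print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```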
Learn more about how you can volunteer for either the in-person or virtual team and get a free ticket to the event. Volunteer for ODSC East 2023: ODSC volunteers are an integral part of the success of each ODSC conference and a perfect extension of our core team and ambassadors to our community!
Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records needs to be transformed into meaningful features that are optimized for model training.
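Below is a minimal sketch of registering and ingesting engineered features with the SageMaker Python SDK, assuming the features already sit in a pandas DataFrame. The feature group name, role ARN, and column names are placeholders, not values from the post.

```python
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role

df = pd.DataFrame({
    "transaction_id": ["t1", "t2"],
    "amount_7d_avg": [42.0, 13.5],
    "event_time": [time.time(), time.time()],   # required event-time feature
})

fg = FeatureGroup(name="transactions-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)       # infer the feature schema from the DataFrame
fg.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",
    record_identifier_name="transaction_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
fg.ingest(data_frame=df, max_workers=1, wait=True)
```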
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Data storage and processing: all compute is done as Spark jobs inside a Hadoop cluster using Apache Livy and Spark. Apache HBase was employed to offer real-time key-based access to data.
As compute power and data have become more available with cloud computing, machine learning (ML) is now making an impact across every industry and has become a core part of many businesses. Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface.
Our commitment to innovation led us to a pivotal challenge: how to harness the power of machine learning (ML) to further enhance our competitive edge while balancing this technological advancement with strict data security requirements and the need to streamline access to our existing internal resources.
Meta is currently operating many data centers with GPU training clusters across the world. Meta’s training infrastructure comprises dozens of AI clusters of varying sizes, with a plan to scale to 600,000 GPUs in the next year. It runs thousands of training jobs every day from hundreds of different Meta teams.
The architecture deploys a simple service in a Kubernetes pod within an EKS cluster. The Kubernetes Event-Driven Autoscaler (KEDA) is configured to automatically scale the number of service pods based on custom metrics available in Prometheus. A group of xlarge nodes is included to run the system pods needed by the cluster.
ML algorithms fall into various categories, which can be generally characterised as Regression, Clustering, and Classification. While Classification is a supervised (directed) Machine Learning technique, Clustering is an unsupervised Machine Learning algorithm. What is Classification?
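As a minimal sketch of classification, the example below trains a model on labeled examples and predicts a discrete class for new samples; the choice of the Iris dataset and logistic regression is illustrative, not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: learn from labeled (X, y) pairs, then predict labels for unseen data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(X_test[:5]))          # predicted class labels
print(clf.score(X_test, y_test))        # accuracy on held-out data
```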
Building foundation models (FMs) requires creating, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. SageMaker HyperPod integrates the Slurm Workload Manager for cluster and training job orchestration.
SageMaker geospatial capabilities make it straightforward for data scientists and machine learning (ML) engineers to build, train, and deploy models using geospatial data. Geobox enables city departments to do the following: improved climate adaptation planning, where informed decisions reduce the impact of extreme heat events.
It can represent a geographical area as a whole or it can represent an event associated with a geographical area. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights.
This capability allows for the seamless addition of SageMaker HyperPod managed compute to EKS clusters, using automated node and job resiliency features for foundation model (FM) development. FMs are typically trained on large-scale compute clusters with hundreds or thousands of accelerators.
The listing writer microservice publishes listing change events to an Amazon Simple Notification Service (Amazon SNS) topic, which an Amazon Simple Queue Service (Amazon SQS) queue subscribes to. The cluster comprises 3 cluster manager nodes (m6g.xlarge.search instances) dedicated to managing cluster operations.
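A minimal sketch of this SNS-to-SQS fan-out pattern with boto3 is shown below; the topic and queue names are placeholders, not the actual listing-service resources.

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="listing-change-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="listing-change-queue")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Subscribe the SQS queue to the SNS topic so every published listing change
# lands in the queue (in practice the queue also needs an access policy allowing SNS).
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
sns.publish(TopicArn=topic_arn, Message='{"listing_id": "123", "event": "updated"}')
```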
In this comprehensive guide, we’ll explore the key concepts, challenges, and best practices for ML model packaging, including the different types of packaging formats, techniques, and frameworks. Best practices for ML model packaging: here is how you can package a model efficiently.
You can run Spark applications interactively from Amazon SageMaker Studio by connecting SageMaker Studio notebooks and AWS Glue Interactive Sessions to run Spark jobs with a serverless cluster. With interactive sessions, you can choose Apache Spark or Ray to easily process large datasets, without worrying about cluster management.
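Below is a minimal PySpark sketch of the kind of job you might run from such an interactive session; the S3 path and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Aggregate a large dataset without managing any cluster yourself.
spark = SparkSession.builder.appName("interactive-exploration").getOrCreate()

df = spark.read.parquet("s3://my-bucket/raw/transactions/")   # placeholder path
daily = (
    df.groupBy("customer_id", F.to_date("event_time").alias("day"))
      .agg(F.sum("amount").alias("daily_spend"))
)
daily.show(5)
```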
The combination of data streaming and machine learning (ML) enables you to build a single scalable, reliable, yet simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem. Consumers read the events and process the data in real time. Editor’s note: Kai Waehner is a speaker for ODSC Europe this June.
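As a minimal sketch of a consumer that reads events and scores them as they arrive, the snippet below uses the confluent_kafka client; the broker address, topic name, and the stand-in score() function are assumptions, not details from the post.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "ml-scoring",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-events"])         # placeholder topic

def score(payload: bytes) -> float:
    return float(len(payload))                # stand-in for a real model prediction

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        print(score(msg.value()))             # process each event in real time
finally:
    consumer.close()
```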
This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. An enterprise might have the following roles involved in the ML lifecycle. This ML platform provides several key benefits.
Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. The following diagram illustrates the WxAI architecture on AWS.
Machine learning (ML) applications are complex to deploy: they often need to hyper-scale while meeting ultra-low latency requirements and stringent cost budgets. Deploying ML models at scale with optimized cost and compute efficiency can be a daunting and cumbersome task. Design patterns for building ML applications.
Amazon SageMaker Canvas is a visual machine learning (ML) service that enables business analysts and data scientists to build and deploy custom ML models without requiring any ML experience or having to write a single line of code. Through Atlas Data Federation, data is extracted into an Amazon S3 bucket.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.
For any machine learning (ML) problem, the data scientist begins by working with data. Feature engineering refers to the process of identifying, selecting, and manipulating relevant variables to transform raw data into forms that are more useful for training an ML model and performing inference with it.
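A minimal sketch of feature engineering with pandas is shown below: raw transaction records are turned into per-customer, model-ready variables. The column names and derived features are illustrative assumptions.

```python
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
    "amount": [20.0, 35.0, 15.0],
})

features = (
    raw.assign(day_of_week=raw["timestamp"].dt.dayofweek)      # derived variable
       .groupby("customer_id")
       .agg(total_spend=("amount", "sum"),
            avg_spend=("amount", "mean"),
            active_days=("timestamp", "nunique"))
       .reset_index()
)
print(features)   # one row of engineered features per customer
```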
It uses predictive modelling to forecast future events, adaptiveness to improve with new data, and generalization to analyse fresh data. This scenario highlights a common reality in the Machine Learning landscape: despite the hype surrounding ML capabilities, many projects fail to deliver expected results due to various challenges.
Thomson Reuters , a global content and technology-driven company, has been using artificial intelligence and machine learning (AI/ML) in its professional information products for decades. In order to provision a highly scalable cluster that is resilient to hardware failures, Thomson Reuters turned to Amazon SageMaker HyperPod.
With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) solutions to surface critical attacks and spotlight emerging threats from noise. Solution overview The following diagram illustrates the ML platform architecture.
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas , allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster.
AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.
Nodes run the pods and are usually grouped in a Kubernetes cluster, abstracting the underlying physical hardware resources. AI and machine learning Building and deploying artificial intelligence (AI) and machine learning (ML) systems requires huge volumes of data and complex processes like high performance computing and big data analysis.
Historically, our space has perceived streaming as a complex technology reserved for experienced data engineers with a deep understanding of incremental event processing. When combined with event-time windows, analyzing the embeddings in real time becomes much more feasible.
Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.
These factors require training an LLM over large clusters of accelerated machine learning (ML) instances. SageMaker Training is a managed batch ML compute service that reduces the time and cost to train and tune models at scale without the need to manage infrastructure. The training runs on SageMaker-managed clusters of ml.p4d.24xlarge instances.
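A minimal sketch of launching such a managed training job with the SageMaker Python SDK is shown below; the script name, role ARN, instance count, framework version, and S3 path are placeholders, not values from the post.

```python
from sagemaker.pytorch import PyTorch

# SageMaker provisions the cluster, runs the job, and tears the cluster down afterwards.
estimator = PyTorch(
    entry_point="train.py",                  # your training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    framework_version="2.1",
    py_version="py310",
)
estimator.fit({"train": "s3://my-bucket/llm-pretraining-data/"})
```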
How Clustering Can Help You Understand Your Customers Better Customer segmentation is crucial for businesses to better understand their customers, target marketing efforts, and improve satisfaction. Clustering, a popular machine learning technique, identifies patterns in large datasets to group similar customers and gain insights.
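Below is a minimal sketch of customer segmentation with k-means; the synthetic features (spend, visits) and the choice of k=3 are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=42)
customers = np.column_stack([
    rng.gamma(shape=2.0, scale=50.0, size=300),   # annual spend
    rng.poisson(lam=12, size=300),                # store visits
])

X = StandardScaler().fit_transform(customers)     # scale features before clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(kmeans.labels_[:10])        # segment assignment per customer
print(kmeans.cluster_centers_)    # segment profiles in scaled feature space
```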
Why we needed to redesign our interactive ML system: in this section, we’ll go over the market forces and technological shifts that compelled us to re-architect our ML system. Customers tackle high-cardinality and multi-label ML problems, requiring far more training data to cover rare classes.
Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities.
Event-based pipeline automation: after the preprocessing batch was complete and the training/test data was stored in Amazon S3, this event invoked CodeBuild and ran the training pipeline in SageMaker. The cluster configuration specifies a master instance group and a core instance group of two r5.xlarge instances.
These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). ML is often associated with PBAs, so we start this post with an illustrative figure. The ML paradigm is learning followed by inference. The union of advances in hardware and ML has led us to the current day.
This mindset has followed me into my work in ML/AI. Just as companies use code to automate business rules, they use ML/AI to automate decisions. Given that, what would you say is the job of a data scientist (or ML engineer, or any other such title)? But first, let’s talk about the typical ML workflow.
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value.
Knowledge and skills in the organization: evaluate the level of expertise and experience of your ML team and choose a tool that matches their skill set and learning curve. Model monitoring and performance tracking: platforms should include capabilities to monitor and track the performance of deployed ML models in real time.