2015 and Clustering - Data Science Current

Building Meta’s GenAI Infrastructure

Hacker News

MARCH 12, 2024

Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We use this cluster design for Llama 3 training. We built these clusters on top of Grand Teton, OpenRack, and PyTorch and continue to push open innovation across the industry. The other cluster features an NVIDIA Quantum2 InfiniBand fabric.

Clustering

Clustering AI ML

The history of Kubernetes

IBM Journey to AI blog

NOVEMBER 2, 2023

Borg’s large-scale cluster management system essentially acts as a central brain for running containerized workloads across its data centers. Omega took the Borg ecosystem further, providing a flexible, scalable scheduling solution for large-scale computer clusters. Control plane nodes, which control the cluster.

Clustering

Clustering Cloud Computing AWS

Top 6 Kubernetes use cases

IBM Journey to AI blog

NOVEMBER 13, 2023

Nodes run the pods and are usually grouped in a Kubernetes cluster, abstracting the underlying physical hardware resources. In 2015, Google donated Kubernetes as a seed technology to the Cloud Native Computing Foundation (CNCF) (link resides outside ibm.com), the open-source, vendor-neutral hub of cloud-native computing.

Machine Learning

Machine Learning ML

For nearly two decades, IBM Consulting has helped power SingHealth’s digital transformation

IBM Journey to AI blog

APRIL 5, 2023

This partnership allows the public healthcare cluster to remain agile and navigate ongoing changes in compliance and technology. It also standardised policies on compensation and benefits, performance reviews and career development throughout the healthcare cluster.

Clustering

Clustering Data Governance

Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium

AWS Machine Learning Blog

OCTOBER 5, 2023

Our high-level training procedure is as follows: for our training environment, we use a multi-instance cluster managed by the SLURM system for distributed training and scheduling under the NeMo framework. From 2015–2018, he worked as a program director at the US NSF in charge of its big data program. Youngsuk Park is a Sr.

AWS

AWS Machine Learning Deep Learning

Conformer-2: a state-of-the-art speech recognition model trained on 1.1M hours of data

AssemblyAI

JULY 18, 2023

Building on In-House Hardware: Conformer-2 was trained on our own GPU compute cluster of 80GB-A100s. To do this, we deployed a fault-tolerant and highly scalable cluster management and job scheduling Slurm scheduler, capable of managing resources in the cluster, recovering from failures, and adding or removing specific nodes.

Clustering

Clustering Supervised Learning AI

We still have so much to learn from nature

Dataconomy

JULY 18, 2023

Object clustering and assembly is a behavior that allows the swarm of robots to manipulate objects distributed in the environment. By clustering and assembling these objects, the swarm can engage in construction processes or accomplish specific tasks that require collaborative object manipulation.

Algorithm

Algorithm Clustering Artificial Intelligence

23 Best Free NLP Datasets for Machine Learning

Iguazio

SEPTEMBER 20, 2023

Twitter US Airline Sentiment: Polarized tweets from February 2015 about the large US airlines. 20 Newsgroups: A dataset containing roughly 20,000 newsgroup documents spanning a variety of topics, for text classification, text clustering and similar ML applications. Get the dataset here. Data is provided in a CSV file and SQLite database.

Machine Learning

Machine Learning Database Data Scientist

Elon Musk’s xAI Unveils Grok 3 AI Model, Claims Edge Over OpenAI and DeepSeek

ODSC - Open Data Science

FEBRUARY 20, 2025

OpenAI, a company Musk co-founded in 2015, introduced its GPT-4-based model, the o1, last year, which showcased strong problem-solving abilities in coding, math, and science. The company disclosed Tuesday that it had doubled its GPU cluster to 200,000 Nvidia units for Grok 3’s training, up from 100,000 in 2023. What’s Next?

AI

AI Artificial Intelligence

Robustness of a Markov Blanket Discovery Approach to Adversarial Attack in Image Segmentation: An…

Mlearning.ai

MARCH 9, 2023

Automated algorithms for image segmentation have been developed based on various techniques, including clustering, thresholding, and machine learning (Arbeláez et al., 2015; Huang et al., 2015), which consists of 20 object categories with varying levels of complexity. 2015) to generate adversarial examples for each image.

Deep Learning

Deep Learning Machine Learning

How an Electrical Engineer Solved Australia’s Most Famous Cold Case

Hacker News

MARCH 20, 2023

In 2012, with the permission of the police, Janette used a magnifying glass to find where several hairs came together in a cluster. Janette performed our first DNA analysis in 2015 and, from the hair root, was able to place the sample within a maternal genetic lineage, or haplotype, known as “H,” which is widely spread around Europe.

Database

Database Clustering AI

A Spy Satellite You’ve Never Heard of Helped Win the Cold War

Hacker News

JANUARY 21, 2025

As chief engineer for NRL’s satellite-building efforts for some 60 years until his retirement in 2015, Wilhelm directed the development of more than 100 satellites, some of them still classified. This triple launching capability was achieved with a satellite dispenser designed and built by an NRL team led by Peter Wilhelm.

Algorithm

Algorithm Clustering

Your guide to generative AI and ML at AWS re:Invent 2024

AWS Machine Learning Blog

NOVEMBER 19, 2024

Explore the model pre-training workflow from start to finish, including setting up clusters, troubleshooting convergence issues, and running distributed training to improve model performance. In this builders’ session, learn how to pre-train an LLM using Slurm on SageMaker HyperPod.

AWS

AWS ML AI

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

This dataset consists of human and machine annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015-2019. To train this model, we need a labeled ground truth subset of the Low Altitude Disaster Imagery (LADI) dataset. Given the highly parallel needs, we chose Lambda to process our images.

ML

ML AWS Data Pipeline

Coactive AI’s CEO: quality beats quantity for data selection

Snorkel AI

APRIL 11, 2023

So for example, in 2015, fidget spinners were all the rage. Now the key insight that we had in solving this is that we noticed that unseen concepts are actually well clustered by pre-trained deep learning models or foundation models. So this might come up if we’re a social media site and we’re trying to do a recommendation.

K-nearest Neighbors

K-nearest Neighbors Clustering Deep Learning

Demand forecasting at Getir built with Amazon Forecast

AWS Machine Learning Blog

MAY 15, 2023

Getir was founded in 2015 and operates in Turkey, the UK, the Netherlands, Germany, France, Spain, Italy, Portugal, and the United States. Algorithm Selection: Amazon Forecast has six built-in algorithms (ARIMA, ETS, NPTS, Prophet, DeepAR+, CNN-QR), which are clustered into two groups: statistical and deep/neural network.

Algorithm

Algorithm Data Scientist Machine Learning

Meet the Winners of the Youth Mental Health Narratives Challenge

DrivenData Labs

FEBRUARY 3, 2025

His journey in AI began in 2015 with a master's in computer vision for biomedical image analysis. Issac Chan is a Machine Learning Engineer at Verto where he leverages advanced machine learning techniques to create impactful healthcare solutions.

Machine Learning

Machine Learning Data Science Natural Language Processing

Best Machine Learning Frameworks for ML Experts in 2023

Pickl AI

JANUARY 23, 2023

It is an open source framework that has been available since April 2015. Scikit-Learn: Scikit-Learn, or simply SKLearn, is the most popular machine learning framework that supports various algorithms for classification, regression, and clustering. Allows clustering of unstructured data. It also allows distributed training.

Machine Learning

Machine Learning ML

Financial text generation using a domain-adapted fine-tuned large language model in Amazon SageMaker JumpStart

AWS Machine Learning Blog

APRIL 18, 2023

per diluted share, for the year ended December 31, 2015. per diluted share, compared to $3,818,000, or $0.21

ML

ML Deep Learning

Introducing spaCy

Explosion

FEBRUARY 18, 2015

The only problem is that the list really contains two clusters of words: one associated with the legal meaning of “pleaded”, and one for the more general sense. Sorting out these clusters is an area of active research. Labs and Emory University, to appear at ACL 2015. Independent Evaluation Independent evaluation by Yahoo!

Clustering

Clustering Natural Language Processing Machine Learning

Open source data visualization options: we compare 5 tools

Cambridge Intelligence

FEBRUARY 20, 2025

Format: Open source automatic graph drawing/design tool that uses a simple graph description language (DOT) for nodes, edges, clusters etc. cdnjs.com History: Made available and maintained by mdaines at slowscan.net since 2015. graphviz.org History: Created by researchers at AT&T Bell Labs in 1991.

Data Visualization

Data Visualization Algorithm Data Analyst Clustering

Building a Predictive Model in KNIME

phData

MARCH 6, 2023

Delving further into KNIME Analytics Platform’s Node Repository reveals a treasure trove of data science-focused nodes, from linear regression to k-means clustering to ARIMA modeling—and quite a bit in between. The great thing about building a predictive model in KNIME is its simplicity.

Decision Trees

Decision Trees Analytics Data Science

Netflix Movies and Series Recommendation Systems

PyImageSearch

JULY 3, 2023

Figure 4: The Netflix personalized home page generation problem (source: Alvino and Basilico, “Learning a Personalized Homepage,” Netflix Technology Blog, 2015). Green ticks represent the relevant titles (source: Alvino and Basilico, “Learning a Personalized Homepage,” Netflix Technology Blog, 2015).

Deep Learning

Deep Learning Algorithm Machine Learning

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2

AWS Machine Learning Blog

JANUARY 13, 2023

They were admitted to one of 335 units at 208 hospitals located throughout the US between 2014–2015. Finally, monitor and track the FL model training progression across different nodes in the cluster using the weights and biases (wandb) tool, as shown in the following screenshot.

AWS

AWS Analytics Machine Learning

Five scalability pitfalls to avoid with your Kafka application

IBM Journey to AI blog

NOVEMBER 9, 2023

Since 2015, IBM has provided the IBM Event Streams service, which is a fully managed Apache Kafka service running on IBM Cloud®. Since then, the service has helped many customers, as well as teams within IBM, resolve scalability and performance problems with the Kafka applications they have written. So, what can you do?

Apache Kafka

Apache Kafka Algorithm Clustering

Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data

AWS Machine Learning Blog

APRIL 18, 2023

per diluted share, for the year ended December 31, 2015. per diluted share, compared to $3,818,000, or $0.21

ML

ML Deep Learning

Comparative Analysis: PyTorch vs TensorFlow vs Keras

Pickl AI

AUGUST 22, 2024

Overview of TensorFlow: TensorFlow, developed by Google Brain, is a robust and versatile deep learning framework that was introduced in 2015. Scalability: TensorFlow can handle large datasets and scale to distributed clusters, making it suitable for training complex models.

Deep Learning

Deep Learning Machine Learning

The Story Continues: Announcing Version 14 of Wolfram Language and Mathematica

Hacker News

JANUARY 9, 2024

One very simple example (introduced in 2015) is Nothing. Another, introduced in 2020, is Splice. An old chestnut of Wolfram Language design concerns the way infinite evaluation loops are handled. but with things like clustering). And in Version 13.2

Python

Python Algorithm Machine Learning

Announcing the ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

Google Research AI blog

MARCH 7, 2023

In this competition, we invite researchers from around the world to build systems that can produce hierarchical annotations of text in images using words clustered into lines and paragraphs. Middle: Illustration of line clustering. Right: Illustration paragraph clustering. Samples from the HierText dataset.

Clustering

Clustering Natural Language Processing Deep Learning

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 23, 2024

Since joining SnapLogic in 2010, Greg has helped design and implement several key platform features including cluster processing, big data processing, the cloud architecture, and machine learning. Greg has published research in the areas of operating systems, parallel computing, and distributed systems.

AI

AI AWS Database

Analyzing the history of Tableau innovation

Tableau

DECEMBER 1, 2021

Incredible growth started in 2005 with the company roughly doubling in size every year until 2015. Clustered under visual encoding, we have topics of self-service analysis, authoring, and computer assistance. Gestalt properties including clusters are salient on scatters. The first Tableau customer conference was in 2008.

Tableau

Tableau ML Database

What is Snowpark — and Why Does it Matter? A phData Perspective

phData

SEPTEMBER 20, 2023

phData has been working in data engineering since the inception of the company back in 2015. Check out our best practices guide for spark developers using Snowpark! Why is Snowpark Exciting to us? Until now, we’ve had to treat them as different entities.

SQL

SQL Python Data Lakes Machine Learning

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

Flipboard

NOVEMBER 24, 2023

Since joining SnapLogic in 2010, Greg has helped design and implement several key platform features including cluster processing, big data processing, the cloud architecture, and machine learning. Greg has published research in the areas of operating systems, parallel computing, and distributed systems.

Database

Database AWS ETL SQL

How Meesho built a generalized feed ranker using Amazon SageMaker inference

AWS Machine Learning Blog

OCTOBER 20, 2023

Meesho was founded in 2015 and today focuses on buyers and sellers across India. We used Dask—a distributed data science computing framework that natively integrates with Python libraries—on Amazon EMR to scale out the training jobs across the cluster. One of the major challenges was to run distributed training at scale.

AWS

AWS Data Scientist ML

Exploring Google’s AI Tools: A Deep Dive into the Future of Data Science

ODSC - Open Data Science

OCTOBER 15, 2024

The Evolution of AI Tools at Google: Since the release of TensorFlow in 2015, Google has been pushing the boundaries of what is possible with AI and machine learning. Paige explained that Gemini models are used not only for code generation but also for tasks like video analysis and data clustering.

Data Science

Data Science Data Scientist AI
