Retrieval Augmented Generation generally consists of three major steps, which I will explain briefly below. Information Retrieval: the very first step involves retrieving relevant information from a knowledge base, database, or vector database, where we store the embeddings of the data from which we will retrieve information.
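As a rough sketch of that retrieval step (not tied to any particular vector database; the toy `embed` function and documents below are placeholders, not part of the original post), the query is embedded and compared against the stored embeddings to find the most relevant entries:

```python
import numpy as np

# Stand-in embedding function; in practice this would call a real embedding model.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(8)
    return v / np.linalg.norm(v)          # unit-normalize so dot product = cosine similarity

# Toy in-memory "vector store": documents alongside their embeddings.
documents = ["refund policy", "shipping times", "warranty terms"]
doc_embeddings = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scores = doc_embeddings @ q            # cosine similarity against every stored vector
    top = np.argsort(-scores)[:k]          # indices of the k closest documents
    return [(documents[i], float(scores[i])) for i in top]

print(retrieve("how long does delivery take?"))
```

The retrieved passages are then passed to the language model as context in the subsequent steps.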
For this post we’ll use a provisioned Amazon Redshift cluster. To set up the Amazon Redshift cluster, we’ve created a CloudFormation template. To load data into the Amazon Redshift cluster, connect to it using Query Editor v2.
Each of these demos can be adapted to a number of industries and customized to specific needs. You can also watch the complete library of demos here. The structured output data is stored in a database, accessible for reporting or downstream applications. Watch the smart call center analysis app demo.
Additionally, we dive into integrating common vector database solutions available for Amazon Bedrock Knowledge Bases and how these integrations enable advanced metadata filtering and querying capabilities.
Visualizing graph data doesn’t necessarily depend on a graph database… Working on a graph visualization project? You might assume that graph databases are the way to go – they have the word “graph” in them, after all. Do I need a graph database? It depends on your project.
What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps? And if you want to see demos of some of this functionality, be sure to join us for the livestream linked from the Citus 12.0 Updates page. Let’s dive in!
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. Enter a stack name, such as Demo-Redshift. This is the maximum allowed number of domains in each supported Region.
Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster. Choose Add connection.
“Vector databases are completely different from your cloud data warehouse.” – You might have heard that statement if you are involved in creating vector embeddings for your RAG-based Gen AI applications. Enhanced Search and Retrieval Augmented Generation: vector search systems work by matching queries with embeddings in a database.
Amazon Titan Text Embeddings is a text embeddings model that converts natural language text—consisting of single words, phrases, or even large documents—into numerical representations that can be used to power use cases such as search, personalization, and clustering based on semantic similarity.
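For context, a minimal sketch of calling a text embeddings model through the Bedrock runtime with boto3 might look like the following (the model ID, request shape, and region are assumptions to verify against the current Bedrock documentation, not code from the original post):

```python
import json
import boto3

# Assumed model ID and payload format for Amazon Titan Text Embeddings; verify in the Bedrock docs.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def titan_embed(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]            # the numerical vector for the input text

vector = titan_embed("Find pet-friendly hotels near the conference venue")
print(len(vector))                         # embedding dimensionality (typically 1536 for this model)
```

Vectors produced this way can then be indexed for semantic search, personalization, or clustering.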
In this blog, we’ll review the new DataRobot Time Series clustering feature, which gives you a creative edge to build time series forecasting models by automatically grouping series that are similar to each other and then building models tailored to these groups. You can also connect to Snowflake, Azure, Redshift and many other databases.
In this post, we describe how CBRE partnered with AWS Prototyping to develop a custom query environment allowing natural language query (NLQ) prompts by using Amazon Bedrock, AWS Lambda, Amazon Relational Database Service (Amazon RDS), and Amazon OpenSearch Service. Embeddings were generated using Amazon Titan.
The database used for this competition is based on the Perfumery Materials & Performance dataset by Leffingwell & Associates and the Good Scents Company Information system. Looking more closely at the embeddings, it’s possible to see clusters of particular molecules. Request a demo. See DataRobot in Action.
To understand how DataRobot AI Cloud and Big Query can align, let’s explore how DataRobot AI Cloud Time Series capabilities help enterprises with three specific areas: segmented modeling, clustering, and explainability. Enable Granular Forecasts with Clustering. This is where clustering comes in.
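The clustering idea can be illustrated with a small, generic sketch (this is not DataRobot's implementation; the synthetic series, summary features, and cluster count are arbitrary assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic example: 100 series with 52 weekly observations each.
rng = np.random.default_rng(0)
series = rng.standard_normal((100, 52)).cumsum(axis=1)

# Summarize each series with a few simple features: level, volatility, overall trend.
features = np.column_stack([
    series.mean(axis=1),
    series.std(axis=1),
    series[:, -1] - series[:, 0],
])

# Group similar series; a separate forecasting model would then be fit per cluster.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
for cluster_id in range(4):
    print(f"cluster {cluster_id}: {(labels == cluster_id).sum()} series")
```

Segmented modeling then trains a model per cluster instead of one model across all series.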
Additionally, we have recently announced a partnership and integration with Snowflake to expand deployment options by bringing models directly into the database. To see a demo or to learn how it can be applied to your current use cases, reach out to your DataRobot account team or request a demo today. Request a Demo.
We’re working with super-large GPU clusters and are looking at training runs that take weeks or months. Retrieval Augmented Generation (RAG) systems add a vector database and embeddings to the mix, which require dedicated observability tooling. Pretraining is undoubtedly the most expensive activity.
Our graph visualization SDKs include performance demos, so you can run layouts of thousands of chart items and monitor the frames per second (FPS) rate for comparison. Format: Open source automatic graph drawing/design tool that uses a simple graph description language (DOT) for nodes, edges, clusters etc. Cytoscape.js
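As a point of reference for the DOT approach, here is a minimal sketch using the graphviz Python bindings (the package choice and the node and edge names are illustrative assumptions, not taken from the SDK comparison above):

```python
import graphviz

# Describe nodes, edges, and a cluster in DOT via the Python bindings.
dot = graphviz.Digraph("example")
with dot.subgraph(name="cluster_backend") as backend:   # subgraph names starting with "cluster" are drawn as boxes
    backend.attr(label="backend")
    backend.node("db", "database")
    backend.node("api", "API server")
dot.node("ui", "web client")
dot.edge("ui", "api")
dot.edge("api", "db")

print(dot.source)   # the generated DOT description; dot.render() would produce an image
```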
Here, you can find information on the actions and the corresponding workload, such as the container cluster, the namespace and the risk posed to the workload (which, in this case, is transaction congestion), as shown in Figure 5. In Figure 6 below, you can see how Turbonomic provides the rationale behind taking the action.
This architecture combines a general-purpose large language model (LLM) with a customer-specific document database, which is accessed through a semantic search engine. Because RAG uses a semantic search, it can find more relevant material in the database than just a keyword match alone. Choose Next. Choose Next.
Kubeflow integrates with popular ML frameworks, supports versioning and collaboration, and simplifies the deployment and management of ML pipelines on Kubernetes clusters. Check out the Kubeflow documentation. Dolt is an open-source relational database system built on Git.
Let’s jump ahead to a few days later, when a red alert shows our database server exchanging a huge number of packets with an external entity. Request full access to our KronoGraph SDK, demos and live-coding playground. What other activity on this file server happened immediately before or after the policy violation?
The earthquakes data source: the data I used is from the USGS’s National Earthquake Information Center (NEIC), whose extensive databases of seismic information are freely available. Tōhoku earthquake.
For example, a health insurance company may want their question answering bot to answer questions using the latest information stored in their enterprise document repository or database, so the answers are accurate and reflect their unique business rules. In this demo, we use a Jumpstart Flan T5 XXL model endpoint.
For example, if a data team wants to use an LLM to examine financial documents—something the model may perform poorly on out of the box—the team can fine-tune it on something like the Financial Documents Clustering data set. This information could come from: A vector database such as FAISS or Pinecone. Book a demo today.
It won’t be a long demo, it’ll be a very quick demo of what you can do and how you can operationalize stuff in Snowflake. And then once they’re done with that, it’s very easy to package up, and you’ll see that in the demo today. The demo is actually very simple.
If we asked whether their companies were using databases or web servers, no doubt 100% of the respondents would have said “yes.” And there are tools for archiving and indexing prompts for reuse, vector databases for retrieving documents that an AI can use to answer a question, and much more. We expect others to follow.
Finally, we store these vectors in a vector database for similarity search. As an alternative, you can use FAISS, an open-source library for efficient vector similarity search and clustering, for storing vectors. One of the key features is its ability to interface with external sources of information, such as the web, databases, and APIs.
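A minimal FAISS sketch of that store-and-search pattern (the dimensionality and random vectors are placeholders for real embeddings):

```python
import faiss
import numpy as np

d = 128                                              # embedding dimensionality
rng = np.random.default_rng(0)
doc_vectors = rng.standard_normal((1000, d)).astype("float32")

index = faiss.IndexFlatL2(d)                         # exact (brute-force) L2 index
index.add(doc_vectors)                               # store the document embeddings

query = rng.standard_normal((1, d)).astype("float32")
distances, ids = index.search(query, 5)              # 5 nearest stored vectors
print(ids[0], distances[0])
```

Approximate indexes (for example IVF or HNSW variants) trade a little recall for much faster search on large collections.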
With Dr. Jon Krohn you’ll also get hands-on code demos in Jupyter notebooks and strategic advice for overcoming common pitfalls. Here, Weaviate will be introduced as an open-source vector search database with unique features for serving millions of users worldwide.
SQL stands for Structured Query Language, which is essential for querying and manipulating data stored in relational databases. The SELECT statement retrieves data from a database, while SELECT DISTINCT eliminates duplicate rows from the result set. How do you join tables in SQL?
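A small sqlite3 sketch of those statements (the table names and rows are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Lisbon'), (2, 'Lisbon'), (3, 'Oslo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 3, 40.0);
""")

# SELECT DISTINCT removes duplicate rows from the result set.
print(conn.execute("SELECT DISTINCT city FROM customers").fetchall())

# A JOIN combines rows from two tables on a matching key.
rows = conn.execute("""
    SELECT customers.city, orders.total
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
""").fetchall()
print(rows)
```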
To keep the system requirements to a minimum, data is stored in an SQLite database by default. Try the live demo! However, the unsupervised algorithm won’t usually return clusters that map neatly to the labels you care about. It’s easy to use a different SQL backend, or to specify a custom storage solution.
Clustering health aspects. We also have some cool Healthsea demos hosted on Hugging Face Spaces: the Healthsea Demo, a visualization of 1 million analyzed reviews with Healthsea, and the Healthsea Pipeline, a visualization of the individual processing steps of the Healthsea pipeline. You can try out a demo of the Benepar parser here.
All the steps in this demo are available in the accompanying notebook Fine-tuning text generation GPT-J 6B model on a domain specific dataset. We serve developers and enterprises of all sizes through AWS, which offers a broad set of global compute, storage, database, and other service offerings.
[Table excerpt — lgarma: topic clustering and visualization, paper recommendation, saved research collections, keyword extraction; GPT-3.5 or GPT-4; arXiv, OpenAlex, CrossRef, NTRS.] Currently, published research may be spread across a variety of different publishers, including free and open-source ones like those used in many of this challenge's demos (e.g.
We frequently see this with LLM users, where a good LLM creates a compelling but frustratingly unreliable first demo, and engineering teams then go on to systematically raise quality. Systems can be dynamic. Machine learning models are inherently limited because they are trained on static datasets, so their “knowledge” is fixed.
Agent Creator is a versatile extension to the SnapLogic platform that is compatible with modern databases, APIs, and even legacy mainframe systems, fostering seamless integration across various data environments. The following demo shows Agent Creator in action. Chunker Snap – Segments large texts into manageable pieces.
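As a rough idea of what such a chunking step does (a generic sketch, not SnapLogic's Chunker Snap; the chunk size and overlap are arbitrary):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a long text into overlapping, fixed-size pieces for downstream embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

document = "A long contract or product manual would go here... " * 100
pieces = chunk_text(document)
print(len(pieces), len(pieces[0]))
```

Overlapping chunks help preserve context that would otherwise be cut at chunk boundaries.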
The SnapLogic Intelligent Integration Platform (IIP) enables organizations to realize enterprise-wide automation by connecting their entire ecosystem of applications, databases, big data, machines and devices, APIs, and more with pre-built, intelligent connectors called Snaps.
We use Cohere Command and AI21 Labs Jurassic-2 Mid for this demo. An application running on AWS uses an Amazon Aurora Multi-AZ DB cluster deployment for its database. Enable read-through caching on the Aurora database. Create a second Aurora database and link it to the primary database as a read replica.
Chris had earned an undergraduate computer science degree from Simon Fraser University and had worked as a database-oriented software engineer. In 2004, Tableau got both an initial series A of venture funding and Tableau’s first OEM contract with the database company Hyperion—that’s when I was hired. Let’s take a look at each. .
When a query is constructed, it passes through a cost-based optimizer, then data is accessed through connectors, cached for performance and analyzed across a series of servers in a cluster. They stood up a file-based data lake alongside their analytical database. Uber has made the Presto query engine connect to real-time databases.
These controllers allow Kubernetes users to provision AWS resources like buckets, databases, or message queues simply by using the Kubernetes API. Prerequisites: to follow along, you should have a Kubernetes cluster with the SageMaker ACK controller v1.2.9 installed. Now you can also use them with SageMaker Operators for Kubernetes.