Database and Document - Data Science Current

NoSQL Databases and Their Use Cases

KDnuggets

MARCH 16, 2023

Learn about NoSQL Databases and their types like key-value, document, graph and column family with their use cases.

Database

Database SQL

Ask your Documents with Langchain and Deep Lake!

Analytics Vidhya

SEPTEMBER 14, 2023

Large Language Models, paired with tools like LangChain and Deep Lake, have come a long way in document Q&A and information retrieval. These models know a lot about the world, but sometimes they struggle to know when they don’t know something.

Analytics

Analytics Database Python

Building Multi-Document Agentic RAG using LLamaIndex

Analytics Vidhya

SEPTEMBER 5, 2024

Enter Multi-Document Agentic RAG – a powerful approach that combines Retrieval-Augmented Generation (RAG) with agent-based systems to create AI that can reason across multiple documents.

Artificial Intelligence

Artificial Intelligence Analytics

A New Era of Text Generation: RAG, LangChain, and Vector Databases

Analytics Vidhya

NOVEMBER 5, 2023

One such groundbreaking approach is Retrieval Augmented Generation (RAG), which combines the power of generative models like GPT (Generative Pretrained Transformer) with the efficiency of vector databases and LangChain.

Database

Database Natural Language Processing Analytics

A Deep Dive into Qdrant, the Rust-Based Vector Database

Analytics Vidhya

NOVEMBER 21, 2023

Vector databases have become the go-to place for storing and indexing the representations of unstructured and structured data. In the ever-evolving landscape of […]

Database

Database Deep Learning Analytics

Build Semantic Search Applications Using Open Source Vector Database ChromaDB

Analytics Vidhya

JULY 18, 2023

Among such tools, today we will learn about the workings and functions of ChromaDB, an open-source vector database to store embeddings from […]

Database

Database Analytics AI

CRUD Operations in MongoDB

Analytics Vidhya

DECEMBER 13, 2022

MongoDB is a NoSQL database that stores data in document format (BSON, or binary JSON). Its advantages over traditional SQL databases include flexible schema design, relaxed ACID properties, and distributed data storage, thus performing better for […].

SQL

SQL Database Data Science Analytics

Building Custom Q&A Applications Using LangChain and Pinecone Vector Database

Analytics Vidhya

AUGUST 19, 2023

One of the fascinating applications of these models is developing custom question-answering or chatbots that draw from personal or organizational data sources. […]

Database

Database Artificial Intelligence Analytics

Introduction to Apache CouchDB using Python

Analytics Vidhya

JULY 23, 2022

Apache CouchDB is an open-source, document-based NoSQL database developed by the Apache Software Foundation and used by big companies like Apple, GenCorp Technologies, and Wells Fargo. This article was published as a part of the Data Science Blogathon.

Python

Python Database Data Science Analytics

Vector Streaming: Memory-efficient Indexing with Rust

Analytics Vidhya

SEPTEMBER 17, 2024

EmbedAnything is introducing vector streaming, a feature designed to optimize large-scale document embedding. Today, I will show how to integrate it with the Weaviate Vector Database for seamless image embedding and search.

Database

Database Analytics

50+ MongoDB Interview Questions and Answers

Analytics Vidhya

JULY 18, 2024

MongoDB is a NoSQL database offering high performance and scalability. It stores data as documents, similar to JSON objects, allowing for complex structures like nested documents and arrays. It also reduces the need for joins with embedded documents and arrays.

Database

Database Analytics

Multi-tenancy in RAG applications in a single Amazon Bedrock knowledge base with metadata filtering

AWS Machine Learning Blog

APRIL 7, 2025

Additionally, we dive into integrating common vector database solutions available for Amazon Bedrock Knowledge Bases and how these integrations enable advanced metadata filtering and querying capabilities. Using the query embedding and the metadata filter, relevant documents are retrieved from the knowledge base.

Database

Database AWS Natural Language Processing AI
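
As a rough illustration of the retrieval step this entry describes, the sketch below applies a metadata filter first and then ranks the remaining documents by similarity to the query embedding. It is plain Python, not the Amazon Bedrock Knowledge Bases API; the `tenant` metadata key, the toy three-dimensional embeddings, and the `retrieve` helper are all hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_embedding, metadata_filter, docs, top_k=2):
    """Keep docs whose metadata matches every filter key, then rank by similarity."""
    allowed = [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(
        allowed,
        key=lambda d: cosine(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical knowledge base shared by two tenants.
docs = [
    {"id": "a1", "embedding": [0.9, 0.1, 0.0], "metadata": {"tenant": "acme"}},
    {"id": "a2", "embedding": [0.1, 0.9, 0.0], "metadata": {"tenant": "acme"}},
    {"id": "b1", "embedding": [1.0, 0.0, 0.0], "metadata": {"tenant": "globex"}},
]

results = retrieve([1.0, 0.0, 0.0], {"tenant": "acme"}, docs)
result_ids = [d["id"] for d in results]
```

Filtering before ranking means another tenant's document is never returned, even when it is the closest match to the query.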

Top vector databases in market

Data Science Dojo

AUGUST 3, 2023

A vector database is a type of database that stores data as high-dimensional vectors. One way to think about a vector database is as a way of storing and organizing data that is similar to how the human brain stores and organizes memories. Pinecone is a vector database that is designed for machine learning applications.

Database

Database Natural Language Processing Machine Learning

Introduction to Elasticsearch using Python

Analytics Vidhya

JULY 18, 2022

Elasticsearch is primarily a document-based NoSQL database, meaning developers do not need any prior knowledge of SQL to use it. Still, it is much more than just a NoSQL database. This article was published as a part of the Data Science Blogathon.

Python

Python SQL Database Data Science

How to Develop Serverless Code Using Azure Functions?

Analytics Vidhya

JANUARY 30, 2023

Whether we are analyzing IoT data streams, managing scheduled events, processing document uploads, or responding to database changes, Azure Functions allow developers […]

Azure

Azure Database Analytics

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

ETL

ETL Data Warehouse Analytics

Unify structured data in Amazon Aurora and unstructured data in Amazon S3 for insights using Amazon Q

AWS Machine Learning Blog

NOVEMBER 20, 2024

Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. The solution combines data from an Amazon Aurora MySQL-Compatible Edition database and data stored in an Amazon Simple Storage Service (Amazon S3) bucket.

Database

Database AWS SQL ETL

How To Create An Aggregation Pipeline In MongoDB

Analytics Vidhya

APRIL 12, 2021

MongoDB is a free, open-source NoSQL document database. This article was published as a part of the Data Science Blogathon.

SQL

SQL Data Science Database Analytics

Understanding databases: A comprehensive guide to different types for beginners

Data Science Dojo

APRIL 6, 2023

While Python and R are popular for analysis and machine learning, SQL and database management are often overlooked. However, data is typically stored in databases and requires SQL or business intelligence tools for access. In this guide, we provide a comprehensive overview of various types of databases and their differences.

Database

Database SQL Data Science Business Intelligence

Master Vector Embeddings with Weaviate – A Comprehensive Series for You!

Data Science Dojo

JANUARY 22, 2025

Here’s how embeddings power these advanced systems. Semantic understanding: LLMs use embeddings to represent words, sentences, and entire documents in a way that captures their semantic meaning. The process enables the models to find the most relevant sections of a document or dataset, improving the accuracy and relevance of their outputs.

Database

Database ML AI

Understanding the popular database management system: MySQL

Data Science Dojo

MARCH 25, 2024

MySQL is one of the most popular database management systems (DBMS) globally, used across different domains, and it supports all major operating systems: Linux, macOS, and Windows. Databases are stored on a server, which is typically a remote computer or a cloud server.

Database

Database SQL

Retrieval augmented generation (RAG) – Elevate your large language models experience

Data Science Dojo

DECEMBER 6, 2023

This process is typically facilitated by document loaders, which provide a “load” method for accessing and loading documents into memory. This involves splitting lengthy documents into smaller chunks that are compatible with the model and produce accurate and clear results.

Database

Database Data Preparation Algorithm AI
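
The splitting step mentioned in this entry can be sketched in a few lines. This is a minimal character-based splitter with overlap, written in plain Python under assumed parameters; `split_into_chunks`, `chunk_size`, and `overlap` are hypothetical names, and real document splitters typically work on tokens or sentence boundaries instead.

```python
def split_into_chunks(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that text cut at a
    boundary keeps some of its surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

document = "abcdefghij" * 5  # 50 characters standing in for a long document
chunks = split_into_chunks(document, chunk_size=20, overlap=5)
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text.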

MongoRAG: Leveraging MongoDB Atlas as a Vector Database with Databricks-Deployed Embedding Model and LLMs for Retrieval-Augmented Generation

Towards AI

JANUARY 29, 2025

Retrieval Augmented Generation generally consists of three major steps, which I will explain briefly below. Information retrieval: the very first step involves retrieving relevant information from a knowledge base, database, or vector database, where we store the embeddings of the data from which we will retrieve information.

Database

Database Clustering Python SQL

Perplexity acquires Carbon, a Seattle startup that helps developers connect data sources to LLMs

Flipboard

DECEMBER 18, 2024

“Carbon will make it easier for Perplexity’s answer engine to be informed by diverse sources of information, whether that data resides in internal databases, cloud storage, or document repositories.” Carbon raised a $1.3 million seed round in 2023.

Computer Science

Computer Science Database AI

Databases are the unsung heroes of AI

Dataconomy

AUGUST 7, 2023

Artificial intelligence is no longer fiction and the role of AI databases has emerged as a cornerstone in driving innovation and progress. An AI database is not merely a repository of information but a dynamic and specialized system meticulously crafted to cater to the intricate demands of AI and ML applications.

Database

Database AI ML

Complete roadmap of LlamaIndex to Creating Personalized Q&A Chatbots

Data Science Dojo

SEPTEMBER 28, 2023

It supports a variety of data sources, including APIs, databases, and PDFs. The key components of LlamaIndex are as follows. Data connectors: these components allow LlamaIndex to ingest data from a variety of sources, such as APIs, databases, and PDFs.

Natural Language Processing

Natural Language Processing Database Data Science Analytics

Natural Language Processing Using CNNs for Sentence Classification

Analytics Vidhya

SEPTEMBER 2, 2021

This article was published as a part of the Data Science Blogathon. Sentence classification is one of the simplest NLP tasks, with a wide range of applications including document classification, spam filtering, and sentiment analysis. A question database will be used for this article and […].

Natural Language Processing

Natural Language Processing Data Science Database Analytics

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Flipboard

JANUARY 14, 2025

The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. This vector database will store the vector representations of your documents, serving as a key component of your local Knowledge Base.

AWS

AWS Database AI

Interactive SQLite Documentation: Experiment with Queries in Real-Time

Hacker News

MARCH 5, 2024

At SQLite Cloud, we are dedicated to making database management as seamless and intuitive as possible. Today, we are thrilled to unveil a groundbreaking addition to our platform - the Interactive SQLite Documentation! Now, alongside our comprehensive […]

Database

Cost-effective document classification using the Amazon Titan Multimodal Embeddings Model

AWS Machine Learning Blog

APRIL 11, 2024

Organizations across industries want to categorize and extract insights from high volumes of documents of different formats. Manually processing these documents to classify and extract information remains expensive, error prone, and difficult to scale. Categorizing documents is an important first step in IDP systems.

Database

Database AWS Algorithm Machine Learning

Kronotop: Redis-compatible, transactional document store backed by FoundationDB

Hacker News

JANUARY 20, 2025

Kronotop is a Redis-compatible, distributed and transactional document database backed by FoundationDB. kronotop/kronotop

Database

Protect sensitive data in RAG applications with Amazon Bedrock

Flipboard

APRIL 23, 2025

RAG workflow: converting data to actionable knowledge. RAG consists of two major steps. Ingestion: preprocessing unstructured data, which includes converting the data into text documents and splitting the documents into chunks. Document chunks are then encoded with an embedding model to convert them to document embeddings.

AWS

AWS ML AI

A Beginner’s Guide to MongoDB and CRUD Operations

Analytics Vidhya

MAY 31, 2023

In this guide, we will explore the fundamentals of MongoDB and delve into the essential CRUD (Create, Read, Update, Delete) operations that form the backbone of any database system.

Database

Database Analytics SQL

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Flipboard

NOVEMBER 19, 2024

A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. The following diagram depicts the solution architecture.

AWS

AWS AI Big Data

Enhance Your LLM Agents with BM25: Lightweight Retrieval That Works

Towards AI

APRIL 28, 2025

Models like Sentence Transformers map words, sentences, or documents into high-dimensional vectors. To find relevant text, you compare vectors using metrics like cosine similarity, retrieving documents whose embeddings are closest to the query embedding. […] It scores documents based on: […] from rank_bm25 import BM25Okapi […]

Python

Python Database AI
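
The snippet for this entry is cut off before it explains how BM25 scores documents. As general background, and not as the article's own code, here is a minimal pure-Python sketch of standard Okapi BM25 scoring. The `bm25_scores` helper, the whitespace tokenization, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration; library implementations may differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative (one common variant).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

corpus = [
    "vector databases store embeddings",
    "bm25 is a lightweight keyword retrieval method",
    "agents can call retrieval tools",
]
tokenized = [doc.split() for doc in corpus]
scores = bm25_scores("bm25 retrieval".split(), tokenized)
best = corpus[scores.index(max(scores))]
```

Unlike embedding-based retrieval, this needs no model or vector store: it relies only on term counts, document lengths, and how rare each query term is across the corpus.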

What is LangChain? Key Features, Tools, and Use Cases

Data Science Dojo

OCTOBER 24, 2024

It also connects effortlessly with collaboration tools like Airtable, Trello, Figma, and Notion, as well as data sources including Pandas, MongoDB, and Microsoft databases. For instance, a healthcare application could integrate patient data from a secure database with the latest medical research.

Database

Database Natural Language Processing AI

Fauna Service Winding Down

Hacker News

MARCH 19, 2025

The truly serverless database that combines the power of a relational database with the flexibility of JSON documents.

Database

How Deltek uses Amazon Bedrock for question and answering on government solicitation documents

AWS Machine Learning Blog

AUGUST 9, 2024

Question and answering (Q&A) using documents is a commonly used application in various use cases like customer support chatbots, legal research assistants, and healthcare advisors. In this collaboration, the AWS GenAIIC team created a RAG-based solution for Deltek to enable Q&A on single and multiple government solicitation documents.

AWS

AWS Database AI

Effectively use prompt caching on Amazon Bedrock

AWS Machine Learning Blog

APRIL 7, 2025

The following use cases are well-suited for prompt caching. Chat with document: by caching the document as input context on the first request, each user query becomes more efficient, enabling simpler architectures that avoid heavier solutions like vector databases.

AWS

AWS AI ML

Leaked Midjourney artist database could be a moment of reckoning for AI art

Flipboard

JANUARY 4, 2024

Over 16,000 artists are named in the document.

Database

Database AI

Easy Late-Chunking With Chonkie

Towards AI

FEBRUARY 5, 2025

This article breaks down what Late Chunking is, why it’s essential for embedding larger or more intricate documents, and how to build it into your search pipeline using Chonkie and KDB.AI as the vector store. When you have a document that spans thousands of words, encoding it into a single embedding often isn’t optimal.

Database

Database Clustering AI

What is an LLM Bootcamp? What Does Data Science Dojo Offer for Your Success?

Data Science Dojo

NOVEMBER 5, 2024

It covers a range of topics including generative AI, LLM basics, natural language processing, vector databases, prompt engineering, and much more. You get a chance to work on various projects that involve practical exercises with vector databases, embeddings, and deployment frameworks.

Data Science

Data Science Azure Natural Language Processing Database

Top 10 Python packages you need to master to maximize your coding productivity

Data Science Dojo

MAY 1, 2023

It is designed to simplify the process of working with databases by providing a consistent and high-level interface. It offers a set of utilities and abstractions that make it easier to interact with relational databases using SQL queries. BeautifulSoup BeautifulSoup is a Python library for parsing HTML and XML documents.

Python

Python Machine Learning Data Science
