Whether you’re an expert, a curious learner, or simply someone who loves data science and AI, there’s something here for you to learn about the fundamental concepts. These articles cover everything from basics like embeddings and vector databases to the newest breakthroughs in tooling. Read the blog post: What is LangChain?
Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata. For detailed instructions on setting up a knowledge base, including data preparation, metadata creation, and step-by-step guidance, refer to Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy.
The resulting vector representations can then be stored in a vector database. Step 3: Store vector embeddings – Save the vector embeddings obtained from the embedding model in a vector database. The original text can be stored separately; this could involve a hierarchical file system or a database.
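As a minimal sketch of this step (not tied to any particular vector database), the following keeps embeddings and original text in two separate in-memory stores keyed by a shared document ID. The `embed()` function here is a toy stand-in for a real embedding model, included only so the example runs:

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Toy stand-in for an embedding model: hash character bytes into a
    # fixed-size vector and L2-normalize it.
    vec = np.zeros(dim)
    for i, ch in enumerate(text.encode()):
        vec[i % dim] += ch
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

vector_store: dict[int, np.ndarray] = {}  # id -> embedding ("vector database")
text_store: dict[int, str] = {}           # id -> original text (separate store)

for doc_id, chunk in enumerate(["RAG systems retrieve context.",
                                "Embeddings map text to vectors."]):
    vector_store[doc_id] = embed(chunk)
    text_store[doc_id] = chunk
```

A real system would replace both dictionaries with a vector database and a document store, but the pairing of IDs across the two is the same.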
Multimodal Retrieval Augmented Generation (MM-RAG) is emerging as a powerful evolution of traditional RAG systems, addressing limitations and expanding capabilities across diverse data types. Traditionally, RAG systems were text-centric, retrieving information from large text databases to provide relevant context for language models.
Definition and purpose of RPA Robotic process automation refers to the use of software robots to automate rule-based business processes. RPA tools can be programmed to interact with various systems, such as web applications, databases, and desktop applications.
As AI adoption continues to accelerate, developing efficient mechanisms for digesting and learning from unstructured data becomes even more critical. This could involve better preprocessing tools, semi-supervised learning techniques, and advances in natural language processing.
RAG provides additional knowledge to the LLM through its input prompt space, and its architecture typically consists of the following components: Indexing: Prepare a corpus of unstructured text, parse and chunk it, then embed each chunk and store it in a vector database.
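The "parse and chunk" part of the indexing step can be sketched as follows. This is a simple fixed-size character chunker with overlap, with illustrative parameter values; real pipelines often chunk by tokens or sentences instead:

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so context at chunk boundaries is not lost.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

corpus = "RAG provides additional knowledge to the LLM through its prompt space."
chunks = chunk_text(corpus)
```

Each chunk would then be passed through the embedding model and written to the vector database, as described above.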
Solution overview With SageMaker Studio JupyterLab notebook’s SQL integration, you can now connect to popular data sources like Snowflake, Athena, Amazon Redshift, and Amazon DataZone. For example, you can visually explore data sources like databases, tables, and schemas directly from your JupyterLab ecosystem.
Data preprocessing is a fundamental and essential step in the field of sentiment analysis, a prominent branch of natural language processing (NLP). These tools offer a wide range of functionalities to handle complex data preparation tasks efficiently.
The Evolving AI Development Lifecycle Despite the revolutionary capabilities of LLMs, the core development lifecycle established by traditional natural language processing remains essential: Plan, Prepare Data, Engineer Model, Evaluate, Deploy, Operate, and Monitor. For instance: Data Preparation: Google Sheets.
An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language processing (NLP) to read and understand a document and extract specific entities or phrases. Sensitive data in these data stores needs to be secured.
Neural networks are inspired by the structure of the human brain, and they are able to learn complex patterns in data. Deep Learning has been used to achieve state-of-the-art results in a variety of tasks, including image recognition, natural language processing, and speech recognition.
The final retrieval augmentation workflow covers the following high-level steps: The user query is passed to a retriever component, which performs a vector search to retrieve the most relevant context from our database. A vector database provides efficient vector similarity search through specialized indexes such as k-NN indexes.
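The similarity search a k-NN index performs can be sketched as a brute-force cosine-similarity scan. The example below is an exhaustive version of what an approximate index (e.g. HNSW) computes at scale; the index matrix and query vector are toy 2-D values chosen for illustration:

```python
import numpy as np

def top_k(query: np.ndarray, index: np.ndarray, k: int = 2) -> list[int]:
    # Cosine similarity of the query against every row of the index,
    # returning the indices of the k most similar chunks.
    sims = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k].tolist()

# Toy 2-D "embedding" index of three chunks and a query vector.
index = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
nearest = top_k(query, index, k=2)
```

The returned IDs would then be used to look up the original text chunks that are fed to the LLM as context.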
This allows users to accomplish different natural language processing (NLP) functional tasks and take advantage of IBM-vetted pre-trained open-source foundation models. Encoder-decoder and decoder-only large language models are available in the Prompt Lab today. To bridge the tuning gap, watsonx.ai
With the introduction of EMR Serverless support for Apache Livy endpoints, SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless.
Amazon Kendra is a highly accurate and intelligent search service that enables users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. The following screenshot shows the Data Catalog schema. We have completed the data preparation step.
Learn more: The Best Tools, Libraries, Frameworks and Methodologies that ML Teams Actually Use – Things We Learned from 41 ML Startups [ROUNDUP]. Key use cases and/or user journeys: Identify the main business problems and the data scientist’s needs that you want to solve with ML, and choose a tool that can handle them effectively.
AIM333 (LVL 300) | Explore text-generation FMs for top use cases with Amazon Bedrock Tuesday November 28 | 2:00 PM – 3:00 PM (PST) Foundation models can be used for natural language processing tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction.
They have deep end-to-end ML and natural language processing (NLP) expertise and data science skills, and massive data labeler and editor teams. Additions are required in historical data preparation, model evaluation, and monitoring. The following figure illustrates their journey.
These development platforms support collaboration between data science and engineering teams, which decreases costs by reducing redundant efforts and automating routine tasks, such as data duplication or extraction. AutoAI automates data preparation, model development, feature engineering, and hyperparameter optimization.
I spent over a decade of my career developing large-scale data pipelines to transform both structured and unstructured data into formats that can be utilized in downstream systems. I also have experience in building large-scale distributed text search and natural language processing (NLP) systems.
These outputs, stored in vector databases like Weaviate, allow prompt engineers to directly access the embeddings for tasks like semantic search, similarity analysis, or clustering, which enhances the context awareness and factual accuracy of LLM outputs. NLP skills have long been essential for dealing with textual data.
Key steps involve problem definition, data preparation, and algorithm selection. Data quality significantly impacts model performance. Predictive analytics uses historical data to forecast future trends, such as stock market movements or customer churn. This data can come from databases, APIs, or public datasets.
These networks can learn from large volumes of data and are particularly effective in handling tasks such as image recognition and natural language processing. Key Deep Learning models include: Convolutional Neural Networks (CNNs) CNNs are designed to process structured grid data, such as images.
Data Preparation: Cleaning, transforming, and preparing data for analysis and modelling. Algorithm Development: Crafting algorithms to solve complex business problems and optimise processes. Azure Cognitive Services offers ready-to-use models that seamlessly integrate into existing data workflows.
The memory can be a database, a local file system, or just an object in RAM. The idea is to use these examples later for model training along with currently seen data to prevent catastrophic forgetting. through Cron), and the whole pipeline (data preparation, training) is automated.
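The in-RAM variant of such a memory can be sketched as a small replay buffer. This is an illustrative class (not from any particular library): it keeps a bounded store of past examples, evicting the oldest, and lets later training runs sample them alongside fresh data to mitigate catastrophic forgetting:

```python
import random

class ReplayMemory:
    """Fixed-capacity store of past training examples (replay buffer)."""

    def __init__(self, capacity: int = 100, seed: int = 0):
        self.capacity = capacity
        self.items: list = []
        self.rng = random.Random(seed)  # seeded for reproducible sampling

    def add(self, example) -> None:
        # Evict the oldest example once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
        self.items.append(example)

    def sample(self, n: int) -> list:
        # Draw up to n past examples to mix into the current training batch.
        return self.rng.sample(self.items, min(n, len(self.items)))
```

Swapping the list for a database table or on-disk file changes the storage layer but not the add/sample interface.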
The benchmark used is RoBERTa-Base, a popular model used in natural language processing (NLP) applications that uses the transformer architecture. Historical data is normally (but not always) independent inter-day, meaning that days can be parsed independently.
After your generative AI workload environment has been secured, you can layer in AI/ML-specific features, such as Amazon SageMaker Data Wrangler to identify potential bias during data preparation and Amazon SageMaker Clarify to detect bias in ML data and models.
Introduction Large Language Models (LLMs) represent the cutting edge of artificial intelligence, driving advancements in everything from natural language processing to autonomous agentic systems. RAG has three important steps: Indexing: This stage involves preparing and organizing external data sources.
Augmented Analytics Augmented analytics is revolutionising the way businesses analyse data by integrating Artificial Intelligence (AI) and Machine Learning (ML) into analytics processes. This foundational knowledge is essential for any Data Science project.
It is designed to enhance the performance of generative models by providing them with highly relevant context retrieved from a large database or knowledge base. Instead of relying on static datasets, it uses GPT-4 to generate instruction-following data across diverse scenarios.
Sales teams can forecast trends, optimize lead scoring, and enhance customer engagement all while reducing manual data analysis. IBM Watson A pioneer in AI-driven analytics, IBM Watson transforms enterprise operations with naturallanguageprocessing, machine learning, and predictive modeling.
This strategic decision was driven by several factors: Efficient data preparation – Building a high-quality pre-training dataset is a complex task, involving assembling and preprocessing text data from various sources, including web sources and partner companies. The team opted for fine-tuning on AWS.