The fields of Data Science, Artificial Intelligence (AI), and Large Language Models (LLMs) continue to evolve at an unprecedented pace. In this blog, we will explore the top 7 LLM, data science, and AI blogs of 2024 that have been instrumental in disseminating detailed and updated information in these dynamic fields.
What is Retrieval-Augmented Generation (RAG) and when to use it? Retrieval-Augmented Generation (RAG) is a method that integrates the capabilities of a language model with a specific library or database. RAG gives models access to that library or database, making it suitable for tasks that require factual accuracy.
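To make the idea concrete, here is a minimal, self-contained sketch of the RAG pattern: retrieve the most similar entries from a small in-memory library and prepend them to the prompt. The embed() function and the toy library are stand-ins for a real embedding model and database, not part of any particular product.

```python
# Minimal RAG sketch (illustrative only): retrieve context from a small
# in-memory "library", then prepend it to the prompt sent to a language model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: stands in for a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

library = [
    "The warranty covers manufacturing defects for 24 months.",
    "Returns are accepted within 30 days with a receipt.",
]
library_vectors = np.stack([embed(doc) for doc in library])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Cosine similarity between the query and every stored vector.
    q = embed(query)
    scores = library_vectors @ q / (
        np.linalg.norm(library_vectors, axis=1) * np.linalg.norm(q)
    )
    return [library[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long is the warranty?"))
```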
This includes sourcing, gathering, arranging, processing, and modeling data, as well as being able to analyze large volumes of structured or unstructured data. The goal of data preparation is to present data in the best forms for decision-making and problem-solving.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
Imagine a database with billions of samples, or think about a real-time facial recognition system that must match a face in a crowd to a database of thousands. These scenarios demand efficient algorithms to process and retrieve relevant data swiftly. So, how can we perform efficient searches in such big databases?
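As a baseline for comparison, the sketch below does a brute-force nearest-neighbor search in NumPy over a toy set of embeddings. It scans every row per query, which is exactly why approximate indexes (HNSW, IVF, and similar) become necessary at scale; the data here is random and purely illustrative.

```python
# Brute-force nearest-neighbor search in NumPy. This scans every vector,
# which is O(N) per query: fine for thousands of items, far too slow for
# billions, hence the need for approximate nearest-neighbor indexes.
import numpy as np

rng = np.random.default_rng(0)
database = rng.random((100_000, 128)).astype(np.float32)  # toy embedding matrix
query = rng.random(128).astype(np.float32)

# Cosine similarity against every row, then take the top 5 matches.
norms = np.linalg.norm(database, axis=1) * np.linalg.norm(query)
scores = database @ query / norms
top5 = np.argsort(scores)[::-1][:5]
print("closest rows:", top5, "scores:", scores[top5])
```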
Introduction: In the rapidly evolving landscape of Artificial Intelligence (AI), Retrieval-Augmented Generation (RAG) has emerged as a transformative approach that enhances the capabilities of language models. Creating a Vector Database: Once the data is vectorized, the next step is to store these vectors in a vector database.
Summary: This guide explores Artificial Intelligence Using Python, from essential libraries like NumPy and Pandas to advanced techniques in machine learning and deep learning. It equips you to build and deploy intelligent systems confidently and efficiently.
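As a flavor of the kind of workflow such a guide covers, here is a small, self-contained NumPy/Pandas example: build a toy table, impute missing values, and derive a feature. The column names and values are made up for illustration.

```python
# Basic data-prep steps with NumPy and Pandas: handle missing values and
# derive a simple feature, the groundwork for any ML step that follows.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, 41, np.nan, 29],
    "income": [52_000, 64_000, 58_000, np.nan],
})

df["age"] = df["age"].fillna(df["age"].median())          # impute missing ages
df["income"] = df["income"].fillna(df["income"].mean())   # impute missing incomes
df["income_per_year_of_age"] = df["income"] / df["age"]   # derived feature

print(df.describe())
```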
With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data. You only need to provide a relevant data file as input and choose your FM to get started.
Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. Defining OLAP today: OLAP database systems have significantly evolved since their inception in the early 1990s.
Robotic process automation vs machine learning is a common debate in the world of automation and artificial intelligence. RPA tools can be programmed to interact with various systems, such as web applications, databases, and desktop applications. It works on structured data and follows a predefined set of rules to perform tasks.
The sample dataset: Upload the dataset to Amazon S3 and crawl the data to create an AWS Glue database and tables. For instructions to catalog the data, refer to Populating the AWS Glue Data Catalog. Choose Data Wrangler in the navigation pane. On the Import and prepare dropdown menu, choose Tabular.
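The excerpt describes the console flow; as a rough programmatic alternative, the same cataloging step can be sketched with boto3. The bucket path, crawler name, IAM role, and database name below are placeholders rather than values from the original post.

```python
# Hedged sketch: create and run an AWS Glue crawler over an S3 prefix so the
# crawled tables land in a Glue database. Requires valid AWS credentials.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="sample-dataset-crawler",                           # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",   # placeholder role ARN
    DatabaseName="sample_dataset_db",                        # Glue database to populate
    Targets={"S3Targets": [{"Path": "s3://my-bucket/sample-dataset/"}]},
)
glue.start_crawler(Name="sample-dataset-crawler")
```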
Generative artificial intelligence (gen AI) is transforming the business world by creating new opportunities for innovation, productivity and efficiency. Start by identifying all potential data sources across your organization, including structured databases.
Multimodal Retrieval Augmented Generation (MM-RAG) is emerging as a powerful evolution of traditional RAG systems, addressing limitations and expanding capabilities across diverse data types. Traditionally, RAG systems were text-centric, retrieving information from large text databases to provide relevant context for language models.
Imagine a future where artificial intelligence (AI) seamlessly collaborates with existing supply chain solutions, redefining how organizations manage their assets. If you’re currently using traditional AI, advanced analytics, and intelligent automation, aren’t you already getting deep insights into asset performance?
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Access to Amazon OpenSearch as a vector database.
More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. The fundamental objective is to build a manufacturer-agnostic database, leveraging generative AI’s ability to standardize sensor outputs, synchronize data, and facilitate precise corrections.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. Your job is to answer questions about a database.
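For a sense of what a single API looks like in practice, here is a hedged boto3 sketch using the Bedrock runtime's Converse operation. The model ID, region, and user question are examples only; the system text borrows the instruction quoted in the excerpt.

```python
# Hedged sketch: call a foundation model through Amazon Bedrock's Converse
# operation with boto3. Requires AWS credentials and model access enabled.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    system=[{"text": "Your job is to answer questions about a database."}],
    messages=[{"role": "user", "content": [{"text": "Which tables store orders?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```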
Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena , Amazon Redshift , Amazon EMR , and Snowflake.
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes, with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
Solution overview: With SageMaker Studio JupyterLab notebook’s SQL integration, you can now connect to popular data sources like Snowflake, Athena, Amazon Redshift, and Amazon DataZone. For example, you can visually explore data sources like databases, tables, and schemas directly from your JupyterLab ecosystem.
However, the majority of enterprise data remains unleveraged from an analytics and machine learning perspective, and much of the most valuable information remains in relational database schemas such as OLAP. Data preparation happens at the entity level first so errors and anomalies don’t make their way into the aggregated dataset.
RAG provides additional knowledge to the LLM through its input prompt space, and its architecture typically consists of the following components: Indexing: Prepare a corpus of unstructured text, parse and chunk it, then embed each chunk and store it in a vector database.
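Here is a compact sketch of that indexing component: chunk the text, embed each chunk, and store the vectors. The embed() function is a placeholder for a real embedding model, and a plain Python list stands in for the vector database so the example stays self-contained.

```python
# Indexing sketch: parse/chunk a corpus, embed each chunk, and store the
# vectors. The "vector database" here is just an in-memory list.
import numpy as np

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk_text: str) -> np.ndarray:
    # Placeholder embedding; swap in a real model in practice.
    rng = np.random.default_rng(abs(hash(chunk_text)) % (2**32))
    return rng.random(16)

corpus = ["A long unstructured document about product warranties and returns. " * 20]

index = []  # stand-in for a vector database collection
for doc_id, doc in enumerate(corpus):
    for piece in chunk(doc):
        index.append({"doc_id": doc_id, "text": piece, "vector": embed(piece)})

print(f"indexed {len(index)} chunks")
```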
One is a scripting language such as Python, and the other is a query language like SQL (Structured Query Language) for SQL databases. Python is a high-level, procedural, and object-oriented language; it is also a vast language in itself, and trying to cover the whole of Python is one of the worst mistakes we can make in the data science journey.
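The two languages complement each other, as the short example below shows: Python drives a SQL query against an embedded SQLite database (no server required) and then works with the result.

```python
# Python + SQL side by side: SQL does the aggregation in SQLite,
# Python handles whatever comes next.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 340.5), ("north", 80.25)],
)

rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()

totals = {region: total for region, total in rows}
print(totals)  # {'north': 200.25, 'south': 340.5}
```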
This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared. Data preparation: Before creating a knowledge base using Knowledge Bases for Amazon Bedrock, it’s essential to prepare the data to augment the FM in a RAG implementation.
They all agree that a Datamart is a subject-oriented subset of a data warehouse focusing on a particular business unit, department, subject area, or business functionality. The Datamart’s data is usually stored in databases containing a moving window of the data required for analysis, not the full history of the data.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.
JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases. Furthermore, the notebooks can be integrated into the corporate Git repositories to collaborate using version control.
GenASL is a generative artificial intelligence (AI)-powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken and written language and sign language. If the gloss is not available in the GenASL database, the logic falls back to fingerspelling each letter.
The following screenshot shows the Data Catalog schema. Access permissions to the AWS Glue databases and tables are managed by AWS Lake Formation. You can find the AWS Glue database name on the Outputs tab of the CloudFormation stack. We have completed the data preparation step. Choose Create data source.
Common Pitfalls in LLM Development. Neglecting Data Preparation: Poorly prepared data leads to subpar evaluation and iterations, reducing generalizability and stakeholder confidence. Real-world applications often expose gaps that proper data preparation could have preempted. Evaluation: Tools like Notion.
Involving only the necessary people in case validation or augmentation tasks reduces the risk of document mishandling and human error when dealing with sensitive data. Sensitive data in these data stores needs to be secured. You can either secure the output PII in your data store or redact the PII in your IDP output.
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.
Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Here, we predict whether an order is a high_value_order or a low_value_order based on the orderpriority as given from the TPC-H data.
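To illustrate the labeling idea (not Twilio's actual pipeline), here is a hedged pandas/scikit-learn sketch that tags toy orders as high_value_order or low_value_order using a made-up price threshold, then fits a simple classifier on the orderpriority column. Column names, values, and the threshold are all assumptions for the example.

```python
# Hedged sketch: label toy TPC-H-style orders and fit a simple classifier
# that predicts the label from the order priority. Illustrative values only.
import pandas as pd
from sklearn.linear_model import LogisticRegression

orders = pd.DataFrame({
    "orderpriority": ["1-URGENT", "3-MEDIUM", "5-LOW", "2-HIGH", "4-NOT SPECIFIED"],
    "totalprice": [210_000.0, 95_000.0, 40_000.0, 180_000.0, 60_000.0],
})

# Assumed threshold for the demo: orders above 100k are "high value".
orders["label"] = (orders["totalprice"] > 100_000).map(
    {True: "high_value_order", False: "low_value_order"}
)

# Encode the priority string's leading digit as an ordinal feature.
X = orders["orderpriority"].str[0].astype(int).to_frame()
model = LogisticRegression().fit(X, orders["label"])
print(model.predict(X))
```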
Amazon SageMaker Canvas is a low-code/no-code ML service that enables business analysts to perform data preparation and transformation, build ML models, and deploy these models into a governed workflow. He has good experience in databases, AI/ML, data analytics, compute, and storage. The walkthrough imports a .csv dataset into SageMaker Canvas.
The final retrieval augmentation workflow covers the following high-level steps: the user query is passed to a retriever component, which performs a vector search to retrieve the most relevant context from our database. A vector database provides efficient vector similarity search through specialized indexes such as k-NN indexes.
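For instance, with OpenSearch as the vector database, a k-NN retrieval query might look roughly like the sketch below using opensearch-py. The host, index name, field name, and embedding are placeholders, and authentication is omitted; the query vector would come from the same model used at indexing time.

```python
# Hedged sketch of a k-NN vector search against an OpenSearch index
# with a knn_vector field named "embedding" (hypothetical).
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query_vector = [0.12, -0.03, 0.88, 0.41]  # toy 4-dimensional embedding

response = client.search(
    index="documents",  # hypothetical k-NN-enabled index
    body={
        "size": 3,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": 3}}},
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("text"))
```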
Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. From deriving insights to powering generative artificial intelligence (AI)-driven applications, the ability to efficiently process and analyze large datasets is a vital capability.
File-Based Management: HNSW allows vector indexes to be managed as files, providing ease of use and portability, whether stored as a blob or in a database. This is particularly useful for applications that require dynamic content generation based on current data, such as chatbots and recommendation systems.
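A small sketch of that file-based workflow using the hnswlib package: build an HNSW index over toy vectors, save it to a single file, then reload and query it. Dimensions, sizes, and parameters are arbitrary demo values.

```python
# HNSW index managed as a file with hnswlib: build, save, reload, query.
import hnswlib
import numpy as np

dim, num_items = 64, 10_000
data = np.random.rand(num_items, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))

index.save_index("vectors.hnsw")  # the whole index is just a file

# Later (or elsewhere): load the file and query it.
restored = hnswlib.Index(space="cosine", dim=dim)
restored.load_index("vectors.hnsw", max_elements=num_items)
labels, distances = restored.knn_query(data[0], k=5)
print(labels, distances)
```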
Fine tuning: Now that your SageMaker HyperPod cluster is deployed, you can start preparing to execute your fine-tuning job. Data preparation: The foundation of successful language model fine tuning lies in properly structured and prepared training data.
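One common (though not universal) way to structure that training data is prompt/completion pairs in JSON Lines; the exact schema depends on the recipe or framework you run on HyperPod. A minimal sketch:

```python
# Write fine-tuning examples as JSON Lines (one JSON object per line).
# The prompt/completion schema here is an assumption; adjust to your recipe.
import json

examples = [
    {"prompt": "Summarize: The quarterly report shows revenue grew 12%...",
     "completion": "Revenue grew 12% quarter over quarter."},
    {"prompt": "Translate to French: Good morning",
     "completion": "Bonjour"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

print("wrote", len(examples), "training records to train.jsonl")
```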
A recent PwC CEO survey unveiled that 84% of Canadian CEOs agree that artificial intelligence (AI) will significantly change their business within the next 5 years, making this technology more critical than ever. As such, an ML model is the product of an MLOps pipeline, and a pipeline is a workflow for creating one or more ML models.
He highlights innovations in data, infrastructure, and artificial intelligence and machine learning that are helping AWS customers achieve their goals faster, mine untapped potential, and create a better future. Embeddings can be stored in a database and are used to enable streamlined and more accurate searches.
A Gentle Introduction to Vector Databases and Their Implementation with Balaji Dhamodharan Slides Balaji Dhamodharan’s AI slides offered a deep dive into vector databases, a foundational technology for LLMs. Steven Pousty showcased how to transform unstructured data into a vector-based query system.
Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations and a variety of modeling methods. Foundation models help users discover, augment, and enrich data with natural language.