The fields of Data Science, Artificial Intelligence (AI), and Large Language Models (LLMs) continue to evolve at an unprecedented pace. In this blog, we will explore the top 7 LLM, data science, and AI blogs of 2024 that have been instrumental in disseminating detailed and updated information in these dynamic fields.
RAG helps models access a specific library or database, making it suitable for tasks that require factual accuracy. Granite 3.0: IBM launched open-source LLMs for enterprise AI. Fine-tuning large language models allows businesses to adapt AI to industry-specific needs.
Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. Data silos, duplication, and concerns about data quality create a complex environment for organizations to manage.
Step 3: Storage in a vector database. After extracting text chunks, we store and index them for future searches by the RAG application. Retrievers serve as interfaces that return documents based on a query.
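As a rough illustration of the retriever interface described above (not tied to any particular vector database or embedding model), the sketch below indexes chunks with a toy bag-of-words "embedding" and ranks them by cosine similarity. All names here are hypothetical; a production system would use learned embeddings and an approximate-nearest-neighbor index.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': a word-count dictionary."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class Retriever:
    """Minimal in-memory retriever: index chunks, return best matches for a query."""
    def __init__(self):
        self.index = []  # list of (embedding, chunk) pairs

    def add(self, chunk):
        self.index.append((embed(chunk), chunk))

    def query(self, text, k=2):
        q = embed(text)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

r = Retriever()
r.add("Paris is the capital of France")
r.add("The Amazon rainforest spans several countries")
r.add("France borders Spain and Germany")
print(r.query("capital of France", k=1))  # → ['Paris is the capital of France']
```

The point is the interface, not the scoring: `add` populates the index at ingestion time, and `query` returns documents for the generation step to use as context.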
Retrieval Augmented Generation (RAG) has become a crucial technique for improving the accuracy and relevance of AI-generated responses. Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata.
This includes sourcing, gathering, arranging, processing, and modeling data, as well as being able to analyze large volumes of structured or unstructured data. The goal of data preparation is to present data in the best form for decision-making and problem-solving.
SageMaker Unified Studio is an integrated development environment (IDE) for data, analytics, and AI. Discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, generative AI app building, and more, in a single governed environment.
Summary: Retrieval-Augmented Generation (RAG) combines information retrieval and generative models to improve AI output. Introduction In the rapidly evolving landscape of Artificial Intelligence (AI), Retrieval-Augmented Generation (RAG) has emerged as a transformative approach that enhances the capabilities of language models.
Conventional ML development cycles take weeks to months and require scarce data science expertise and ML development skills. Business analysts' ideas for using ML models often sit in prolonged backlogs because of the data engineering and data science teams' limited bandwidth and data preparation workloads.
Generative artificial intelligence (gen AI) is transforming the business world by creating new opportunities for innovation, productivity and efficiency. This guide offers a clear roadmap for businesses to begin their gen AI journey. Most teams should include at least four types of team members.
Last Updated on November 9, 2024 by Editorial Team Author(s): Houssem Ben Braiek Originally published on Towards AI. Data preparation isn't just a part of the ML engineering process; it's the heart of it. Published via Towards AI
While AI solutions do present potential benefits such as increased efficiency and cost reduction, it is crucial for businesses and society to thoroughly consider the ethical and social implications before widespread adoption. The resulting vector representations can then be stored in a vector database.
Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. As AI techniques continue to evolve, innovative applications in the OLAP domain are anticipated.
Last Updated on December 20, 2024 by Editorial Team Author(s): Towards AI Editorial Team Originally published on Towards AI. You might also enjoy the practical tutorials on building an AI research agent using Pydantic AI and the step-by-step guide on fine-tuning the PaliGemma2 model for object detection. Enjoy the read!
Data scientists and ML engineers require capable tooling and sufficient compute for their work. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.
This trend toward multimodality enhances the capabilities of AI systems in tasks like cross-modal retrieval, where a query in one modality (such as text) retrieves data in another modality (such as images or design files). All businesses, across industry and size, can benefit from multimodal AI search.
Imagine a future where artificial intelligence (AI) seamlessly collaborates with existing supply chain solutions, redefining how organizations manage their assets. If you’re currently using traditional AI, advanced analytics, and intelligent automation, aren’t you already getting deep insights into asset performance?
In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) using Amazon Bedrock to generate impressions from the findings section in radiology reports using generative AI. Part 1 focused on model fine-tuning.
This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the data integration problem these sensors pose. This allows data to be aggregated for manufacturer-agnostic analysis.
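Manufacturer-agnostic aggregation ultimately means mapping each vendor's field names onto one common schema. The sketch below shows that idea in miniature; the vendor names and field mappings are invented for illustration, not taken from the actual solution.

```python
# Hypothetical field mappings for two sensor vendors; real mappings would
# be derived from each manufacturer's documented data format.
VENDOR_SCHEMAS = {
    "vendor_a": {"pm25": "pm2_5", "ts": "timestamp"},
    "vendor_b": {"PM2.5_ugm3": "pm2_5", "recorded_at": "timestamp"},
}

def standardize(record, vendor):
    """Map a vendor-specific sensor record onto the common schema."""
    mapping = VENDOR_SCHEMAS[vendor]
    return {common: record[raw] for raw, common in mapping.items()}

a = standardize({"pm25": 12.4, "ts": "2024-01-01T00:00Z"}, "vendor_a")
b = standardize({"PM2.5_ugm3": 9.1, "recorded_at": "2024-01-01T00:00Z"}, "vendor_b")
print(a["pm2_5"], b["pm2_5"])  # both records now share the pm2_5 field
```

Once every record lands in the same shape, downstream analysis no longer needs to know which manufacturer produced it.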
With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data. You only need to provide a relevant data file as input and choose your FM to get started.
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential.
GenASL is a generative artificial intelligence (AI)-powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken or written language and sign language. Users can input audio, video, or text into GenASL, which generates an ASL avatar video that interprets the provided data.
Ensuring high-quality data: a crucial aspect of downstream consumption is data quality. Studies have shown that 80% of time is spent on data preparation and cleansing, leaving only 20% for data analytics. Let's use address data as an example.
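Continuing the address-data example, a cleansing pass typically lowercases, strips punctuation, and expands common abbreviations so that duplicates can be matched. The rules below are a minimal illustrative sketch, not a complete address-standardization scheme.

```python
import re

# Illustrative abbreviation table; real pipelines use far larger dictionaries.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}

def normalize_address(raw):
    """Lowercase, strip punctuation, collapse whitespace, expand abbreviations."""
    cleaned = re.sub(r"[.,]", " ", raw.lower())
    tokens = cleaned.split()
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(expanded)

print(normalize_address("123  Main St., Springfield"))
# → '123 main street springfield'
```

With a canonical form like this, "123 Main St." and "123 main street" hash to the same key, which is what makes deduplication tractable.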
Last Updated on January 29, 2024 by Editorial Team Author(s): Cassidy Hilton Originally published on Towards AI. Recapping the Cloud Amplifier and Snowflake Demo The combined power of Snowflake and Domo’s Cloud Amplifier is the best-kept secret in data management right now — and we’re reaching new heights every day.
Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. And although generative AI has appeared in previous events, this year we’re taking it to the next level. And although our track focuses on generative AI, many other tracks have related sessions.
Solution overview With SageMaker Studio JupyterLab notebook’s SQL integration, you can now connect to popular data sources like Snowflake, Athena, Amazon Redshift, and Amazon DataZone. For example, you can visually explore data sources like databases, tables, and schemas directly from your JupyterLab ecosystem.
The sample dataset: Upload the dataset to Amazon S3 and crawl the data to create an AWS Glue database and tables. For instructions to catalog the data, refer to Populating the AWS Glue Data Catalog. Choose Data Wrangler in the navigation pane. On the Import and prepare dropdown menu, choose Tabular.
For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information. However, the majority of enterprise data remains unleveraged from an analytics and machine learning perspective, and much of the most valuable information remains in relational database schemas such as OLAP.
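Turning the relational data described above into the numerical feature vectors models prefer usually starts with encoding categorical columns. The one-hot encoder below is a minimal stdlib sketch of that step (libraries like scikit-learn or pandas provide production versions).

```python
def one_hot_encode(rows, column):
    """Expand one categorical column into binary indicator features."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        features = dict(row)        # copy so the input rows stay untouched
        value = features.pop(column)
        for cat in categories:
            features[f"{column}_{cat}"] = 1 if value == cat else 0
        encoded.append(features)
    return encoded

rows = [{"region": "EU", "sales": 10}, {"region": "US", "sales": 7}]
print(one_hot_encode(rows, "region"))
# → [{'sales': 10, 'region_EU': 1, 'region_US': 0},
#    {'sales': 7, 'region_EU': 0, 'region_US': 1}]
```

Each category becomes its own 0/1 feature, so the resulting rows are purely numeric and ready for a model.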
Nowadays, the majority of our customers are excited about large language models (LLMs) and thinking about how generative AI could transform their business. In this post, we discuss how to operationalize generative AI applications using MLOps principles, leading to foundation model operations (FMOps).
This problem often stems from inadequate user value, underwhelming performance, and an absence of robust best practices for building and deploying LLM tools as part of the AI development lifecycle. Real-world applications often expose gaps that proper data preparation could have preempted. Evaluation: Tools like Notion.
Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena , Amazon Redshift , Amazon EMR , and Snowflake.
Author(s): Surya Prakash Sahu Originally published on Towards AI. A few static analysis tools store a relational representation of the code base and evaluate a query (written in a specific query language) against it, similar to how a database query is evaluated by a database engine. So, what did we find out?!
AI platform tools enable knowledge workers to analyze data, formulate predictions and execute tasks with greater speed and precision than they can manually. AI plays a pivotal role as a catalyst in the new era of technological advancement. PwC calculates that “AI could contribute up to USD 15.7 trillion in value.”
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes, with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
RAG provides additional knowledge to the LLM through its input prompt space, and its architecture typically consists of the following components: Indexing: Prepare a corpus of unstructured text, parse and chunk it, then embed each chunk and store it in a vector database.
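The "parse and chunk" part of the indexing component can be sketched as below. This uses fixed-size, overlapping word windows, which is a common but deliberately simplified strategy; real pipelines usually chunk by token count and respect sentence boundaries.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word-window chunks for indexing.

    chunk_size and overlap are measured in words here for simplicity.
    Overlap keeps context that straddles a chunk boundary retrievable.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
print(len(chunks))  # → 3  (words 0–49, 40–89, 80–119)
```

Each chunk would then be embedded and written to the vector store, as the indexing description above says.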
However, companies are discovering that performing full fine-tuning for these models with their data isn't cost-effective. To reduce costs while continuing to use the power of AI, many companies have shifted to fine-tuning LLMs on their domain-specific data using Parameter-Efficient Fine-Tuning (PEFT).
With data visualization capabilities, advanced statistical analysis methods and modeling techniques, IBM SPSS Statistics enables users to pursue a comprehensive analytical journey from data preparation and management to analysis and reporting.
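The cost saving in PEFT methods such as LoRA comes from freezing the pretrained weight matrix W and training only a small low-rank update B·A. The toy numbers below are purely illustrative, using plain Python lists rather than any real fine-tuning library, but the arithmetic is the core idea: the effective weight is W + B·A, and only B and A are trained.

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Frozen pretrained weight W (4x4) plus a rank-1 adapter B (4x1) @ A (1x4).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen
B = [[0.1], [0.2], [0.0], [0.0]]   # trainable
A = [[1.0, 0.0, 0.0, 1.0]]         # trainable

# Only B and A (8 numbers) would be updated during fine-tuning,
# instead of all 16 entries of W; the ratio improves dramatically
# at real model sizes (e.g. 4096x4096 layers with rank 8).
W_eff = add(W, matmul(B, A))
print(W_eff[0])  # first row of the adapted weight
```

At inference time the adapter can either be kept separate or merged into W, which is why LoRA-style methods add no serving latency once merged.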
watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise.
In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service. It also hosts foundation models solely developed by Amazon, such as AlexaTM.
RPA tools can be programmed to interact with various systems, such as web applications, databases, and desktop applications. RPA is often considered a form of artificial intelligence, but it is not a complete AI solution. AI, on the other hand, can learn from data and adapt to new situations without human intervention.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. Your job is to answer questions about a database.
Instant AI Model Tuning: Leveraging HNSW Vectors with Firebase Genkit for Retrieval-Augmented Generation. The rapid advancements in generative AI have transformed how we interact with technology, enabling more intelligent and context-aware systems. With fine-tuning, you can add more knowledge and context for the AI model.