AI, Data Preparation and Database - Data Science Current

Top 7 Data Science, Large Language Model, and AI Blogs of 2024

Data Science Dojo

NOVEMBER 27, 2024

The fields of Data Science, Artificial Intelligence (AI), and Large Language Models (LLMs) continue to evolve at an unprecedented pace. In this blog, we will explore the top 7 LLM, data science, and AI blogs of 2024 that have been instrumental in disseminating detailed and updated information in these dynamic fields.

Data Science

Data Science Natural Language Processing AI AI

Fine-tuning large language models (LLMs) for 2025

Dataconomy

NOVEMBER 11, 2024

RAG helps models access a specific library or database, making it suitable for tasks that require factual accuracy. Granite 3.0 : IBM launched open-source LLMs for enterprise AI 1. Fine-tuning large language models allows businesses to adapt AI to industry-specific needs 2.

Data Preparation

Data Preparation Database Data Quality Machine Learning

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.

Data Preparation

Data Preparation ML ML Data Quality

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.

AWS

AWS Database ETL AI

New Study: 2018 State of Embedded Analytics Report

Why do some embedded analytics projects succeed while others fail? We surveyed 500+ application teams embedding analytics to find out which analytics features actually move the needle. Read the 6th annual State of Embedded Analytics Report to discover new best practices. Brought to you by Logi Analytics.

Analytics

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Flipboard

NOVEMBER 20, 2024

Retrieval Augmented Generation (RAG) has become a crucial technique for improving the accuracy and relevance of AI-generated responses. Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Retrieval augmented generation (RAG) – Elevate your large language models experience

Data Science Dojo

DECEMBER 6, 2023

Read more about: AI hallucinations and risks associated with large language models AI hallucinations What is RAG? Step 3: Storage in vector database After extracting text chunks, we store and index them for future searches using the RAG application. Retrievers serve as interfaces that return documents based on a query.

Database

Database Data Preparation Algorithm AI

Empower your career – Discover the 10 essential skills to excel as a data scientist in 2023

Data Science Dojo

MARCH 7, 2023

This includes sourcing, gathering, arranging, processing, and modeling data, as well as being able to analyze large volumes of structured or unstructured data. The goal of data preparation is to present data in the best forms for decision-making and problem-solving.

Data Scientist

Data Scientist Exploratory Data Analysis Data Science Data Visualization

Responsible AI in action: How Data Reply red teaming supports generative AI safety on AWS

AWS Machine Learning Blog

APRIL 29, 2025

Generative AI is rapidly reshaping industries worldwide, empowering businesses to deliver exceptional customer experiences, streamline processes, and push innovation at an unprecedented scale. Specifically, we discuss Data Replys red teaming solution, a comprehensive blueprint to enhance AI safety and responsible AI practices.

AWS

AWS AI AI ML

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

AWS Machine Learning Blog

APRIL 30, 2025

Amazon Bedrock Model Distillation is generally available, and it addresses the fundamental challenge many organizations face when deploying generative AI : how to maintain high performance while reducing costs and latency. We show the latency and output speed comparison for different models in the following figure. Notably, the Llama 3.1

AWS

AWS AI AI Computer Science

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

It must integrate seamlessly across data technologies in the stack to execute various workflows—all while maintaining a strong focus on performance and governance. Two key technologies that have become foundational for this type of architecture are the Snowflake AI Data Cloud and Dataiku. Let’s say your company makes cars.

Machine Learning

Machine Learning Machine Learning Data Science ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

SageMaker Unied Studio is an integrated development environment (IDE) for data, analytics, and AI. Discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, generative AI app building, and more, in a single governed environment.

SQL

SQL AWS Data Lakes AI

RAG and Vectorization: A Comprehensive Overview

Pickl AI

DECEMBER 24, 2024

Summary: Retrieval-Augmented Generation (RAG) combines information retrieval and generative models to improve AI output. Introduction In the rapidly evolving landscape of Artificial Intelligence (AI), Retrieval-Augmented Generation (RAG) has emerged as a transformative approach that enhances the capabilities of language models.

Database

Database Machine Learning Machine Learning AI

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and requires sparse data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of data engineering and data science team’s bandwidth and data preparation activities.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Step-by-step guide: Generative AI for your business

IBM Journey to AI blog

JULY 30, 2024

Generative artificial intelligence (gen AI) is transforming the business world by creating new opportunities for innovation, productivity and efficiency. This guide offers a clear roadmap for businesses to begin their gen AI journey. Most teams should include at least four types of team members.

AI

AI AI Data Scientist Data Preparation

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Last Updated on November 9, 2024 by Editorial Team Author(s): Houssem Ben Braiek Originally published on Towards AI. Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. Join thousands of data leaders on the AI newsletter. Published via Towards AI

ML

ML ML Data Preparation Data Engineer

Harnessing LLM chatbots: Real-life applications, building techniques and LangChain’s Finetuning

Data Science Dojo

AUGUST 1, 2023

While AI solutions do present potential benefits such as increased efficiency and cost reduction, it is crucial for businesses and society to thoroughly consider the ethical and social implications before widespread adoption. The resulting vector representations can then be stored in a vector database.

Database

Database AI AI Natural Language Processing

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

This trend toward multimodality enhances the capabilities of AI systems in tasks like cross-modal retrieval, where a query in one modality (such as text) retrieves data in another modality (such as images or design files). All businesses, across industry and size, can benefit from multimodal AI search.

AWS

AWS Computer Science Computer Science Database

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

Generative artificial intelligence ( generative AI ) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Clean data is important for good model performance.

Data Preparation

Data Preparation AI AI Python

How OLAP and AI can enable better business

IBM Journey to AI blog

DECEMBER 7, 2023

Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. As AI techniques continue to evolve, innovative applications in the OLAP domain are anticipated.

Data Preparation

Data Preparation Database Data Analysis Data Analysis

#54 Things are never boring with RAG! Vector Store, Vector Search, Knowledge Base, and more!

Towards AI

DECEMBER 19, 2024

Last Updated on December 20, 2024 by Editorial Team Author(s): Towards AI Editorial Team Originally published on Towards AI. You might also enjoy the practical tutorials on building an AI research agent using Pydantic AI and the step-by-step guide on fine-tuning the PaliGemma2 model for object detection. Enjoy the read!

Database

Database AI AI Data Preparation

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

Data scientists and ML engineers require capable tooling and sufficient compute for their work. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.

ML

ML ML AWS AI

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

This post presents a solution that uses a generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the air quality data integration problem of low-cost sensors. This allows for data to be aggregated for further manufacturer-agnostic analysis.

AWS

AWS AI Python AI

Integrating AI into Asset Performance Management: It’s all about the data

IBM Journey to AI blog

MARCH 29, 2024

Imagine a future where artificial intelligence (AI) seamlessly collaborates with existing supply chain solutions, redefining how organizations manage their assets. If you’re currently using traditional AI, advanced analytics, and intelligent automation, aren’t you already getting deep insights into asset performance?

AI

AI AI Artificial Intelligence Artificial Intelligence

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog

FEBRUARY 27, 2025

In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) using Amazon Bedrock to generate impressions from the findings section in radiology reports using generative AI. Part 1 focused on model fine-tuning.

AWS

AWS AI AI ML

GenASL: Generative AI-powered American Sign Language avatars

AWS Machine Learning Blog

AUGUST 26, 2024

GenASL is a generative artificial intelligence (AI) -powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken and written language and sign language. Users can input audio, video, or text into GenASL, which generates an ASL avatar video that interprets the provided data.

AWS

AWS AI AI ML

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

AWS Machine Learning Blog

APRIL 26, 2024

With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data. You only need to provide a relevant data file as input and choose your FM to get started.

AWS

AWS Database Python AI

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. And although generative AI has appeared in previous events, this year we’re taking it to the next level. And although our track focuses on generative AI, many other tracks have related sessions.

AWS

AWS ML ML AI

Data Fabric and Address Verification Interface

IBM Data Science in Practice

NOVEMBER 28, 2022

Ensuring high-quality data A crucial aspect of downstream consumption is data quality. Studies have shown that 80% of time is spent on data preparation and cleansing, leaving only 20% of time for data analytics. This leaves more time for data analysis. Let’s use address data as an example.

Data Pipeline

Data Pipeline Data Quality Data Preparation Data Governance

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

The sample dataset Upload the dataset to Amazon S3 and crawl the data to create an AWS Glue database and tables. For instructions to catalog the data, refer to Populating the AWS Glue Data Catalog. Choose Data Wrangler in the navigation pane. On the Import and prepare dropdown menu, choose Tabular.

Machine Learning

Machine Learning Machine Learning Data Governance ML

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Solution overview With SageMaker Studio JupyterLab notebook’s SQL integration, you can now connect to popular data sources like Snowflake, Athena, Amazon Redshift, and Amazon DataZone. For example, you can visually explore data sources like databases, tables, and schemas directly from your JupyterLab ecosystem.

SQL

SQL AWS Database Data Scientist

Recapping the Cloud Amplifier and Snowflake Demo

Towards AI

JANUARY 28, 2024

Last Updated on January 29, 2024 by Editorial Team Author(s): Cassidy Hilton Originally published on Towards AI. Recapping the Cloud Amplifier and Snowflake Demo The combined power of Snowflake and Domo’s Cloud Amplifier is the best-kept secret in data management right now — and we’re reaching new heights every day.

ETL

ETL Python Database Data Preparation

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

Each component in this ecosystem is very important in the data-driven decision-making process for an organization. Data Sources and Collection Everything in data science begins with data. Data can be generated from databases, sensors, social media platforms, APIs, logs, and web scraping.

Data Science

Data Science Data Analyst Data Scientist Machine Learning

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

AWS Machine Learning Blog

NOVEMBER 30, 2023

We believe generative AI has the potential over time to transform virtually every customer experience we know. Innovative startups like Perplexity AI are going all in on AWS for generative AI. And at the top layer, we’ve been investing in game-changing applications in key areas like generative AI-based coding.

AWS

AWS AI AI ML

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

Nowadays, the majority of our customers is excited about large language models (LLMs) and thinking how generative AI could transform their business. In this post, we discuss how to operationalize generative AI applications using MLOps principles leading to foundation model operations (FMOps).

AI

AI AI ML ML

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

AWS Machine Learning Blog

AUGUST 14, 2023

In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service. It also hosts foundation models solely developed by Amazon, such as AlexaTM.

AWS

AWS Database AI AI

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

ML

ML ML Database AWS

GraphReduce: Using Graphs for Feature Engineering Abstractions

ODSC - Open Data Science

SEPTEMBER 25, 2023

For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information. However, the majority of enterprise data remains unleveraged from an analytics and machine learning perspective, and much of the most valuable information remains in relational database schemas such as OLAP.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

Ace Your Interview: Top 10 Data Visualization Questions and Answers (Beginner & Advanced)

Pickl AI

APRIL 21, 2025

What they’re testing: Basic data preparation awareness as it relates to visualization. Sample Answer: “First, I’d try to understand why the data is missing is it random, or is there a pattern? The approach depends on the context and the amount of missing data. How would you approach this?

Data Visualization

Data Visualization Power BI Data Analysis Data Analysis

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

However, companies are discovering that performing full fine tuning for these models with their data isnt cost effective. To reduce costs while continuing to use the power of AI , many companies have shifted to fine tuning LLMs on their domain-specific data using Parameter-Efficient Fine Tuning (PEFT).

AWS

AWS Clustering Deep Learning Deep Learning

AI Development Lifecycle Learnings of What Changed with LLMs

ODSC - Open Data Science

FEBRUARY 5, 2025

This problem often stems from inadequate user value, underwhelming performance, and an absence of robust best practices for building and deploying LLM tools as part of the AI development lifecycle. Real-world applications often expose gaps that proper data preparation could have preempted. Evaluation: Tools likeNotion.

Data Preparation

Data Preparation AI AI Data Scientist

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

AWS Machine Learning Blog

JULY 11, 2024

RAG provides additional knowledge to the LLM through its input prompt space and its architecture typically consists of the following components: Indexing : Prepare a corpus of unstructured text, parse and chunk it, and then, embed each chunk and store it in a vector database.

AWS

AWS ML ML Machine Learning

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena , Amazon Redshift , Amazon EMR , and Snowflake.

AWS

AWS Data Preparation Azure Data Scientist

How to choose the best AI platform

IBM Journey to AI blog

OCTOBER 20, 2023

AI platform tools enable knowledge workers to analyze data, formulate predictions and execute tasks with greater speed and precision than they can manually. AI plays a pivotal role as a catalyst in the new era of technological advancement. PwC calculates that “AI could contribute up to USD 15.7 trillion in value.

AI

AI AI Machine Learning Machine Learning

Top 7 Data Science, Large Language Model, and AI Blogs of 2024

Fine-tuning large language models (LLMs) for 2025

Webinars

Trending Sources

Accelerate data preparation for ML in Amazon SageMaker Canvas

Webinars

Tackling AI’s data challenges with IBM databases on AWS

New Study: 2018 State of Embedded Analytics Report

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Retrieval augmented generation (RAG) – Elevate your large language models experience

Empower your career – Discover the 10 essential skills to excel as a data scientist in 2023

Responsible AI in action: How Data Reply red teaming supports generative AI safety on AWS

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

How Dataiku and Snowflake Strengthen the Modern Data Stack

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

RAG and Vectorization: A Comprehensive Overview

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Step-by-step guide: Generative AI for your business

Data4ML Preparation Guidelines (Beyond The Basics)

Harnessing LLM chatbots: Real-life applications, building techniques and LangChain’s Finetuning

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

How OLAP and AI can enable better business

#54 Things are never boring with RAG! Vector Store, Vector Search, Knowledge Base, and more!

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Improving air quality with generative AI

Integrating AI into Asset Performance Management: It’s all about the data

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

GenASL: Generative AI-powered American Sign Language avatars

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

Your guide to generative AI and ML at AWS re:Invent 2023

Data Fabric and Address Verification Interface

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Recapping the Cloud Amplifier and Snowflake Demo

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

GraphReduce: Using Graphs for Feature Engineering Abstractions

Ace Your Interview: Top 10 Data Visualization Questions and Answers (Beginner & Advanced)

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AI Development Lifecycle Learnings of What Changed with LLMs

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

How to choose the best AI platform

Stay Connected