The fields of Data Science, Artificial Intelligence (AI), and Large Language Models (LLMs) continue to evolve at an unprecedented pace. In this blog, we will explore the top 7 LLM, data science, and AI blogs of 2024 that have been instrumental in disseminating detailed and updated information in these dynamic fields.
Author(s): Youssef Hosni Originally published on Towards AI. Master LLMs & Generative AI Through These Five Books This article reviews five key books that explore the rapidly evolving fields of large language models (LLMs) and generative AI, providing essential insights into these transformative technologies.
It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. The report introduces a structured seven-stage pipeline for fine-tuning LLMs, spanning data preparation, model initialization, hyperparameter tuning, and model deployment.
Large Language Model Ops, also known as LLMOps, isn’t just a buzzword; it’s the cornerstone of unleashing LLM potential. From data management to model fine-tuning, LLMOps ensures efficiency, scalability, and risk mitigation. As LLMs redefine AI capabilities, mastering LLMOps becomes your compass in this dynamic landscape.
In recent years, there has been a growing interest in the use of artificial intelligence (AI) for data analysis. AI tools can automate many of the tasks involved in data analysis, and they can also help businesses to discover new insights from their data.
Data is therefore essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that the models generate reliable and accurate predictions and drive business value for the organization. Why do you need Data Preparation for Machine Learning?
Retrieval Augmented Generation (RAG) has become a crucial technique for improving the accuracy and relevance of AI-generated responses. The effectiveness of RAG heavily depends on the quality of context provided to the large language model (LLM), which is typically retrieved from vector stores based on user queries.
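The retrieval step described above can be sketched in a few lines. This is a toy illustration, not a production vector store: the bag-of-words "embedding" and cosine ranking below stand in for a real embedding model and index, and all names are ours.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=1):
    # Rank stored documents by similarity to the query, return top-k as context.
    q = embed(query)
    return sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

docs = [
    "RAG augments an LLM prompt with retrieved context.",
    "Cookies are small files stored by your browser.",
]
context = retrieve("How does RAG provide context to an LLM?", docs)
print(context[0])
```

In a real RAG pipeline, the retrieved passages would then be prepended to the user's question in the prompt sent to the LLM.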
Last Updated on August 17, 2023 by Editorial Team Author(s): Jeff Holmes MS MSCS Originally published on Towards AI. AI is still considered a relatively new field, so there are really no guides or standards such as SWEBOK. 85% or more of AI projects fail [1][2].
Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption across enterprise and government organizations. Processing unstructured data has become easier with the advancements in natural language processing (NLP) and user-friendly AI/ML services like Amazon Textract, Amazon Transcribe, and Amazon Comprehend.
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Clean data is important for good model performance.
This problem often stems from inadequate user value, underwhelming performance, and an absence of robust best practices for building and deploying LLM tools as part of the AI development lifecycle. LLMs, while accelerating some processes, introduce complexities that require new tools and methodologies. Evaluation: Tools like Notion.
RPA is often considered a form of artificial intelligence, but it is not a complete AI solution. AI, on the other hand, can learn from data and adapt to new situations without human intervention. RPA can be easily integrated with legacy systems, and the implementation process is relatively straightforward.
Last Updated on January 15, 2025 by Editorial Team Author(s): Yash Thube Originally published on Towards AI. Transformers have revolutionized natural language processing (NLP), powering models like GPT and BERT. Understanding Vision Transformers (ViTs) And what I learned while implementing them!
In the rapidly expanding field of artificial intelligence (AI), machine learning tools play an instrumental role. Already a multi-billion-dollar industry, AI is having a profound impact on every aspect of life, business, and society. These tools are becoming increasingly sophisticated, enabling the development of advanced applications.
We address this skew with generative AI models (Falcon-7B and Falcon-40B), which were prompted to generate event samples based on five examples from the training set to increase the semantic diversity and increase the sample size of labeled adverse events.
The next generation of large language models (LLMs) and LLM chatbots are expected to offer improved accuracy, expanded language support, enhanced computational efficiency, and seamless integration with emerging technologies. The answer will be in natural language and will be relevant to the question.
AI platform tools enable knowledge workers to analyze data, formulate predictions and execute tasks with greater speed and precision than they can manually. AI plays a pivotal role as a catalyst in the new era of technological advancement. PwC calculates that “AI could contribute up to USD 15.7 trillion in value.”
Watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise.
This article explores an innovative way to streamline the estimation of Scope 3 GHG emissions leveraging AI and Large Language Models (LLMs) to help categorize financial transaction data to align with spend-based emissions factors. Why are Scope 3 emissions difficult to calculate? This is where LLMs come into play.
“Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic.” This can be a tedious task involving data collection, discovery, profiling, cleansing, structuring, transforming, enriching, validating, and securely storing the data.
AIOps refers to the application of artificial intelligence (AI) and machine learning (ML) techniques to enhance and automate various aspects of IT operations (ITOps). However, they differ fundamentally in their purpose and level of specialization in AI and ML environments.
Summary: Artificial Intelligence Models as a Service (AIMaaS) provides cloud-based access to scalable, customizable AI models. AIMaaS democratises AI, making advanced technologies accessible to organisations of all sizes across various industries. Predictive Analytics : Models that forecast future events based on historical data.
Introduction Artificial Intelligence (AI) transforms industries by enabling machines to mimic human intelligence. Python’s simplicity, versatility, and extensive library support make it the go-to language for AI development. It includes Python and a vast collection of pre-installed libraries and tools for AI development.
Data preprocessing is a fundamental and essential step in the field of sentiment analysis, a prominent branch of natural language processing (NLP). These tools offer a wide range of functionalities to handle complex data preparation tasks efficiently.
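A minimal sketch of the kind of text preprocessing this refers to. The stopword list and regex tokenizer here are illustrative choices, not a standard; real pipelines typically use a library such as NLTK or spaCy.

```python
import re

# Illustrative stopword list; real pipelines use a curated one.
STOPWORDS = {"the", "a", "an", "is", "it", "this", "and", "of"}

def preprocess(text):
    # Typical sentiment-analysis preparation: normalize case,
    # strip punctuation, tokenize, and drop stopwords.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("This movie is GREAT, and the acting is superb!"))
# → ['movie', 'great', 'acting', 'superb']
```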
The use of Artificial Intelligence (AI) has become increasingly prevalent in the modern world, seeing its potential to drastically improve human life in every way possible. AI technology is constantly evolving, allowing machines to become increasingly advanced and capable of carrying out more intricate functions.
Fine-tuning embedding models using SageMaker: SageMaker is a fully managed machine learning service that simplifies the entire machine learning workflow, from data preparation and model training to deployment and monitoring. For more information about fine-tuning Sentence Transformers, see the Sentence Transformer training overview.
Given this mission, Talent.com and AWS joined forces to create a job recommendation engine using state-of-the-art natural language processing (NLP) and deep learning model training techniques with Amazon SageMaker to provide an unrivaled experience for job seekers.
In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service. It also hosts foundation models solely developed by Amazon, such as AlexaTM.
Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. You now run the data preparation step in the notebook.
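As a rough illustration of how Word2vec learns, its skip-gram variant slides a window over the text and emits (center, context) word pairs; a library such as gensim then trains embeddings from those pairs. A sketch of the pair construction (the function name is ours):

```python
def skipgram_pairs(tokens, window=2):
    # For each position, pair the center word with every
    # neighbor inside the window on either side.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "word2vec learns dense word vectors".split()
print(skipgram_pairs(sentence, window=1))
```

The resulting word vectors place words that occur in similar contexts close together, which is what makes them useful for the downstream NLP tasks listed above.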
An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language processing (NLP) to read and understand a document and extract specific entities or phrases. Pay attention to protection of data at rest and data produced in IDP outputs.
Large language models (LLMs) have achieved remarkable success in various natural language processing (NLP) tasks, but they may not always generalize well to specific domains or tasks. This is where MLflow can help streamline the ML lifecycle, from data preparation to model deployment.
Fine-tuning is a powerful approach in natural language processing (NLP) and generative AI, allowing businesses to tailor pre-trained large language models (LLMs) for specific tasks. This process involves updating the model’s weights to improve its performance on targeted applications.
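At its core, the weight update described here is gradient descent starting from pre-trained parameters rather than random ones. A deliberately tiny sketch with a one-parameter model and squared-error loss (all numbers and names are illustrative, not any particular library's API):

```python
def sgd_step(w, x, y, lr=0.1):
    # One gradient-descent step on squared error (pred - y)^2.
    pred = w * x
    grad = 2 * (pred - y) * x  # d/dw of the squared error
    return w - lr * grad

# "Fine-tuning": start from a pre-trained weight, not a random one,
# and nudge it toward a target-task example.
w_pretrained = 1.0
w = sgd_step(w_pretrained, x=2.0, y=3.0)
print(w)  # weight moves from 1.0 toward the new task's optimum
```

Real fine-tuning does the same thing across millions of weights and many batches, often updating only a small adapter subset (e.g. LoRA) to keep costs down.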
They consist of interconnected nodes that learn complex patterns in data. Different types of neural networks, such as feedforward, convolutional, and recurrent networks, are designed for specific tasks like image recognition, Natural Language Processing, and sequence modelling.
Summary: Data Science and AI are transforming the future by enabling smarter decision-making, automating processes, and uncovering valuable insights from vast datasets. The U.S. Bureau of Labor Statistics predicts that employment for Data Scientists will grow by 36% from 2021 to 2031, making it one of the fastest-growing professions.
As a result, diffusion models have become a popular tool in many fields of artificial intelligence, with numerous applications in computer vision, natural language processing, and audio synthesis.
Redaction of PII data is often a key first step to unlock the larger and richer data streams needed to use or fine-tune generative AI models, without worrying about whether their enterprise data (or that of their customers) will be compromised.
In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach to develop a new AI/ML enabled solution for their customers to tackle the “blank canvas” problem. First and foremost, Studio makes it easier to share notebook assets across a large team of data scientists like the one at Boomi.
To learn more about SageMaker Studio JupyterLab Spaces, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools. Data source access credentials – This SageMaker Studio notebook feature requires user name and password access to data sources such as Snowflake and Amazon Redshift.
When using generative AI, achieving high performance with low latency models that are cost-efficient is often a challenge, because these goals can clash with each other. With Amazon Bedrock Model Distillation, you can now customize models for your use case using synthetic data generated by highly capable models.
Genomic language models represent a new approach in the field of genomics, offering a way to understand the language of DNA. Data preparation and loading into sequence store: the initial step in our machine learning workflow focuses on preparing the data.
This trend toward multimodality enhances the capabilities of AI systems in tasks like cross-modal retrieval, where a query in one modality (such as text) retrieves data in another modality (such as images or design files). All businesses, across industry and size, can benefit from multimodal AI search.
Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. From deriving insights to powering generative artificial intelligence (AI) -driven applications, the ability to efficiently process and analyze large datasets is a vital capability.
Read on to see how Google and Snorkel AI customized PaLM 2 using domain expertise and data development to improve performance by 38 F1 points in a matter of hours. In the landscape of modern enterprise applications, large language models (LLMs) like Google Gemini and PaLM 2 stand at the forefront of transformative technologies.