Data Preparation, Natural Language Processing and Python

5 Top Large Language Models & Generative AI Books

Towards AI

AUGUST 6, 2024

NLP with Transformers introduces readers to transformer architecture for natural language processing, offering practical guidance on using Hugging Face for tasks like text classification.

Natural Language Processing, AI, AWS

Artificial Intelligence Using Python: A Comprehensive Guide

Pickl AI

JULY 12, 2024

Summary: This guide explores Artificial Intelligence Using Python, from essential libraries like NumPy and Pandas to advanced techniques in machine learning and deep learning. Python’s simplicity, versatility, and extensive library support make it the go-to language for AI development.

Artificial Intelligence, Python, Natural Language Processing

6 AI tools revolutionizing data analysis: Unleashing the best in business

Data Science Dojo

JULY 17, 2023

TensorFlow: First on the AI tool list is TensorFlow, an open-source software library for numerical computation using data flow graphs. It is used for machine learning, natural language processing, and computer vision tasks. [...] It is similar to TensorFlow, but it is designed to be more Pythonic.

Data Analysis, Tableau, Machine Learning

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

We cover two approaches: using the Amazon SageMaker Studio UI for a no-code solution, and using the SageMaker Python SDK. FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. Fine-tune using the SageMaker Python SDK You can also fine-tune Meta Llama 3.2 Vision models.

ML, Python, AWS

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

As AI adoption continues to accelerate, developing efficient mechanisms for digesting and learning from unstructured data becomes even more critical in the future. This could involve better preprocessing tools, semi-supervised learning techniques, and advances in natural language processing. And select Python (PySpark).

Data Preparation, AI, Python

Top 10 Machine Learning (ML) Tools for Developers in 2023

Towards AI

JUNE 27, 2023

For instance, today’s machine learning tools are pushing the boundaries of natural language processing, allowing AI to comprehend complex patterns and languages. PyTorch: PyTorch, a Python-based machine learning library, stands out among its peers in the machine learning tools ecosystem.

Machine Learning, ML

Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

OCTOBER 19, 2023

Solution overview: This solution uses Amazon Comprehend and SageMaker Data Wrangler to automatically redact PII data from a sample dataset. Amazon Comprehend is a natural language processing (NLP) service that uses ML to uncover insights and relationships in unstructured data, with no infrastructure to manage and no ML experience required.
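
The redaction idea can be illustrated without any AWS dependency. The sketch below is a minimal stand-in, assuming regular expressions for just two PII types (email addresses and US-style phone numbers). It does not call Amazon Comprehend or Data Wrangler, and the `PII_PATTERNS` table and `redact_pii` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical patterns for illustration only; a managed service such as
# Amazon Comprehend detects many more PII types using ML models.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a [TYPE] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
print(redact_pii(sample))
```

Pattern-based redaction like this misses names, addresses, and free-form identifiers, which is one reason an ML-based detector is typically preferred for real datasets.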

Machine Learning, ML

Authoring custom transformations in Amazon SageMaker Data Wrangler using NLTK and SciPy

AWS Machine Learning Blog

APRIL 17, 2023

In other words, companies need to move from a model-centric approach to a data-centric approach.” – Andrew Ng A data-centric AI approach involves building AI systems with quality data involving data preparation and feature engineering. Custom transforms can be written as separate steps within Data Wrangler.

AWS, ML, Python

Build an email spam detector using Amazon SageMaker

AWS Machine Learning Blog

JULY 18, 2023

Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) Import the required Python library and set the roles and the S3 buckets.

Supervised Learning, Algorithm, Natural Language Processing, AWS

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

AWS Machine Learning Blog

JULY 11, 2024

Fine tuning embedding models using SageMaker SageMaker is a fully managed machine learning service that simplifies the entire machine learning workflow, from data preparation and model training to deployment and monitoring. Python script that serves as the entry point. client('s3') # Get the region name session = boto3.Session()

AWS, ML, Machine Learning

Transition your Amazon Forecast usage to Amazon SageMaker Canvas

AWS Machine Learning Blog

JULY 29, 2024

With the addition of forecasting, you can now access end-to-end ML capabilities for a broad set of model types—including regression, multi-class classification, computer vision (CV), natural language processing (NLP), and generative artificial intelligence (AI)—within the unified user-friendly platform of SageMaker Canvas.

ML, Algorithm, AWS

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

AWS Machine Learning Blog

JULY 24, 2024

Large language models (LLMs) have achieved remarkable success in various natural language processing (NLP) tasks, but they may not always generalize well to specific domains or tasks. This is where MLflow can help streamline the ML lifecycle, from data preparation to model deployment.

ML, AWS, Machine Learning

Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines

AWS Machine Learning Blog

AUGUST 16, 2023

It simplifies the development and maintenance of ML models by providing a centralized platform to orchestrate tasks such as data preparation, model training, tuning and validation. You can run the following command from your notebook or terminal to install or upgrade the SageMaker Python SDK version to 2.162.0

ML, Data Scientist, Python

Accelerate client success management through email classification with Hugging Face on Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 12, 2023

By implementing a modern natural language processing (NLP) model, the response process has been shaped much more efficiently, and waiting time for clients has been reduced tremendously. In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail.

Data Science, Data Scientist, AWS, ML

Neural Network in Machine Learning

Pickl AI

AUGUST 14, 2024

They consist of interconnected nodes that learn complex patterns in data. Different types of neural networks, such as feedforward, convolutional, and recurrent networks, are designed for specific tasks like image recognition, Natural Language Processing, and sequence modelling.

Machine Learning, Natural Language Processing, Algorithm

From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 23, 2023

Given this mission, Talent.com and AWS joined forces to create a job recommendation engine using state-of-the-art natural language processing (NLP) and deep learning model training techniques with Amazon SageMaker to provide an unrivaled experience for job seekers. The client registers smddp as a backend for PyTorch.

AWS, Deep Learning, Machine Learning

Leveraging KNIME and Tableau: Connecting to Tableau with KNIME

phData

JUNE 26, 2023

While both these tools are powerful on their own, their combined strength offers a comprehensive solution for data analytics. In this blog post, we will show you how to leverage KNIME’s Tableau Integration Extension and discuss the benefits of using KNIME for data preparation before visualization in Tableau.

Tableau, Data Preparation, Machine Learning

Top 10 Deep Learning Platforms in 2024

DagsHub

JULY 25, 2024

A good understanding of Python and machine learning concepts is recommended to fully leverage TensorFlow's capabilities. Libraries and Extensions: Includes torchvision for image processing, torchaudio for audio processing, and torchtext for NLP. It is well-suited for both research and production environments.

Deep Learning, Machine Learning

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

We create an automated model build pipeline that includes steps for data preparation, model training, model evaluation, and registration of the trained model in the SageMaker Model Registry. Romina’s areas of interest are natural language processing, large language models, and MLOps.

AWS, ML, Data Preparation

Large Language Models: A Complete Guide

Heartbeat

MAY 29, 2023

LLMs are one of the most exciting advancements in natural language processing (NLP). We will explore how to better understand the data that these models are trained on, and how to evaluate and optimize them for real-world use. LLMs rely on vast amounts of text data to learn patterns and generate coherent text.

Machine Learning, Natural Language Processing, Data Preparation

Training large language models on Amazon SageMaker: Best practices

AWS Machine Learning Blog

MARCH 6, 2023

Data preparation: LLM developers train their models on large datasets of naturally occurring text. Popular examples of such data sources include Common Crawl and The Pile. Naturally occurring text may contain biases, inaccuracies, grammatical errors, and syntax variations.

AWS, Clustering, ML

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints, SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. In this post, we build a Docker image that includes the Python 3.11

AWS, Clustering, Big Data

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

AWS Machine Learning Blog

AUGUST 14, 2023

Haystack FileConverters and PreProcessor allow you to clean and prepare your raw files to be in a shape and format that your natural language processing (NLP) pipeline and language model of choice can deal with. An indexing pipeline may also include a step to create embeddings for your documents.

AWS, Database, AI

Collaborate Smarter, Not Harder: Comet’s Integrations for Effective ML Project Management

Heartbeat

JUNE 5, 2023

PyTorch: Built on the Torch library, PyTorch is a free and open-source machine learning framework that comes in handy for tasks like computer vision and natural language processing. Anomalib: Anomalib is a Python library that helps users to detect anomalies in time-series data.

ML, Machine Learning

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

This allows users to accomplish different Natural Language Processing (NLP) functional tasks and take advantage of IBM vetted pre-trained open-source foundation models. Encoder-decoder and decoder-only large language models are available in the Prompt Lab today. To bridge the tuning gap, watsonx.ai

AI, Machine Learning

Master the Power of Machine Learning with PyCaret: A Step-by-Step Guide

Mlearning.ai

JUNE 28, 2023

The expeditious and efficient construction, deployment, and scalability of machine learning models assume utmost importance in unearthing the untapped potential of data-driven decision-making. This extensive repertoire includes classification, regression, clustering, natural language processing, and anomaly detection.

Machine Learning, Data Preparation, Data Science

Build a classification pipeline with Amazon Comprehend custom classification (Part I)

AWS Machine Learning Blog

SEPTEMBER 14, 2023

It can be difficult to find insights from this data, particularly if efforts are needed to classify, tag, or label it. Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text. CSV) or semi-structured format (ex.

AWS, Machine Learning, Data Scientist

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Jupyter notebooks can differentiate between SQL and Python code using the %%sm_sql magic command, which must be placed at the top of any cell that contains SQL code. This command signals to JupyterLab that the following instructions are SQL commands rather than Python code.

SQL, AWS, Database, Data Scientist

Amazon Comprehend document classifier adds layout support for higher accuracy

AWS Machine Learning Blog

APRIL 19, 2023

At AWS re:Invent 2022, Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text, launched support for native document types. This new feature gave you the ability to classify documents in native formats (PDF, TIFF, JPG, PNG, DOCX) using Amazon Comprehend.

AWS, Machine Learning, ML

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

AWS Machine Learning Blog

MAY 25, 2023

Amazon Kendra is a highly accurate and intelligent search service that enables users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. For more information, refer to Granting Data Catalog permissions using the named resource method.

ML, AWS, Database

Must-Have Skills for a Machine Learning Engineer

Pickl AI

NOVEMBER 28, 2024

Key programming languages include Python and R, while mathematical concepts like linear algebra and calculus are crucial for model optimisation. Understanding Machine Learning algorithms and effective data handling are also critical for success in the field. The global Machine Learning market was valued at USD 35.80

Machine Learning, ML

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

For example, if your team is proficient in Python and R, you may want an MLOps tool that supports open data formats like Parquet, JSON, CSV, etc. For example, if your team works on recommender systems or natural language processing applications, you may want an MLOps tool that has built-in algorithms or templates for these use cases.

Machine Learning, ML

How Data Science and AI is Changing the Future

Pickl AI

NOVEMBER 5, 2024

AI encompasses various subfields, including Machine Learning (ML), Natural Language Processing (NLP), robotics, and computer vision. Together, Data Science and AI enable organisations to analyse vast amounts of data efficiently and make informed decisions based on predictive analytics.

Data Science, Artificial Intelligence, Machine Learning

MLOps and the evolution of data science

IBM Journey to AI blog

AUGUST 11, 2023

Because the machine learning lifecycle has many complex components that reach across multiple teams, it requires close-knit collaboration to ensure that hand-offs occur efficiently, from data preparation and model training to model deployment and monitoring.

Data Science, Machine Learning, ML

Your Complete Roadmap to Become an Azure Data Scientist

Pickl AI

SEPTEMBER 5, 2024

Data Preparation: Cleaning, transforming, and preparing data for analysis and modelling. Algorithm Development: Crafting algorithms to solve complex business problems and optimise processes. Essential Technical Skills: Technical proficiency is at the heart of an Azure Data Scientist’s role.

Azure, Data Scientist, Data Science, Machine Learning

Must-Have Prompt Engineering Skills for 2024

ODSC - Open Data Science

JANUARY 29, 2024

Leaving aside the more established skills, here’s a visual look at the newer skills: Natural Language Processing (NLP), Tokenization, Transformers, Representation Learning, and Knowledge Graphs. NLP (Natural Language Processing): The NLP engineer can be considered a precursor to the Prompt Engineer.

Data Science, Machine Learning, Natural Language Processing

Predicting the Future of Data Science

Pickl AI

DECEMBER 4, 2024

Augmented Analytics: Augmented analytics is revolutionising the way businesses analyse data by integrating Artificial Intelligence (AI) and Machine Learning (ML) into analytics processes. This foundational knowledge is essential for any Data Science project. What Skills Are Most Important for Future Data Scientists?

Data Science, Data Scientist, Machine Learning

ML Model Packaging [The Ultimate Guide]

The MLOps Blog

APRIL 5, 2023

For example, modularizing a natural language processing (NLP) model for sentiment analysis can include separating the word embedding layer and the RNN layer into separate modules. These modules can be packaged and reused in other NLP models, which makes the code easier to manage and reduces duplication and the computational resources required to run the model.
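
To make the modularization idea concrete, here is a minimal pure-Python sketch, assuming a toy vocabulary, random untrained weights, and an Elman-style recurrence. The class names (`EmbeddingLayer`, `SimpleRNNLayer`, `SentimentModel`) are hypothetical and are not taken from the article; a real model would use a deep learning framework and trained parameters.

```python
import math
import random


class EmbeddingLayer:
    """Maps token ids to dense vectors; packaged on its own for reuse."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token_ids):
        return [self.table[i] for i in token_ids]


class SimpleRNNLayer:
    """Elman-style recurrence: h_t = tanh(W_x x_t + W_h h_{t-1})."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 1):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                    for _ in range(hidden_dim)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                    for _ in range(hidden_dim)]

    def __call__(self, vectors):
        h = [0.0] * self.hidden_dim
        for x in vectors:
            # The comprehension reads the previous h before h is rebound.
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_x[j], x))
                           + sum(w * hi for w, hi in zip(self.w_h[j], h)))
                 for j in range(self.hidden_dim)]
        return h  # final hidden state


class SentimentModel:
    """Composes independently packaged modules into one model."""

    def __init__(self, embedding: EmbeddingLayer, rnn: SimpleRNNLayer):
        self.embedding = embedding
        self.rnn = rnn

    def score(self, token_ids) -> float:
        hidden = self.rnn(self.embedding(token_ids))
        return 1.0 / (1.0 + math.exp(-sum(hidden)))  # untrained sigmoid score


shared_embedding = EmbeddingLayer(vocab_size=10, dim=3)
model_a = SentimentModel(shared_embedding, SimpleRNNLayer(3, 4, seed=1))
model_b = SentimentModel(shared_embedding, SimpleRNNLayer(3, 4, seed=2))
print(model_a.score([1, 2, 3]), model_b.score([1, 2, 3]))
```

Both models share one `EmbeddingLayer` instance, so the embedding table is stored once, which is the kind of duplication saving the passage describes.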

ML, Machine Learning

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

LangChain is an open source Python library designed to build applications with LLMs. Data preparation: In this post, we use several years of Amazon’s Letters to Shareholders as a text corpus to perform QnA on. For more detailed steps to prepare the data, refer to the GitHub repo.

AWS, Machine Learning, AI

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In terms of resulting speedups, the approximate order is programming hardware, then programming against PBA APIs, then programming in an unmanaged language such as C++, then a managed language such as Python. Analysis of publications containing accelerated compute workloads by Zeta-Alpha shows a breakdown of 91.5%

AWS, ML, Clustering

Understanding and Building Machine Learning Models

Pickl AI

NOVEMBER 18, 2024

Key steps involve problem definition, data preparation, and algorithm selection. Data quality significantly impacts model performance. Predictive analytics uses historical data to forecast future trends, such as stock market movements or customer churn. Types include supervised, unsupervised, and reinforcement learning.

Machine Learning, Algorithm, Decision Trees

When his hobbies went on hiatus, this Kaggler made fighting COVID-19 with data his mission | A…

Kaggle

JULY 29, 2020

I spent over a decade of my career developing large-scale data pipelines to transform both structured and unstructured data into formats that can be utilized in downstream systems. I also have experience in building large-scale distributed text search and Natural Language Processing (NLP) systems.

ETL, Data Scientist, Data Science, Machine Learning

An introduction to preparing your own dataset for LLM training

AWS Machine Learning Blog

DECEMBER 19, 2024

Data preprocessing: Text data can come from diverse sources and exist in a wide variety of formats such as PDF, HTML, JSON, and Microsoft Office documents such as Word, Excel, and PowerPoint. It’s rare to already have access to text data that can be readily processed and fed into an LLM for training.
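
As one small illustration of such preprocessing, the sketch below extracts plain text from HTML using only Python's standard-library `html.parser`. It is an assumption-laden example, not the pipeline from the article: the `TextExtractor` class and `html_to_text` helper are hypothetical names, and formats such as PDF or Office documents would need other tools.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<h1>Title</h1><p>Hello,   world</p><script>var x = 1;</script>"
print(html_to_text(sample))
```

Joining chunks with a space keeps block-level elements apart, at the cost of occasionally inserting a space around inline tags; a production pipeline would likely handle that case explicitly.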

AWS, Machine Learning, Natural Language Processing

Best AI apps that actually deliver: No hype, just impact (2025)

Dataconomy

MARCH 7, 2025

Sales teams can forecast trends, optimize lead scoring, and enhance customer engagement, all while reducing manual data analysis. IBM Watson: A pioneer in AI-driven analytics, IBM Watson transforms enterprise operations with natural language processing, machine learning, and predictive modeling.

AI, Machine Learning
