2019, Data Preparation and Python - Data Science Current

How to Create a Vocabulary for NLP Tasks in Python

KDnuggets

NOVEMBER 7, 2019

This post walks through a Python implementation of a vocabulary class that stores processed text data and related metadata in a form useful for subsequent NLP tasks.

Python

Python Data Preparation
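As a rough illustration of what such a vocabulary class might look like, here is a minimal sketch. It is not the implementation from the linked post: the class name `Vocabulary`, the special tokens `<pad>` and `<unk>`, whitespace tokenization, and every method name are assumptions chosen for this example.

```python
from collections import Counter


class Vocabulary:
    """Hypothetical minimal vocabulary: maps tokens to integer ids and tracks counts."""

    PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, not taken from the post

    def __init__(self):
        self.token_to_id = {}
        self.id_to_token = []
        self.counts = Counter()
        # Reserve the first ids for the special tokens.
        for special in (self.PAD, self.UNK):
            self._add(special)

    def _add(self, token):
        # Assign the next free id to a token seen for the first time.
        if token not in self.token_to_id:
            self.token_to_id[token] = len(self.id_to_token)
            self.id_to_token.append(token)
        return self.token_to_id[token]

    def add_sentence(self, sentence):
        # Naive lowercase + whitespace tokenization, for illustration only.
        for token in sentence.lower().split():
            self._add(token)
            self.counts[token] += 1

    def encode(self, sentence):
        # Unknown tokens fall back to the <unk> id.
        unk = self.token_to_id[self.UNK]
        return [self.token_to_id.get(t, unk) for t in sentence.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)

    def __len__(self):
        return len(self.id_to_token)


vocab = Vocabulary()
vocab.add_sentence("the cat sat on the mat")
encoded = vocab.encode("the dog sat")
print(len(vocab), encoded, vocab.decode(encoded))
```

Keeping both a token-to-id dictionary and an id-to-token list makes encoding and decoding constant-time lookups, and the counter holds the frequency metadata that later steps (for example, trimming rare words) could use.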

How to Speed up Pandas by 4x with one line of code

KDnuggets

NOVEMBER 12, 2019

While Pandas is the go-to library for data processing in Python, it isn't really built for speed. Learn more about Modin, a new library developed to distribute Pandas' computation and speed up your data prep.

Python

Python Data Preparation

5 Advanced Features of Pandas and How to Use Them

KDnuggets

OCTOBER 25, 2019

The pandas library offers core functionality for preparing your data in Python. But many users don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.

Python

Python Data Preparation

KDnuggets™ News 19:n28, Jul 31: Top 13 Skills To Become a Rockstar Data Scientist; Best Podcasts on AI, Analytics, Data Science

KDnuggets

JULY 31, 2019

Learn the essential skills needed to become a Data Science rockstar; Understand CNNs with Python + Tensorflow + Keras tutorial; Discover the best podcasts about AI, Analytics, Data Science; and find out where you can get the best Certificates in the field.

Data Science

Data Science Analytics Analytics Data Scientist

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

While this data holds valuable insights, its unstructured nature makes it difficult for AI algorithms to interpret and learn from it. According to a 2019 survey by Deloitte, only 18% of businesses reported being able to take advantage of unstructured data. This will land on a data flow page. And select Python (PySpark).

Data Preparation

Data Preparation AI AI Python

Build Pipelines with Pandas Using pdpipe

KDnuggets

DECEMBER 13, 2019

We show how to build intuitive and useful pipelines with Pandas DataFrames using a wonderful little library called pdpipe.

Data Preparation

Data Preparation Python

5 Great New Features in Latest Scikit-learn Release

KDnuggets

DECEMBER 10, 2019

From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.

K-nearest Neighbors

K-nearest Neighbors Data Preparation Python Machine Learning

Set Operations Applied to Pandas DataFrames

KDnuggets

NOVEMBER 7, 2019

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.

Data Preparation

Data Preparation Python Data Science
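For readers who want a concrete picture of these set operations, the sketch below shows one common way to express them with standard pandas calls. It is not taken from the linked tutorial: the frames `left` and `right` and their columns are made-up sample data, and rows are compared on all shared columns.

```python
import pandas as pd

# Hypothetical sample data: two frames with the same columns.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "c", "d"]})

# Union: stack both frames and drop rows that appear more than once.
union = pd.concat([left, right]).drop_duplicates().reset_index(drop=True)

# Intersection: an inner merge on all shared columns keeps rows present in both.
intersection = left.merge(right, how="inner")

# Difference (left minus right): a left merge with indicator=True marks
# rows found only in `left` as "left_only".
merged = left.merge(right, how="left", indicator=True)
difference = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(union)
print(intersection)
print(difference)
```

The `indicator=True` option adds a `_merge` column recording where each row came from, which is what makes the difference easy to filter out.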

Transition your Amazon Forecast usage to Amazon SageMaker Canvas

AWS Machine Learning Blog

JULY 29, 2024

Launched in August 2019, Forecast predates Amazon SageMaker Canvas, a popular low-code no-code AWS tool for building, customizing, and deploying ML models, including time series forecasting models. Python script – Use a Python script to merge the datasets.

ML ML Algorithm AWS

Build a classification pipeline with Amazon Comprehend custom classification (Part I)

AWS Machine Learning Blog

SEPTEMBER 14, 2023

“Data locked away in text, audio, social media, and other unstructured sources can be a competitive advantage for firms that figure out how to use it.” Only 18% of organizations in a 2019 survey by Deloitte reported being able to take advantage of unstructured data. The majority of data, between 80% and 90%, is unstructured data.

AWS

AWS Machine Learning Machine Learning Data Scientist

ML Model Packaging [The Ultimate Guide]

The MLOps Blog

APRIL 5, 2023

The platform can assign specific roles to team members involved in the packaging process and grant them access to relevant aspects such as data preparation, training, deployment, and monitoring. How to Save and Reuse Your Machine Learning Models with Python Machine Learning Mastery. References Géron, A. Brownlee, J.

ML ML Machine Learning Machine Learning

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

LangChain is an open source Python library designed to build applications with LLMs. Data preparation In this post, we use several years of Amazon’s Letters to Shareholders as a text corpus to perform QnA on. For more detailed steps to prepare the data, refer to the GitHub repo.

AWS

AWS Machine Learning Machine Learning AI

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

In terms of resulting speedups, the approximate order is programming hardware, then programming against PBA APIs, then programming in an unmanaged language such as C++, then a managed language such as Python. The CUDA platform is used through compiler directives and extensions to standard languages, such as the Python cuNumeric library.

AWS

AWS ML ML Clustering

Welcome to a New Era of Building in the Cloud with Generative AI on AWS

AWS Machine Learning Blog

NOVEMBER 30, 2023

Q might answer with something like, “This application is building a basic support ticket system using Python Flask and AWS Lambda” and go on to describe each of its core capabilities, how they are implemented, and much more).

AWS

AWS AI AI ML

When his hobbies went on hiatus, this Kaggler made fighting COVID-19 with data his mission | A…

Kaggle

JULY 29, 2020

David Mezzetti is the founder of NeuML, a data analytics and machine learning company that develops innovative products backed by machine learning. He previously co-founded and built Data Works into a 50+ person well-respected software services company. All of the notebooks are in Python.

ETL

ETL Data Scientist Data Science Machine Learning

How Fastweb fine-tuned the Mistral model using Amazon SageMaker HyperPod as a first step to build an Italian large language model

AWS Machine Learning Blog

DECEMBER 18, 2024

Fastweb, one of Italy's leading telecommunications operators, recognized the immense potential of AI technologies early on and began investing in this area in 2019. With a vision to build a large language model (LLM) trained on Italian data, Fastweb embarked on a journey to make this powerful AI capability available to third parties.

Clustering

Clustering AWS AI AI

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 11, 2024

We then also cover how to fine-tune the model using SageMaker Python SDK. FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. Fine-tune using the SageMaker Python SDK You can also fine-tune Meta Llama 3.2 models using the SageMaker Python SDK. You can access the Meta Llama 3.2

AI AI ML ML
