Data Preparation, Download and ML - Data Science Current

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. You can download the dataset loans-part-1.csv

Data Preparation

Data Preparation ML ML Data Quality

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

AWS Machine Learning Blog

AUGUST 20, 2024

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Choose the method that suits your needs and use case.

Data Preparation

Data Preparation ML ML AWS

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. You can now view the predictions and download them as CSV.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

AWS Machine Learning Blog

FEBRUARY 1, 2024

It offers industry-leading scalability, data availability, security, and performance. SageMaker Canvas now supports comprehensive data preparation capabilities powered by SageMaker Data Wrangler. We also demonstrate using the chat for data prep feature in SageMaker Canvas to analyze the data and visualize your findings.

Data Preparation

Data Preparation AWS AI AI

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.

ML

ML ML Data Preparation AWS

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

In these scenarios, as you start to embrace generative AI, large language models (LLMs) and machine learning (ML) technologies as a core part of your business, you may be looking for options to take advantage of AWS AI and ML capabilities outside of AWS in a multicloud environment.

ML

ML ML Azure AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. For Prepare template , select Template is ready. Enter a stack name, such as Demo-Redshift.

ML

ML ML AWS Data Warehouse

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

source env_vars After setting your environment variables, download the lifecycle scripts required for bootstrapping the compute nodes on your SageMaker HyperPod cluster and define its configuration settings before uploading the scripts to your S3 bucket. script to download the model and tokenizer. architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/

AWS

AWS Clustering Deep Learning Deep Learning

Bring your own ML model into Amazon SageMaker Canvas and generate accurate predictions

AWS Machine Learning Blog

MAY 2, 2023

Machine learning (ML) helps organizations generate revenue, reduce costs, mitigate risk, drive efficiencies, and improve quality by optimizing core business functions across multiple business units such as marketing, manufacturing, operations, sales, finance, and customer service. On the Studio console, choose AutoML in the navigation pane.

ML

ML ML Data Scientist AWS

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. This is where the AWS suite of low-code and no-code ML services becomes an essential tool.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

Prepare image data with Amazon SageMaker Data Wrangler

Flipboard

MAY 1, 2023

Today, we are happy to announce that with Amazon SageMaker Data Wrangler , you can perform image data preparation for machine learning (ML) using little to no code. Data Wrangler reduces the time it takes to aggregate and prepare data for ML from weeks to minutes. Choose Import.

Data Preparation

Data Preparation AWS ML ML

Top 10 Machine Learning (ML) Tools for Developers in 2023

Towards AI

JUNE 27, 2023

Let’s get started with the best machine learning (ML) developer tools: TensorFlow TensorFlow, developed by the Google Brain team, is one of the most utilized machine learning tools in the industry. Scikit Learn Scikit Learn is a comprehensive machine learning tool designed for data mining and large-scale unstructured data analysis.

Machine Learning

Machine Learning Machine Learning ML ML

Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

AWS Machine Learning Blog

APRIL 3, 2024

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. SageMaker Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity.

Machine Learning

Machine Learning Machine Learning ML ML

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that revolutionizes the entire ML workflow. It offers an unparalleled suite of tools that cater to every stage of the ML lifecycle, from data preparation to model deployment and monitoring. jpg") or doc.endswith(".png"))

AWS

AWS Computer Science Computer Science Database

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

jpg", "prompt": "Which part of Virginia is this letter sent from", "completion": "Richmond"} SageMaker JumpStart SageMaker JumpStart is a powerful feature within the SageMaker machine learning (ML) environment that provides ML practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs).

ML

ML ML Python AWS

Improve prediction quality in custom classification models with Amazon Comprehend

AWS Machine Learning Blog

OCTOBER 5, 2023

Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption across enterprise and government organizations. Processing unstructured data has become easier with the advancements in natural language processing (NLP) and user-friendly AI/ML services like Amazon Textract , Amazon Transcribe , and Amazon Comprehend.

Data Preparation

Data Preparation ML ML AWS

Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

OCTOBER 19, 2023

Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII).

Machine Learning

Machine Learning Machine Learning ML ML

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.

ML

ML ML AWS AI

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. With the SQL editor, you can query data lakes, databases, data warehouses, and federated data sources.

SQL

SQL AWS Data Lakes AI

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

ML

ML ML Database AWS

Achieve effective business outcomes with no-code machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 29, 2023

On November 30, 2021, we announced the general availability of Amazon SageMaker Canvas , a visual point-and-click interface that enables business analysts to generate highly accurate machine learning (ML) predictions without having to write a single line of code. The key to scaling the use of ML is making it more accessible.

Machine Learning

Machine Learning Machine Learning ML ML

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

AWS Machine Learning Blog

MAY 25, 2023

Dockerfile requirements.txt Create an Amazon Elastic Container Registry (Amazon ECR) repository in us-east-1 and push the container image created by the downloaded Dockerfile. For more information, refer to Granting Data Catalog permissions using the named resource method. We have completed the data preparation step.

ML

ML ML AWS Database

Build a machine learning model to predict student performance using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 22, 2023

You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content. Download the following student dataset to your local computer.

Machine Learning

Machine Learning Machine Learning Data Scientist ML

Prioritizing employee well-being: An innovative approach with generative AI and Amazon SageMaker Canvas

AWS Machine Learning Blog

JUNE 3, 2024

Solution overview SageMaker Canvas brings together a broad set of capabilities to help data professionals prepare, build, train, and deploy ML models without writing any code. We start from creating a data flow. To learn more about importing data to SageMaker Canvas, see Import data into Canvas.

AWS

AWS ML ML AI

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

AWS Machine Learning Blog

JULY 24, 2024

Fine-tuning an LLM can be a complex workflow for data scientists and machine learning (ML) engineers to operationalize. Solution overview Running hundreds of experiments, comparing the results, and keeping a track of the ML lifecycle can become very complex. In the preproccess step, you can log training data and evaluation data.

ML

ML ML AWS Machine Learning

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Mlearning.ai

APRIL 6, 2023

Automate and streamline our ML inference pipeline with SageMaker and Airflow Building an inference data pipeline on large datasets is a challenge many companies face. SageMaker Batch Job Allows you to run batch inference on large datasets and generate predictions in a batch mode using machine learning (ML) models hosted in SageMaker.

Data Pipeline

Data Pipeline ML ML AWS

Transition your Amazon Forecast usage to Amazon SageMaker Canvas

AWS Machine Learning Blog

JULY 29, 2024

Amazon Forecast is a fully managed service that uses statistical and machine learning (ML) algorithms to deliver highly accurate time series forecasts. Benefits of SageMaker Canvas Forecast customers have been seeking greater transparency, lower costs, faster training, and enhanced controls for building time series ML models.

ML

ML ML Algorithm AWS

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. Data scientist experience In this section, we cover how data scientists can connect to Snowflake as a data source in Data Wrangler and prepare data for ML.

AWS

AWS Data Preparation Azure Data Scientist

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. GitHub serves as a centralized location to store, version, and manage your ML code base.

AWS

AWS ML ML Data Preparation

Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

AWS Machine Learning Blog

MAY 31, 2024

In this blog post and open source project , we show you how you can pre-train a genomics language model, HyenaDNA , using your genomic data in the AWS Cloud. Amazon SageMaker Amazon SageMaker is a fully managed ML service offered by AWS, designed to reduce the time and cost associated with training and tuning ML models at scale.

AWS

AWS ML ML Machine Learning

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

By demonstrating the process of deploying fine-tuned models, we aim to empower data scientists, ML engineers, and application developers to harness the full potential of FMs while addressing unique application requirements. SageMaker Studio is a single web-based interface for end-to-end machine learning (ML) development.

SQL

SQL AWS ML ML

Optimizing costs for Amazon SageMaker Canvas with automatic shutdown of idle apps

AWS Machine Learning Blog

NOVEMBER 24, 2023

Amazon SageMaker Canvas is a rich, no-code Machine Learning (ML) and Generative AI workspace that has allowed customers all over the world to more easily adopt ML technologies to solve old and new challenges thanks to its visual, no-code interface. For example, using the AWS SDK for Python ( boto3 ): import boto3, datetime cw = boto3.client('cloudwatch')

AWS

AWS ML ML Machine Learning

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

AWS Machine Learning Blog

NOVEMBER 29, 2023

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio.

ML

ML ML Data Scientist Python

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

ML operationalization summary As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker , ML and operations (MLOps) is the combination of people, processes, and technology to productionize machine learning (ML) solutions efficiently.

AI

AI AI ML ML

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 21, 2024

In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.

AWS

AWS Data Preparation ML ML

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 3: Processing and Data Wrangler jobs

AWS Machine Learning Blog

MAY 30, 2023

Since its introduction, we’ve helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. In this post, we focus on data preprocessing using Amazon SageMaker Processing and Amazon SageMaker Data Wrangler jobs.

ML

ML ML AWS Machine Learning

Build an email spam detector using Amazon SageMaker

AWS Machine Learning Blog

JULY 18, 2023

We walk you through the following steps to set up our spam detector model: Download the sample dataset from the GitHub repo. Load the data in an Amazon SageMaker Studio notebook. Prepare the data for the model. Download the dataset Download the email_dataset.csv from GitHub and upload the file to the S3 bucket.

Supervised Learning

Supervised Learning Algorithm Natural Language Processing AWS

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog

FEBRUARY 27, 2025

Lets examine the key components of this architecture in the following figure, following the data flow from left to right. The workflow consists of the following phases: Data preparation Our evaluation process begins with a prompt dataset containing paired radiology findings and impressions.

AWS

AWS AI AI ML

GenASL: Generative AI-powered American Sign Language avatars

AWS Machine Learning Blog

AUGUST 26, 2024

Users can input audio, video, or text into GenASL, which generates an ASL avatar video that interprets the provided data. The solution uses AWS AI and machine learning (AI/ML) services, including Amazon Transcribe , Amazon SageMaker , Amazon Bedrock , and FMs. You can download and install Docker from Docker’s official website.

AWS

AWS AI AI ML

Fine-tune large multimodal models using Amazon SageMaker

AWS Machine Learning Blog

MAY 29, 2024

Figure 1: LLaVA architecture Prepare data When it comes to fine-tuning the LLaVA model for specific tasks or domains, data preparation is of paramount importance because having high-quality, comprehensive annotations enables the model to learn rich representations and achieve human-level performance on complex visual reasoning challenges.

ML

ML ML AWS Data Visualization

Fine-tune Whisper models on Amazon SageMaker with LoRA

AWS Machine Learning Blog

NOVEMBER 16, 2023

Prepare the dataset for fine-tuning We use the low-resource language Marathi for the fine-tuning task. Using the Hugging Face datasets library, you can download and split the Common Voice dataset into training and testing datasets. His current areas of focus are AI/ML infrastructure and applications.

AWS

AWS ML ML Computer Science

Graphs in Motion: Spatio-Temporal Dynamics with Graph Neural Networks

Towards AI

APRIL 8, 2024

Recently, the standard GNN has been combined with ideas from other ML models to develop exciting, innovative applications. One such development is the integration of GNN with sequential models — Spatio-Temporal GNN that is able to capture both the temporal and spatial (hence the name) dependences of data.

Data Preparation

Data Preparation ML ML AI

Accelerate data preparation for ML in Amazon SageMaker Canvas

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Webinars

Trending Sources

Use Snowflake as a data source to train ML models with Amazon SageMaker

Webinars

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Train and deploy ML models in a multicloud environment using Amazon SageMaker

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

Bring your own ML model into Amazon SageMaker Canvas and generate accurate predictions

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

Prepare image data with Amazon SageMaker Data Wrangler

Top 10 Machine Learning (ML) Tools for Developers in 2023

Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

Improve prediction quality in custom classification models with Amazon Comprehend

Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

Achieve effective business outcomes with no-code machine learning using Amazon SageMaker Canvas

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

Build a machine learning model to predict student performance using Amazon SageMaker Canvas

Prioritizing employee well-being: An innovative approach with generative AI and Amazon SageMaker Canvas

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Transition your Amazon Forecast usage to Amazon SageMaker Canvas

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

Optimizing costs for Amazon SageMaker Canvas with automatic shutdown of idle apps

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

MLOps Landscape in 2023: Top Tools and Platforms

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 3: Processing and Data Wrangler jobs

Build an email spam detector using Amazon SageMaker

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

GenASL: Generative AI-powered American Sign Language avatars

Fine-tune large multimodal models using Amazon SageMaker

Fine-tune Whisper models on Amazon SageMaker with LoRA

Graphs in Motion: Spatio-Temporal Dynamics with Graph Neural Networks

Stay Connected