Data Preparation, Database and ML - Data Science Current

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler.

Data Preparation, ML, Data Quality

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. Database name: Enter dev. Choose Add connection.

Data Warehouse, Machine Learning, Cloud Data

A comprehensive comparison of RPA and ML

Dataconomy

MARCH 27, 2023

However, while RPA and ML share some similarities, they differ in functionality, purpose, and the level of human intervention required. In this article, we will explore the similarities and differences between RPA and ML and examine their potential use cases in various industries. What is machine learning (ML)?

ML, Machine Learning

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work.

ML, AWS, AI

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

With data software pushing the boundaries of what’s possible in order to answer business questions and alleviate operational bottlenecks, data-driven companies are curious how they can go “beyond the dashboard” to find the answers they are looking for. One of the standout features of Dataiku is its focus on collaboration.

Machine Learning, Data Science, ML

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. In addition to several exciting announcements during keynotes, most of the sessions in our track will feature generative AI in one form or another, so we can truly call our track “Generative AI and ML.”

AWS, ML, AI

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML.

ML, AWS, Data Warehouse

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage. Also, traditional database management tasks, including backups, upgrades and routine maintenance drain valuable time and resources, hindering innovation.

AWS, Database, ETL, AI

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.

Machine Learning, Data Governance, ML

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Data preparation isn’t just a part of the ML engineering process; it’s the heart of it. To set the stage, let’s examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).

ML, Data Preparation, Data Engineering

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. With the SQL editor, you can query data lakes, databases, data warehouses, and federated data sources.

SQL, AWS, Data Lakes, AI

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2024

Multimodal Retrieval Augmented Generation (MM-RAG) is emerging as a powerful evolution of traditional RAG systems, addressing limitations and expanding capabilities across diverse data types. Traditionally, RAG systems were text-centric, retrieving information from large text databases to provide relevant context for language models.

AWS, Computer Science, Database

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

ML, Database, AWS

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

AWS Machine Learning Blog

APRIL 30, 2025

In this post, we highlight the advanced data augmentation techniques and performance improvements in Amazon Bedrock Model Distillation with Meta's Llama model family. Preparing your data: Effective data preparation is crucial for successful distillation of agent function calling capabilities.

AWS, AI, Computer Science

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

Data preparation is important at multiple stages in Retrieval Augmented Generation (RAG) models. Specifically, we clean the data and create RAG artifacts to answer the questions about the content of the dataset. Access to Amazon OpenSearch as a vector database. This will land on a data flow page.

Data Preparation, AI, Python

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

AWS Machine Learning Blog

MAY 25, 2023

The following screenshot shows the Data Catalog schema. Access permissions to the AWS Glue databases and tables are managed by AWS Lake Formation. You can find the AWS Glue database name on the Outputs tab of the CloudFormation stack. We have completed the data preparation step. Choose Create data source.

ML, AWS, Database

Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker

AWS Machine Learning Blog

JULY 11, 2024

RAG provides additional knowledge to the LLM through its input prompt space, and its architecture typically consists of the following components: Indexing: Prepare a corpus of unstructured text, parse and chunk it, and then embed each chunk and store it in a vector database. writefile opt/ml/model/inference.py
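The indexing step described in this excerpt (chunk the text, embed each chunk, store the vectors) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration and is not taken from the linked post: the helper names `chunk_text`, `embed`, and `InMemoryVectorStore` are hypothetical, the bag-of-words "embedding" merely stands in for a real embedding model, and the list-backed store stands in for a real vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): sparse word-count vector, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list of (vector, chunk)."""

    def __init__(self):
        self.rows = []

    def add(self, chunk):
        self.rows.append((embed(chunk), chunk))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


# Illustrative corpus (made up for this sketch).
corpus = (
    "Amazon Redshift is a cloud data warehouse. "
    "SageMaker Canvas offers no-code machine learning. "
    "Data Wrangler prepares data for training."
)

chunks = chunk_text(corpus, chunk_size=8, overlap=2)
store = InMemoryVectorStore()
for chunk in chunks:
    store.add(chunk)

top = store.search("which service is a data warehouse", k=1)
print(top[0])
```

In a real pipeline the `embed` function would call an embedding model and the store would be a managed vector database; the overlap between chunks is one common way to avoid cutting a relevant sentence in half at a chunk boundary.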

AWS, ML, Machine Learning

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS, Machine Learning, ML

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning, ML

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. ML models don’t operate in isolation. This necessitates considering the entire ML lifecycle during design and development.

ML, AWS, Machine Learning

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. Data scientist experience: In this section, we cover how data scientists can connect to Snowflake as a data source in Data Wrangler and prepare data for ML.

AWS, Data Preparation, Azure, Data Scientist

Implement real-time personalized recommendations using Amazon Personalize

AWS Machine Learning Blog

NOVEMBER 13, 2023

At a basic level, Machine Learning (ML) technology learns from data to make predictions. Businesses use their data with an ML-powered personalization service to elevate their customer experience. This approach allows businesses to use data to derive actionable insights and help grow their revenue and brand loyalty.

AWS, Data Preparation, ML

Build a machine learning model to predict student performance using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 22, 2023

You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content. Import the Dropout_Academic Success - Sheet1.csv

Machine Learning, Data Scientist, ML

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.

SQL, AWS, Database, Data Scientist

GraphReduce: Using Graphs for Feature Engineering Abstractions

ODSC - Open Data Science

SEPTEMBER 25, 2023

For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information. However, the majority of enterprise data remains unleveraged from an analytics and machine learning perspective, and much of the most valuable information remains in relational database schemas such as OLAP.

Data Preparation, Machine Learning, ML

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

AWS Machine Learning Blog

MARCH 10, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML.

Clustering, AWS, ML

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

The solution harnesses the capabilities of generative AI, specifically Large Language Models (LLMs), to address the challenges posed by diverse sensor data and automatically generate Python functions based on various data formats. This allows for data to be aggregated for further manufacturer-agnostic analysis.

AWS, AI, Python

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

With the introduction of EMR Serverless support for Apache Livy endpoints, SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.

AWS, Clustering, Big Data

Building ML Platform in Retail and eCommerce

The MLOps Blog

MAY 31, 2023

And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many companies decide to centralize this effort in an internal ML platform. But how to build it?

ML, Algorithm, Machine Learning

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

AWS Machine Learning Blog

APRIL 26, 2024

With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data. You only need to provide a relevant data file as input and choose your FM to get started.

AWS, Database, Python, AI

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

By demonstrating the process of deploying fine-tuned models, we aim to empower data scientists, ML engineers, and application developers to harness the full potential of FMs while addressing unique application requirements. SageMaker Studio is a single web-based interface for end-to-end machine learning (ML) development.

SQL, AWS, ML

Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas

AWS Machine Learning Blog

APRIL 6, 2023

Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. This dataset then needs to be imported into a separate application for ML.

Machine Learning, ML

Responsible AI in action: How Data Reply red teaming supports generative AI safety on AWS

AWS Machine Learning Blog

APRIL 29, 2025

To address potential fairness concerns, it can be helpful to evaluate disparities and imbalances in training data or outcomes. Amazon SageMaker Clarify helps identify potential biases during data preparation without requiring code. Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML in the EMEA region.

AWS, AI, ML

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

AWS Machine Learning Blog

MAY 23, 2023

With organizations increasingly investing in machine learning (ML), ML adoption has become an integral part of business transformation strategies. However, implementing ML into production comes with various considerations, notably being able to navigate the world of AI safely, strategically, and responsibly.

Machine Learning, AWS, ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize the LTV. In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.

AWS, ML, ETL

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.

AWS, ML, Machine Learning

Build well-architected IDP solutions with a custom lens – Part 2: Security

AWS Machine Learning Blog

NOVEMBER 22, 2023

Only involving necessary people to do case validation or augmentation tasks reduces the risk of document mishandling and human error when dealing with sensitive data. Sensitive data in these data stores needs to be secured. You can either secure the output PII in your data store or redact the PII in your IDP output.

AWS, ML, Machine Learning

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

Each component in this ecosystem is very important in the data-driven decision-making process for an organization. Data Sources and Collection: Everything in data science begins with data. Data can be generated from databases, sensors, social media platforms, APIs, logs, and web scraping.

Data Science, Data Analyst, Data Scientist, Machine Learning

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog

FEBRUARY 27, 2025

Let's examine the key components of this architecture in the following figure, following the data flow from left to right. The workflow consists of the following phases: Data preparation: Our evaluation process begins with a prompt dataset containing paired radiology findings and impressions.

AWS, AI, ML

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

AWS Machine Learning Blog

AUGUST 14, 2023

The final retrieval augmentation workflow covers the following high-level steps: The user query is used for a retriever component, which does a vector search, to retrieve the most relevant context from our database. Some of the models offer capabilities for you to fine-tune them with your own data.

AWS, Database, AI

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.

AWS, Data Lakes, Clustering, Data Preparation

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

ML operationalization summary: As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker, ML and operations (MLOps) is the combination of people, processes, and technology to productionize machine learning (ML) solutions efficiently.

AI, ML

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

aws sagemaker create-cluster --cli-input-json file://cluster-config.json --region $AWS_REGION

You should be able to see your cluster by navigating to SageMaker HyperPod in the AWS Management Console, where a cluster named ml-cluster is listed. After a few minutes, its status should change from Creating to InService.

AWS, Clustering, Deep Learning
