Data Preparation, ML and SQL - Data Science Current

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. A provisioned or serverless Amazon Redshift data warehouse.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.

SQL

SQL AWS Database Data Scientist

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 1, 2024

By demonstrating the process of deploying fine-tuned models, we aim to empower data scientists, ML engineers, and application developers to harness the full potential of FMs while addressing unique application requirements. We use the sql-create-context dataset available on Hugging Face for fine-tuning.

SQL

SQL AWS ML ML

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

With data software pushing the boundaries of what’s possible in order to answer business questions and alleviate operational bottlenecks, data-driven companies are curious how they can go “beyond the dashboard” to find the answers they are looking for. One of the standout features of Dataiku is its focus on collaboration.

Machine Learning

Machine Learning Machine Learning Data Science ML

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.

ML

ML ML AWS Data Warehouse

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. Data and AI governance Publish your data products to the catalog with glossaries and metadata forms.

SQL

SQL AWS Data Lakes AI

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. This is where the AWS suite of low-code and no-code ML services becomes an essential tool.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.

ML

ML ML AWS AI

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.

Data Preparation

Data Preparation Machine Learning Machine Learning ML

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

AWS Machine Learning Blog

JUNE 17, 2024

Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. ML models don’t operate in isolation. This necessitates considering the entire ML lifecycle during design and development.

ML

ML ML AWS Machine Learning

Authoring custom transformations in Amazon SageMaker Data Wrangler using NLTK and SciPy

AWS Machine Learning Blog

APRIL 17, 2023

In other words, companies need to move from a model-centric approach to a data-centric approach.” – Andrew Ng A data-centric AI approach involves building AI systems with quality data involving data preparation and feature engineering. Custom transforms can be written as separate steps within Data Wrangler.

AWS

AWS ML Python ML

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

Data can be generated from databases, sensors, social media platforms, APIs, logs, and web scraping. Data can be in structured (like tables in databases), semi-structured (like XML or JSON), or unstructured (like text, audio, and images) form. How to Choose the Right Data Science Career Path?

Data Science

Data Science Data Analyst Data Scientist Machine Learning

GraphReduce: Using Graphs for Feature Engineering Abstractions

ODSC - Open Data Science

SEPTEMBER 25, 2023

For readers who work in ML/AI, it’s well understood that machine learning models prefer feature vectors of numerical information. Tapping into these schemas and pulling out machine learning-ready features can be nontrivial as one needs to know where the data entity of interest lives (e.g.,

Data Preparation

Data Preparation Machine Learning Machine Learning ML

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.

AWS

AWS Machine Learning Machine Learning Analytics

Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

OCTOBER 19, 2023

Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII).

Machine Learning

Machine Learning Machine Learning ML ML

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Solution overview Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. Data preparation SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.

AWS

AWS Data Lakes ML ML

Generative AI for Data Analytics: Top 7 Tools, Use-cases, and More

Data Science Dojo

AUGUST 16, 2024

Natural Language Processing (NLP) for Data Interaction Generative AI models like GPT-4 utilize transformer architectures to understand and generate human-like text based on a given context. Personalized Reporting : Perfect for managers and executives who need quick, relevant updates on key metrics without delving into complex data sets.

Analytics

Analytics Analytics Power BI AI

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

AWS Machine Learning Blog

MAY 25, 2023

QuickSight connects to your data and combines data from many different sources, such as Amazon S3 and Athena. For our solution, we use Athena as the data source. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.

ML

ML ML AWS Database

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

AWS Machine Learning Blog

MARCH 10, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML.

Clustering

Clustering AWS ML ML

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS

AWS Machine Learning Machine Learning ML

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

ML

ML ML Database AWS

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

NOVEMBER 27, 2023

Data preparation is important at multiple stages in Retrieval Augmented Generation ( RAG ) models. Specifically, we clean the data and create RAG artifacts to answer the questions about the content of the dataset. Choose Create on the right side of page, then give a data flow name and select Create. Choose your domain.

Data Preparation

Data Preparation AI AI Python

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

The rules in this engine were predefined and written in SQL, which aside from posing a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data source. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.

AWS

AWS Data Warehouse ML ML

Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas

AWS Machine Learning Blog

APRIL 6, 2023

Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. This dataset then needs to be imported into a separate application for ML.

Machine Learning

Machine Learning Machine Learning ML ML

State of Machine Learning Survey Results Part Two

ODSC - Open Data Science

MARCH 13, 2023

Machine learning practitioners are often working with data at the beginning and during the full stack of things, so they see a lot of workflow/pipeline development, data wrangling, and data preparation. What percentage of machine learning models developed in your organization get deployed to a production environment?

Machine Learning

Machine Learning Machine Learning Data Wrangling Data Science

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data professionals such as data scientists want to use the power of Apache Spark , Hive , and Presto running on Amazon EMR for fast data preparation; however, the learning curve is steep.

AWS

AWS Data Lakes Clustering Data Preparation

AI Development Lifecycle Learnings of What Changed with LLMs

ODSC - Open Data Science

FEBRUARY 5, 2025

Common Pitfalls in LLM Development Neglecting Data Preparation: Poorly prepared data leads to subpar evaluation and iterations, reducing generalizability and stakeholder confidence. Real-world applications often expose gaps that proper data preparation could have preempted. Evaluation: Tools likeNotion.

Data Preparation

Data Preparation AI AI Data Scientist

How to Prepare Data for Use in Machine Learning Models

phData

JUNE 18, 2024

Machine learning (ML) is only possible because of all the data we collect. However, with data coming from so many different sources, it doesn’t always come in a format that’s easy for ML models to understand. Before you can take advantage of everything ML offers, much prep work is involved.

Machine Learning

Machine Learning Machine Learning ML ML

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 2: SageMaker notebooks and Studio

AWS Machine Learning Blog

MAY 30, 2023

Since its introduction, we have helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. Notebooks contain everything needed to run or recreate an ML workflow. You can build custom queries to look up AWS CUR data using standard SQL.

AWS

AWS ML ML EDA

What Do You Actually Need from a Data Catalog Tool?

Alation

SEPTEMBER 23, 2021

There are many other features and functions , but to verify that a data catalog functions as platform, look for these five features: Intelligence – A data catalog should leverage AI/ML-driven pattern detection, including popularity, pattern matching, and provenance/impact analysis. Key Features of a Data Catalog.

Data Preparation

Data Preparation SQL Data Governance Data Analysis

Exploring data using AI chat at Domo with Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 9, 2024

The next step is to provide them with a more intuitive and conversational interface to interact with their data, empowering them to generate meaningful visualizations and reports through natural language interactions. Mohammad Tahsin is an AI/ML Specialist Solutions Architect at Amazon Web Services. powered by Amazon Bedrock Domo.AI

AI

AI AI AWS ML

Roadmap to Learn Data Science for Beginners and Freshers in 2023

Becoming Human

MAY 15, 2023

One is a scripting language such as Python, and the other is a Query language like SQL (Structured Query Language) for SQL Databases. Python is a High-level, Procedural, and object-oriented language; it is also a vast language itself, and covering the whole of Python is one the worst mistakes we can make in the data science journey.

Data Science

Data Science Machine Learning Machine Learning Database

How Does Snowpark Work?

phData

FEBRUARY 7, 2024

Snowpark is the set of libraries and runtimes in Snowflake that securely deploy and process non-SQL code, including Python, Java, and Scala. On the client side, Snowpark consists of libraries, including the DataFrame API and native Snowpark machine learning (ML) APIs for model development (public preview) and deployment (private preview).

Python

Python ML ML SQL

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Try Db2 Warehouse SaaS on AWS for free   Netezza SaaS on AWS IBM® Netezza® Performance Server is a cloud-native data warehouse designed to operationalize deep analytics, data mining and BI by unifying, accessing and scaling all types of data across the hybrid cloud. Netezza

AWS

AWS Database ETL AI

Amazon SageMaker Data Wrangler for dimensionality reduction

AWS Machine Learning Blog

APRIL 24, 2023

In the world of machine learning (ML), the quality of the dataset is of significant importance to model predictability. Although more data is usually better, large datasets with a high number of features can sometimes lead to non-optimal model performance due to the curse of dimensionality.

Data Quality

Data Quality Machine Learning Machine Learning Deep Learning

How and When to Use Dataflows in Power BI

phData

SEPTEMBER 28, 2023

Dataflows represent a cloud-based technology designed for data preparation and transformation purposes. Dataflows have different connectors to retrieve data, including databases, Excel files, APIs, and other similar sources, along with data manipulations that are performed using Online Power Query Editor.

Power BI

Power BI Data Preparation Machine Learning Machine Learning

How to Use Exploratory Notebooks [Best Practices]

The MLOps Blog

OCTOBER 20, 2023

Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering. in a pandas DataFrame) but in the company’s data warehouse (e.g., Redshift).

SQL

SQL Database Data Scientist Python

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering and hyperparameter optimization using AutoAI. .”

AI

AI AI Machine Learning Machine Learning

How to Choose MLOps Tools: In-Depth Guide for 2024

DagsHub

APRIL 21, 2024

A traditional machine learning (ML) pipeline is a collection of various stages that include data collection, data preparation, model training and evaluation, hyperparameter tuning (if needed), model deployment and scaling, monitoring, security and compliance, and CI/CD. What is MLOps?

Machine Learning

Machine Learning Machine Learning ML ML

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

Data preparation Before creating a knowledge base using Knowledge Bases for Amazon Bedrock, it’s essential to prepare the data to augment the FM in a RAG implementation. In this demonstration, let’s assume that you need to remove the data related to a particular customer.

AWS

AWS Machine Learning Machine Learning Database

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Webinars

Trending Sources

Import a fine-tuned Meta Llama 3 model for SQL query generation on Amazon Bedrock

Webinars

Use Snowflake as a data source to train ML models with Amazon SageMaker

How Dataiku and Snowflake Strengthen the Modern Data Stack

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snowflake Snowpark: cloud SQL and Python ML pipelines

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

Authoring custom transformations in Amazon SageMaker Data Wrangler using NLTK and SciPy

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

GraphReduce: Using Graphs for Feature Engineering Abstractions

How Marubeni is optimizing market decisions using AWS machine learning and analytics

Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Generative AI for Data Analytics: Top 7 Tools, Use-cases, and More

Get insights on your user’s search behavior from Amazon Kendra using an ML-powered serverless stack

Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

Simplify data prep for generative AI with Amazon SageMaker Data Wrangler

MLOps Landscape in 2023: Top Tools and Platforms

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas

State of Machine Learning Survey Results Part Two

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AI Development Lifecycle Learnings of What Changed with LLMs

How to Prepare Data for Use in Machine Learning Models

Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 2: SageMaker notebooks and Studio

What Do You Actually Need from a Data Catalog Tool?

Exploring data using AI chat at Domo with Amazon Bedrock

Roadmap to Learn Data Science for Beginners and Freshers in 2023

How Does Snowpark Work?

Tackling AI’s data challenges with IBM databases on AWS

Amazon SageMaker Data Wrangler for dimensionality reduction

How and When to Use Dataflows in Power BI

How to Use Exploratory Notebooks [Best Practices]

Exploring the AI and data capabilities of watsonx

How to Choose MLOps Tools: In-Depth Guide for 2024

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

Stay Connected