Data scientists typically start their ML workflow by discovering relevant data sources and connecting to them. They then use SQL to explore, analyze, visualize, and integrate data from various sources before using it in ML training and inference.
Conventional ML development cycles take weeks to many months and require scarce data science and ML development skills. Business analysts' ideas for applying ML models often sit in prolonged backlogs because of limited data engineering and data science team bandwidth and lengthy data preparation activities.
To address this challenge, AWS recently announced the preview of Amazon Bedrock Custom Model Import, a feature that you can use to import customized models created in other environments—such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises—into Amazon Bedrock.
SageMaker Unified Studio is an integrated development environment (IDE) for data, analytics, and AI. Discover your data and put it to work using familiar AWS tools to complete end-to-end development workflows, including data analysis, data processing, model training, generative AI app building, and more, in a single governed environment.
Data preparation: SageMaker Ground Truth employed a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The model was then fine-tuned with training data from the data preparation stage. About the authors: Scott Patterson is a Senior Solutions Architect at AWS.
This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services to build a robust and cost-effective Power Bid Optimization solution.
The solution: IBM databases on AWS. To address these challenges, IBM's portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics, and AI across the hybrid cloud landscape. Let's delve into the database portfolio from IBM available on AWS.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface. Choose Create stack.
We create a custom training container that downloads data directly from the Snowflake table into the training instance rather than first downloading the data into an S3 bucket. Store your Snowflake account credentials in AWS Secrets Manager. Ingest the data into a table in your Snowflake account.
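A minimal sketch of how the training container might fetch the stored credentials and pull the table directly; the secret name and table name here are hypothetical, since the excerpt does not show the actual code:

```python
import json

import boto3
import snowflake.connector

SECRET_NAME = "snowflake/credentials"   # hypothetical secret name
QUERY = "SELECT * FROM TRAINING_DATA"   # hypothetical training table

# Retrieve the Snowflake credentials stored in AWS Secrets Manager.
secrets = boto3.client("secretsmanager")
creds = json.loads(secrets.get_secret_value(SecretId=SECRET_NAME)["SecretString"])

# Connect to Snowflake and read the table straight into the training
# instance, skipping the intermediate S3 download.
conn = snowflake.connector.connect(
    account=creds["account"],
    user=creds["user"],
    password=creds["password"],
    warehouse=creds["warehouse"],
    database=creds["database"],
    schema=creds["schema"],
)
df = conn.cursor().execute(QUERY).fetch_pandas_all()
```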
This is where the AWS suite of low-code and no-code ML services becomes an essential tool. As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets.
We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.
As one of the largest AWS customers, Twilio uses AWS data and artificial intelligence and machine learning (AI/ML) services to run its daily workloads. Twilio needed to implement an MLOps pipeline that queried data from PrestoDB. All pipeline parameters used in this solution live in a single config.yml file.
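A minimal sketch of how such a single configuration file might be read at pipeline start-up; the key names are hypothetical, since the excerpt does not show the file's contents:

```python
import yaml

# Load every pipeline parameter from one place, as the excerpt describes.
with open("config.yml") as f:
    config = yaml.safe_load(f)

# Hypothetical keys for illustration only.
presto_host = config["presto"]["host"]        # PrestoDB endpoint to query
training = config["pipeline"]["training"]     # e.g., instance type, hyperparameters
```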
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.
"In other words, companies need to move from a model-centric approach to a data-centric approach." – Andrew Ng. A data-centric AI approach involves building AI systems with quality data, emphasizing data preparation and feature engineering. Custom transforms can be written as separate steps within Data Wrangler.
With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure. The following screenshot shows what the upload looks like when it’s complete.
Finally, they can also train and deploy models with SageMaker Autopilot, schedule jobs, or operationalize data preparation in a SageMaker Pipeline from Data Wrangler's visual interface. Solution overview: With SageMaker Studio setups, data professionals can quickly identify and connect to existing EMR clusters.
Additionally, using Amazon Comprehend with AWS PrivateLink means that customer data never leaves the AWS network and is continuously secured with the same data access and privacy controls as the rest of your applications. For more details, refer to Integrating SageMaker Data Wrangler with SageMaker Pipelines.
Welcome to our AWS Redshift to the Snowflake Data Cloud migration blog! In this blog, we’ll walk you through the process of migrating your data from AWS Redshift to the Snowflake Data Cloud. As an experienced data engineering consulting company, phData has helped with numerous migrations to Snowflake.
In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support offering. In Part 1, we showed how to get started using AWS Cost Explorer to identify cost optimization opportunities in SageMaker. You can build custom queries to look up AWS Cost and Usage Report (CUR) data using standard SQL.
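A minimal sketch of such a custom query, assuming your CUR data is already cataloged in Athena under a hypothetical database and table ("cur_db"."cur_table") and a hypothetical results bucket; the column names follow the standard CUR schema:

```python
import boto3

athena = boto3.client("athena")

# Sum SageMaker spend by usage type from the CUR table.
query = """
SELECT line_item_usage_type, SUM(line_item_unblended_cost) AS cost
FROM cur_db.cur_table
WHERE line_item_product_code = 'AmazonSageMaker'
GROUP BY line_item_usage_type
ORDER BY cost DESC
"""

athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
```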
Data preparation is important at multiple stages in Retrieval Augmented Generation (RAG) models. Create a data flow: Complete the following steps to create a data flow in SageMaker Canvas: On the SageMaker Canvas home page, choose Data preparation. This opens a data flow page. Choose your domain.
The rules in this engine were predefined and written in SQL, which, aside from being a challenge to manage, also struggled to cope with the proliferation of data from TR's various integrated data sources. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes, with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
A cordial greeting to all data science enthusiasts! In the unceasingly dynamic arena of data science, discerning and applying the right instruments can significantly shape the outcomes of your machine learning initiatives. Be sure to check out his talk, "Build Classification and Regression Models with Spark on AWS," there!
This allows you to create unique views and filters, and grants management teams access to a streamlined, one-click dashboard without needing to log in to the AWS Management Console and search for the appropriate dashboard. On the AWS CloudFormation console, create a new stack. Then build and tag the container image for Amazon ECR: docker build -t <image> . followed by docker tag <image>:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/<image>, where <image> and <account-id> are placeholders for names elided in the excerpt.
The following steps give an overview of how to use the new capabilities launched in SageMaker for Salesforce to enable the overall integration: Set up the Amazon SageMaker Studio domain and OAuth between Salesforce and the AWS account. Select Other type of secret. Save the secret and note the ARN of the secret.
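The console flow above ("Other type of secret") can also be scripted. A minimal sketch with boto3, assuming a hypothetical secret name and placeholder OAuth values:

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

response = secrets.create_secret(
    Name="salesforce/oauth",  # hypothetical secret name
    SecretString=json.dumps(
        {
            "client_id": "YOUR_CONNECTED_APP_CLIENT_ID",       # placeholder
            "client_secret": "YOUR_CONNECTED_APP_CLIENT_SECRET",  # placeholder
        }
    ),
)

# Note the ARN, as the setup steps instruct.
print(response["ARN"])
```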
One is a scripting language such as Python, and the other is a query language like SQL (Structured Query Language) for SQL databases. Python is a high-level, procedural, and object-oriented language; it is also a vast language, and trying to cover the whole of Python is one of the worst mistakes we can make in the data science journey.
Advanced tools like Amazon QuickSight support large datasets and growing businesses. Microsoft Power BI is a comprehensive business intelligence (BI) tool designed to help organisations turn raw data into meaningful insights. It supports a wide range of visualisations and can connect to many SQL-based data sources.
Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations, and a variety of modeling methods. Presto engine: Incorporates the latest performance enhancements to the Presto query engine.
Data Wrangler simplifies data preparation and feature engineering tasks such as data selection, cleansing, exploration, and visualization from a single visual interface. Data Wrangler has more than 300 preconfigured data transformations that can be used to transform data effectively.
Example template for an exploratory notebook | Source: Author. How to organize code in a Jupyter notebook: For exploratory tasks, the code that produces SQL queries, performs pandas data wrangling, or creates plots is not important for readers; the data of interest often lives not in a pandas DataFrame but in the company's data warehouse (e.g., Redshift).
If you answer "yes" to any of these questions, you will need cloud storage, such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. COPY INTO: When loading data into Snowflake, the first and most important rule to follow is: do not load data with SQL inserts!
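A minimal sketch of the bulk-load alternative via the Snowflake Python connector; the connection values, stage, and table names are hypothetical. COPY INTO loads staged files in bulk instead of issuing row-by-row INSERT statements:

```python
import snowflake.connector

# Connection parameters are placeholders; use your own account details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)

# Bulk-load staged files instead of row-by-row SQL inserts.
conn.cursor().execute("""
    COPY INTO sales_raw
    FROM @my_s3_stage/sales/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
```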
By implementing efficient data pipelines, organisations can enhance their data processing capabilities, reduce time spent on data preparation, and improve overall data accessibility. Data Storage Solutions: Data storage solutions are critical in determining how data is organised, accessed, and managed.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.
Tools like Apache NiFi, Talend, and Informatica provide user-friendly interfaces for designing workflows, integrating diverse data sources, and executing ETL processes efficiently. Choosing the right tool based on the organisation’s specific needs, such as data volume and complexity, is vital for optimising ETL efficiency.
Databricks: Powered by Apache Spark, Databricks is a unified data processing and analytics platform that facilitates data preparation, integrates with LLMs, and supports performance optimization for complex prompt engineering tasks. Kubernetes: A long-established tool for containerized apps.
A traditional machine learning (ML) pipeline is a collection of stages that include data collection, data preparation, model training and evaluation, hyperparameter tuning (if needed), model deployment and scaling, monitoring, security and compliance, and CI/CD.
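As an illustration only, a compressed sketch of the preparation, training, tuning, and evaluation stages from that list; scikit-learn and the toy dataset are assumptions, not named in the original:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)                   # data collection
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)  # data preparation

pipe = Pipeline([("scale", StandardScaler()),                 # preprocessing
                 ("clf", LogisticRegression(max_iter=1000))])

# Hyperparameter tuning wraps the whole pipeline.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_tr, y_tr)                                        # model training
print("test accuracy:", search.score(X_te, y_te))             # evaluation
```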
We also discuss common security concerns that can undermine trust in AI, as identified by the Open Worldwide Application Security Project (OWASP) Top 10 for LLM Applications , and show ways you can use AWS to increase your security posture and confidence while innovating with generative AI.
In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines.
The next step is to provide them with a more intuitive and conversational interface to interact with their data, empowering them to generate meaningful visualizations and reports through natural language interactions. Solution overview The following diagram illustrates the solution architecture and data flow.
With this launch, Canvas empowers you to capitalize on data stored in disparate sources by supporting in-app data import and aggregation from over 40 data sources. This feature is made possible through new native connectors to Athena and to Amazon AppFlow via the AWS Glue Data Catalog.
Through this unified query capability, you can create comprehensive insights into customer transaction patterns and purchase behavior for active products without the traditional barriers of data silos or the need to copy data between systems. Create a user with administrative access.
Augmented Analytics: Augmented analytics is revolutionising the way businesses analyse data by integrating Artificial Intelligence (AI) and Machine Learning (ML) into analytics processes. Understand data structures and explore data warehousing concepts to efficiently manage and retrieve large datasets.
RAG retrieves data from a preexisting knowledge base (your data), combines it with the LLM's knowledge, and generates responses in more human-like language. However, for generative AI to understand your data, some amount of data preparation is required, which can involve a steep learning curve.
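A self-contained illustration of the "retrieve" half of that flow, using TF-IDF similarity as a stand-in; a real system would use embeddings, a vector store, and an LLM call, all omitted here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge base ("your data").
knowledge_base = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am to 5pm on weekdays.",
    "Premium plans include priority support.",
]

question = "When can I get a refund?"

# Score each document against the question and pick the best match.
vec = TfidfVectorizer()
doc_vecs = vec.fit_transform(knowledge_base + [question])
scores = cosine_similarity(doc_vecs[-1], doc_vecs[:-1])[0]
context = knowledge_base[scores.argmax()]

# The retrieved context is then prepended to the LLM prompt (LLM call omitted).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```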
Recognizing this challenge as an opportunity for innovation, F1 partnered with Amazon Web Services (AWS) to develop an AI-driven solution using Amazon Bedrock to streamline issue resolution. The objective was to use AWS to replicate and automate the current manual troubleshooting process for two candidate systems.