Data scientists typically start their ML workflow by discovering relevant data sources and connecting to them. They then use SQL to explore, analyze, visualize, and integrate data from various sources before using it in their ML training and inference.
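As a rough illustration of that explore-with-SQL step, the sketch below aggregates a hypothetical orders table into per-customer features before any training happens; the table and column names are invented for the example.

```python
# Minimal sketch of SQL-first exploration before ML training.
# The orders table and its columns are hypothetical stand-ins.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL, is_returned INTEGER);
    INSERT INTO orders VALUES (1, 20.0, 0), (1, 35.5, 1), (2, 12.0, 0);
""")

# Aggregate with SQL, then hand the result to the ML side of the workflow.
features = pd.read_sql(
    """
    SELECT customer_id,
           COUNT(*)         AS n_orders,
           AVG(amount)      AS avg_amount,
           SUM(is_returned) AS n_returns
    FROM orders
    GROUP BY customer_id
    """,
    conn,
)
print(features)
```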
This includes sourcing, gathering, arranging, processing, and modeling data, as well as being able to analyze large volumes of structured or unstructured data. The goal of data preparation is to present data in the best form for decision-making and problem-solving.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. This process might take a couple of hours.
Its cloud-native architecture, combined with robust data-sharing capabilities, allows businesses to easily leverage cutting-edge tools from partners like Dataiku, fostering innovation and driving more insightful, data-driven outcomes. One of the standout features of Dataiku is its focus on collaboration.
Machine learning practitioners often work with data at the beginning of, and throughout, the full stack, so they see a lot of workflow/pipeline development, data wrangling, and data preparation.
We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.
Tapping into these schemas and pulling out machine learning-ready features can be nontrivial: one needs to know where the data entity of interest lives (e.g., customers), what its relations are and how they’re connected, and then write SQL, Python, or other code to join and aggregate to the granularity of interest.
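To make that concrete, here is a hedged sketch of joining two related entities and aggregating to a customer-level granularity with pandas; the customers/orders tables and their columns are hypothetical.

```python
# Sketch: turning relational entities into ML-ready, customer-level features.
# The customers/orders data and column names are hypothetical.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["retail", "enterprise"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [20.0, 35.5, 12.0],
})

# Join the entities, then aggregate to the granularity of interest (one row per customer).
features = (
    orders.merge(customers, on="customer_id", how="left")
          .groupby(["customer_id", "segment"], as_index=False)
          .agg(n_orders=("amount", "count"), total_spend=("amount", "sum"))
)
print(features)
```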
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
We create a custom training container that downloads data directly from the Snowflake table into the training instance rather than first downloading the data into an S3 bucket. Previously, he was a software solutions architect for deep learning, analytics, and big data technologies at Intel.
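A minimal sketch of what downloading directly from the Snowflake table inside the training container could look like, assuming the Snowflake Python connector with credentials supplied through environment variables; the table name and query are placeholders, not the post's actual code.

```python
# Sketch: pull training data straight from Snowflake inside the training container,
# skipping the intermediate S3 copy. Credentials and table names are assumptions.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM TRAINING_DATA")   # placeholder table
    train_df = cur.fetch_pandas_all()            # requires pyarrow
finally:
    conn.close()

# train_df is now available to the training script without touching S3.
print(train_df.shape)
```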
As one of the largest AWS customers, Twilio engages with AWS data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Twilio needed to implement an MLOps pipeline that queried data from PrestoDB. All pipeline parameters used in this solution exist in a single config.yml file.
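A hedged sketch of that pattern: read pipeline parameters from a config.yml and run a query against PrestoDB with the presto-python-client; the config keys, host, and query are invented for the example.

```python
# Sketch: drive a PrestoDB query from parameters kept in a single config.yml.
# The config keys and query below are hypothetical.
import yaml
import prestodb

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

conn = prestodb.dbapi.connect(
    host=cfg["presto"]["host"],
    port=cfg["presto"]["port"],
    user=cfg["presto"]["user"],
    catalog=cfg["presto"]["catalog"],
    schema=cfg["presto"]["schema"],
)
cur = conn.cursor()
cur.execute(cfg["presto"]["query"])   # e.g. "SELECT * FROM events LIMIT 1000"
rows = cur.fetchall()
print(f"fetched {len(rows)} rows for the pipeline")
```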
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. This will land on a data flow page.
Forbes reports that global data production increased from 2 zettabytes in 2010 to 44 ZB in 2020, with projections exceeding 180 ZB by 2025 – a staggering 9,000% growth in just 15 years, partly driven by artificial intelligence. However, raw data alone doesn’t equate to actionable insights.
Common pitfalls in LLM development include neglecting data preparation: poorly prepared data leads to subpar evaluation and iterations, reducing generalizability and stakeholder confidence. Real-world applications often expose gaps that proper data preparation could have preempted. Evaluation: Tools like Notion.
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data professionals such as data scientists want to use the power of Apache Spark , Hive , and Presto running on Amazon EMR for fast data preparation; however, the learning curve is steep.
Power BI Datamarts provide no-code/low-code datamart capabilities using Azure SQL Database technology in the background. Power BI Datamarts support sensitivity labels, endorsement, discovery, and Row-Level Security (RLS), which help protect and manage the data according to business requirements and compliance needs.
One is a scripting language such as Python, and the other is a query language like SQL (Structured Query Language) for SQL databases. Python is a high-level, procedural, and object-oriented language; it is also a vast language in itself, and trying to cover the whole of Python is one of the worst mistakes we can make in the data science journey.
Data insights are crucial for businesses to enable data-driven decisions, identify trends, and optimize operations. Generative artificial intelligence (AI) has revolutionized this by allowing users to interact with data through natural language queries, providing instant insights and visualizations without needing technical expertise.
The answer: they craft predictive models that illuminate the future. Data collection and cleaning: data scientists kick off their journey by embarking on a digital excavation, unearthing raw data from the digital landscape, then interpret that data to uncover actionable insights that guide business decisions.
This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared. Data preparation: Before creating a knowledge base using Knowledge Bases for Amazon Bedrock, it’s essential to prepare the data to augment the FM in a RAG implementation.
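As a rough sketch of the preparation-and-ingestion flow (not the post's exact code), one might upload the cleaned documents to S3 and then trigger a sync of the knowledge base's data source with boto3; the bucket name, knowledge base ID, and data source ID below are placeholders.

```python
# Sketch: stage prepared documents in S3, then sync the Bedrock knowledge base.
# Bucket name, knowledge base ID, and data source ID are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("cleaned/faq.txt", "my-kb-source-bucket", "docs/faq.txt")

bedrock_agent = boto3.client("bedrock-agent")
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    dataSourceId="DS_ID_PLACEHOLDER",
)
print(job["ingestionJob"]["status"])
```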
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
QuickSight connects to your data and combines data from many different sources, such as Amazon S3 and Athena. For our solution, we use Athena as the data source. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.
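For illustration, a short hedged example of querying S3-resident data through Athena with the AWS SDK for pandas (awswrangler); the Glue database and table names are hypothetical.

```python
# Sketch: run standard SQL against data in S3 via Athena and get a DataFrame back.
# The Glue database and table names are hypothetical.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT region, COUNT(*) AS n FROM web_events GROUP BY region",
    database="analytics_db",
)
print(df.head())
```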
You can now optionally select or deselect columns, join tables by dragging another table into the “Drag and drop datasets to join” section, or write SQL queries to specify your data slice. For this post, we use all the data in the table. To import the data, choose Import data. Volkan Unsal is a Sr.
The rules in this engine were predefined and written in SQL, which, aside from posing a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data sources. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.
Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.
Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations and a variety of modeling methods. Presto engine: Incorporates the latest performance enhancements to the Presto query engine.
Challenges: Learning curve: Qlik’s unique data analysis approach requires a bit of a learning curve, especially for new users. Data preparation: Preparing data in Qlik is not as intuitive as in other BI tools, which may slow the time to actionable insights.
This explosive growth is driven by the increasing volume of data generated daily, with estimates suggesting that by 2025, there will be around 181 zettabytes of data created globally. Understand data structures and explore data warehousing concepts to efficiently manage and retrieve large datasets.
We also examined the results to gain a deeper understanding of why these prompt engineering skills and platforms are in demand for the role of Prompt Engineer, not to mention machine learning and data science roles. Kubernetes: A long-established tool for containerized apps.
Data Wrangler simplifies the process of data preparation and feature engineering, including data selection, cleansing, exploration, and visualization, from a single visual interface. Data Wrangler has more than 300 preconfigured data transformations that can be used to transform the data effectively.
Data preparation: Cleaning, transforming, and preparing data for analysis and modelling. However, a master’s degree or specialised Data Science or Machine Learning courses can give you a competitive edge, offering advanced knowledge and practical experience.
Tools like Apache NiFi, Talend, and Informatica provide user-friendly interfaces for designing workflows, integrating diverse data sources, and executing ETL processes efficiently. Choosing the right tool based on the organisation’s specific needs, such as data volume and complexity, is vital for optimising ETL efficiency.
For a comprehensive understanding of the practical applications, including a detailed code walkthrough from data preparation to model deployment, please join us at the ODSC APAC conference 2023. We have a number of records, each with a target (or label) column, dessert, containing a binary value (1.0 if the recipe is a dessert, 0.0 otherwise).
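As a tiny hedged illustration of data in that shape (only the binary dessert label comes from the excerpt; the feature columns are invented), the snippet below builds such a frame and fits a baseline classifier.

```python
# Sketch: a recipe dataset with a binary `dessert` target (1.0 / 0.0) and a baseline model.
# Only the `dessert` column is from the excerpt; the feature columns are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "sugar_g":  [40.0, 2.0, 55.0, 1.5],
    "flour_g":  [120.0, 0.0, 200.0, 10.0],
    "dessert":  [1.0, 0.0, 1.0, 0.0],   # 1.0 if the recipe is a dessert, 0.0 otherwise
})

X, y = df[["sugar_g", "flour_g"]], df["dessert"]
model = LogisticRegression().fit(X, y)
print(model.predict([[30.0, 100.0]]))
```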
Generative artificial intelligence (AI) applications built around large language models (LLMs) have demonstrated the potential to create and accelerate economic value for businesses. Define strict data ingress and egress rules to help protect against manipulation and exfiltration using VPCs with AWS Network Firewall policies.
How the watsonx Regulatory Compliance Platform accelerates risk management: The watsonx.ai™, watsonx.governance, and watsonx.data™ components of the platform are advanced artificial intelligence (AI) modules that offer a wide range of advanced technical features designed to meet the unique needs of the industry.
RAG retrieves data from a preexisting knowledge base (your data), combines it with the LLM’s knowledge, and generates responses in more human-like language. However, for generative AI to understand your data, some amount of data preparation is required, which can involve a steep learning curve.
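A minimal, hedged sketch of that retrieve-then-generate loop, using a simple TF-IDF retriever as a stand-in; a production RAG system would use an embedding model and a real LLM call, and the documents and question below are invented.

```python
# Sketch of RAG's retrieve-then-generate loop with a toy TF-IDF retriever.
# The documents and question are invented; the LLM call is left as a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]
question = "How long do I have to return an item?"

vectorizer = TfidfVectorizer().fit(documents + [question])
doc_vecs = vectorizer.transform(documents)
q_vec = vectorizer.transform([question])

# Retrieve the most relevant chunk from "your data".
best = cosine_similarity(q_vec, doc_vecs).argmax()
prompt = f"Answer using this context:\n{documents[best]}\n\nQuestion: {question}"

# A real system would now send `prompt` to an LLM; here we just show it.
print(prompt)
```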
The assistant is connected to internal and external systems, with the capability to query various sources such as SQL databases, Amazon CloudWatch logs, and third-party tools to check the live system health status. Creating ETL pipelines to transform log data: Preparing your data to provide quality results is the first step in an AI project.
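A hedged sketch of one such ETL step: parsing raw log lines into structured records before they feed downstream analysis; the log format and fields below are assumptions, not the article's actual schema.

```python
# Sketch: transform raw log lines into structured records (a minimal ETL step).
# The log format below is an assumption, not the article's actual schema.
import re

LOG_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>\w+) (?P<service>\S+) (?P<message>.+)$")

raw_lines = [
    "2024-05-01T12:00:00Z ERROR checkout-api payment gateway timeout",
    "2024-05-01T12:00:05Z INFO checkout-api request completed",
]

records = [m.groupdict() for line in raw_lines if (m := LOG_PATTERN.match(line))]
errors = [r for r in records if r["level"] == "ERROR"]
print(f"{len(records)} records parsed, {len(errors)} errors")
```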