Data Warehouse, ML and SQL - Data Science Current

10 essential SQL concepts for data scientists: Tips and examples

Data Science Dojo

APRIL 25, 2023

SQL (Structured Query Language) is an important tool for data scientists. It is a programming language used to manipulate data stored in relational databases. Mastering SQL concepts allows a data scientist to quickly analyze large amounts of data and make decisions based on their findings.

Data Scientist

Data Scientist SQL Machine Learning Machine Learning

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. A provisioned or serverless Amazon Redshift data warehouse.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Mastering Data Normalization: A Comprehensive Guide

Data Science Dojo

MARCH 27, 2025

Thats where data normalization comes in. Its a structured process that organizes data to reduce redundancy and improve efficiency. Whether you’re working with relational databases, data warehouses , or machine learning pipelines, normalization helps maintain clean, accurate, and optimized datasets. Simple, right?

Database

Database Data Warehouse Machine Learning Machine Learning

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Flipboard

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.

AWS

AWS Data Warehouse ETL SQL

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.

SQL

SQL AWS Database Data Scientist

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.

ML

ML ML AWS Data Warehouse

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services.

SQL

SQL AWS Data Lakes AI

dbt: Codify and Automate Transformation of Data in Your Data Warehouse

Mlearning.ai

OCTOBER 22, 2023

dbt helps transform data in your warehouse all through SQL following software engineering best practices. Continue reading on MLearning.ai »

Data Warehouse

Data Warehouse SQL Data Engineering Data Engineering

How Will The Cloud Impact Data Warehousing Technologies?

Smart Data Collective

APRIL 8, 2020

Dating back to the 1970s, the data warehousing market emerged when computer scientist Bill Inmon first coined the term ‘data warehouse’. Created as on-premise servers, the early data warehouses were built to perform on just a gigabyte scale. The post How Will The Cloud Impact Data Warehousing Technologies?

Data Warehouse

Data Warehouse Big Data Big Data Big Data Analytics

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

OMRONs data strategyrepresented on ODAPalso allowed the organization to unlock generative AI use cases focused on tangible business outcomes and enhanced productivity. This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights.

AWS

AWS Data Governance Data Silos SQL

Why companies need to accelerate data warehousing solution modernization

IBM Journey to AI blog

APRIL 24, 2023

Data is reported from one central repository, enabling management to draw more meaningful business insights and make faster, better decisions. By running reports on historical data, a data warehouse can clarify what systems and processes are working and what methods need improvement.

Data Warehouse

Data Warehouse Data Lakes Database Big Data

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

With our cleaned data from step one, we can now join our vehicle sensor measurements with warranty claim data to explore any correlations using data science. This capability can reveal hidden patterns and optimize data for improved model performance. Dataiku and Snowflake: A Good Combo?

Machine Learning

Machine Learning Machine Learning Data Science ML

How Q4 Inc. used Amazon Bedrock, RAG, and SQLDatabaseChain to address numerical and structured dataset challenges building their Q&A chatbot

Flipboard

DECEMBER 6, 2023

In this post, we discuss a Q&A bot use case that Q4 has implemented, the challenges that numerical and structured datasets presented, and how Q4 concluded that using SQL may be a viable solution. RAG with semantic search – Conventional RAG with semantic search was the last step before moving to SQL generation.

SQL

SQL Database AWS Machine Learning

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Secure a generative AI assistant with OWASP Top 10 mitigation

Flipboard

JANUARY 24, 2025

RAG data store The Retrieval Augmented Generation (RAG) data store delivers up-to-date, precise, and access-controlled knowledge from various data sources such as data warehouses, databases, and other software as a service (SaaS) applications through data connectors.

AWS

AWS AI AI Data Warehouse

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

AWS Machine Learning Blog

AUGUST 20, 2024

Natural language is ambiguous and imprecise, whereas data adheres to rigid schemas. For example, SQL queries can be complex and unintuitive for non-technical users. Handling complex queries involving multiple tables, joins, and aggregations makes it difficult to interpret user intent and translate it into correct SQL operations.

SQL

SQL AWS Database Natural Language Processing

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

The rules in this engine were predefined and written in SQL, which aside from posing a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data source. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.

AWS

AWS Data Warehouse ML ML

Transitioning off Amazon Lookout for Metrics

AWS Machine Learning Blog

OCTOBER 9, 2024

Amazon Lookout for Metrics is a fully managed service that uses machine learning (ML) to detect anomalies in virtually any time-series business or operational metrics—such as revenue performance, purchase transactions, and customer acquisition and retention rates—with no ML experience required. To learn more, see the documentation.

AWS

AWS ML ML Data Quality

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS

AWS Machine Learning Machine Learning ML

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

Its goal is to help with a quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Apache Superset GitHub | Website Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.

Exploratory Data Analysis

Exploratory Data Analysis Data Visualization Data Analysis Data Analysis

Reinventing the data experience: Use generative AI and modern data architecture to unlock insights

AWS Machine Learning Blog

JUNE 13, 2023

The natural language capabilities allow non-technical users to query data through conversational English rather than complex SQL. The AI and language models must identify the appropriate data sources, generate effective SQL queries, and produce coherent responses with embedded results at scale.

Database

Database SQL AWS AI

Connecting Amazon Redshift and RStudio on Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 29, 2022

You can quickly launch the familiar RStudio IDE and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. Loading data in Amazon Redshift Serverless. Create a SQL editor tab and be sure the sagemaker database is selected.

AWS

AWS Machine Learning Machine Learning Natural Language Processing

How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

JANUARY 26, 2024

Luckily, we have tried and trusted tools and architectural patterns that provide a blueprint for reliable ML systems. In this article, I’ll introduce you to a unified architecture for ML systems built around the idea of FTI pipelines and a feature store as the central component. But what is an ML pipeline?

Machine Learning

Machine Learning Machine Learning ML ML

Celebrating 40 years of Db2: Running the world’s mission critical workloads

IBM Journey to AI blog

SEPTEMBER 11, 2023

Codd published his famous paper “ A Relational Model of Data for Large Shared Data Banks.” Boyce to create Structured Query Language (SQL). Developers can leverage features like REST APIs, JSON support and enhanced SQL compatibility to easily build cloud-native applications. Chamberlin and Raymond F.

Database

Database SQL Data Warehouse Machine Learning

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Mlearning.ai

FEBRUARY 16, 2023

The ultimate need for vast storage spaces manifests in data warehouses: specialized systems that aggregate data coming from numerous sources for centralized management and consistency. In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently.

Data Warehouse

Data Warehouse Business Intelligence Business Intelligence Database

Build generative AI chatbots using prompt engineering with Amazon Redshift and Amazon Bedrock

AWS Machine Learning Blog

FEBRUARY 14, 2024

Amazon Redshift has announced a feature called Amazon Redshift ML that makes it straightforward for data analysts and database developers to create, train, and apply machine learning (ML) models using familiar SQL commands in Redshift data warehouses.

AWS

AWS AI AI Database

Google launches Differential Privacy for BigQuery

Mlearning.ai

JUNE 19, 2023

How you now anonymize Data more easily Photo by Dušan veverkolog on Unsplash Google has just announced the public preview of BigQuery differential privacy with SQL building blocks. You can use these functions to anonymize their data. Hence, with this feature you can also ensure that data is shared there securely.

Data Lakes

Data Lakes Data Warehouse SQL ML

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

AWS Machine Learning Blog

JULY 17, 2023

With cloud computing, as compute power and data became more available, machine learning (ML) is now making an impact across every industry and is a core part of every business and industry. Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface.

Clustering

Clustering AWS ML ML

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

AWS Machine Learning Blog

JUNE 5, 2023

Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. Athena allows applications to use standard SQL to query massive amounts of data on an S3 data lake.

Machine Learning

Machine Learning Machine Learning AWS Data Lakes

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Hacker News

AUGUST 27, 2024

Policy Zones has been built into different Meta systems, including: Function-based systems that load, process, and propagate data through stacks of function calls in different programming languages. Batch-processing systems that process data rows in batch (mainly via SQL ). When data flows across different systems (e.g.,

Data Warehouse

Data Warehouse Database SQL Machine Learning

Extract non-PHI data from Amazon HealthLake, reduce complexity, and increase cost efficiency with Amazon Athena and Amazon SageMaker Canvas

AWS Machine Learning Blog

FEBRUARY 28, 2023

In today’s highly competitive market, performing data analytics using machine learning (ML) models has become a necessity for organizations. It enables them to unlock the value of their data, identify trends, patterns, and predictions, and differentiate themselves from their competitors.

ML

ML ML AWS Machine Learning

Microsoft secures your place in the world of business

Dataconomy

SEPTEMBER 27, 2023

The Microsoft Certified Solutions Associate and Microsoft Certified Solutions Expert certifications cover a wide range of topics related to Microsoft’s technology suite, including Windows operating systems, Azure cloud computing, Office productivity software, Visual Studio programming tools, and SQL Server databases.

Database Administration

Database Administration Database Azure Cloud Computing

How to Prepare Data for Use in Machine Learning Models

phData

JUNE 18, 2024

Machine learning (ML) is only possible because of all the data we collect. However, with data coming from so many different sources, it doesn’t always come in a format that’s easy for ML models to understand. Before you can take advantage of everything ML offers, much prep work is involved.

Machine Learning

Machine Learning Machine Learning ML ML

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

Dolt LakeFS Delta Lake Pachyderm Git-like versioning Database tool Data lake Data pipelines Experiment tracking Integration with cloud platforms Integrations with ML tools Examples of data version control tools in ML DVC Data Version Control DVC is a version control system for data and machine learning teams.

ML

ML ML Data Lakes Machine Learning

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Pickl AI

OCTOBER 17, 2024

Introduction ETL plays a crucial role in Data Management. This process enables organisations to gather data from various sources, transform it into a usable format, and load it into data warehouses or databases for analysis. Loading The transformed data is loaded into the target destination, such as a data warehouse.

ETL

ETL Data Warehouse Data Quality Data Governance

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. IBM watsonx.ai

AI

AI AI Machine Learning Machine Learning

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

AWS Machine Learning Blog

JUNE 25, 2024

In addition, the generative business intelligence (BI) capabilities of QuickSight allow you to ask questions about customer feedback using natural language, without the need to write SQL queries or learn a BI tool. For more information, see Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

How Snowpark & Hex Supports Data Science

phData

MARCH 31, 2023

Snowpark brings the full power of SnowSQL running in the Snowflake Data Cloud as well as the flexibility to bring your favorite transformation and machine learning packages to the data. Since the most popular language for data science and ML is Python, the full Snowpark API is available in Python. Can’t wait?

Data Science

Data Science Machine Learning Machine Learning Python

The Ultimate Modern Data Stack Migration Guide

phData

JULY 18, 2023

Why Migrate to a Modern Data Stack? With the birth of cloud data warehouses, data applications, and generative AI , processing large volumes of data faster and cheaper is more approachable and desired than ever. Data teams can focus on delivering higher-value data tasks with better organizational visibility.

Data Warehouse

Data Warehouse Analytics Analytics Cloud Data

How to Use Exploratory Notebooks [Best Practices]

The MLOps Blog

OCTOBER 20, 2023

Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering. in a pandas DataFrame) but in the company’s data warehouse (e.g.,

SQL

SQL Database Data Scientist Python

Snorkel AI partners with Snowflake to bring data-centric AI to the Snowflake Data Cloud

Snorkel AI

JANUARY 24, 2023

Unfortunately accessing data across various locations and file types and then operationalizing that data for AI usage has traditionally been a painfully manual, time-consuming, and costly process. Ahmad Khan, Head of AI/ML Strategy at Snowflake, discusses the challenges of operationalizing ML in a recent talk.

AI

AI AI ML ML

Snorkel AI partners with Snowflake to bring data-centric AI to the Snowflake Data Cloud

Snorkel AI

JANUARY 24, 2023

Unfortunately accessing data across various locations and file types and then operationalizing that data for AI usage has traditionally been a painfully manual, time-consuming, and costly process. Ahmad Khan, Head of AI/ML Strategy at Snowflake, discusses the challenges of operationalizing ML in a recent talk.

AI

AI AI ML ML

10 essential SQL concepts for data scientists: Tips and examples

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

Webinars

Trending Sources

Mastering Data Normalization: A Comprehensive Guide

Webinars

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

dbt: Codify and Automate Transformation of Data in Your Data Warehouse

How Will The Cloud Impact Data Warehousing Technologies?

Shaping the future: OMRON’s data-driven journey with AWS

Why companies need to accelerate data warehousing solution modernization

How Dataiku and Snowflake Strengthen the Modern Data Stack

How Q4 Inc. used Amazon Bedrock, RAG, and SQLDatabaseChain to address numerical and structured dataset challenges building their Q&A chatbot

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snowflake Snowpark: cloud SQL and Python ML pipelines

Secure a generative AI assistant with OWASP Top 10 mitigation

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

Transitioning off Amazon Lookout for Metrics

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

11 Open Source Data Exploration Tools You Need to Know in 2023

Reinventing the data experience: Use generative AI and modern data architecture to unlock insights

Connecting Amazon Redshift and RStudio on Amazon SageMaker

How to Build Machine Learning Systems With a Feature Store

Celebrating 40 years of Db2: Running the world’s mission critical workloads

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Build generative AI chatbots using prompt engineering with Amazon Redshift and Amazon Bedrock

Google launches Differential Privacy for BigQuery

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Extract non-PHI data from Amazon HealthLake, reduce complexity, and increase cost efficiency with Amazon Athena and Amazon SageMaker Canvas

Microsoft secures your place in the world of business

How to Prepare Data for Use in Machine Learning Models

How to Version Control Data in ML for Various Data Sources

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Exploring the AI and data capabilities of watsonx

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

How Snowpark & Hex Supports Data Science

Top 20 most-asked questions about Amazon RDS for Db2 answered

The Ultimate Modern Data Stack Migration Guide

How to Use Exploratory Notebooks [Best Practices]

Snorkel AI partners with Snowflake to bring data-centric AI to the Snowflake Data Cloud

Snorkel AI partners with Snowflake to bring data-centric AI to the Snowflake Data Cloud

Stay Connected