Blog, Data Warehouse and SQL - Data Science Current

Databricks SQL Year in Review (Part II): SQL Programming Features

databricks

JANUARY 31, 2024

Welcome to the blog series covering product advancements in 2023 for Databricks SQL, the serverless data warehouse from Databricks. This is part 2.

SQL

SQL Data Warehouse

Introducing the New SQL Editor

databricks

OCTOBER 14, 2024

Over the last few years, we've seen tremendous growth and adoption of Databricks SQL , our intelligent data warehouse purpose-built on the Data.

SQL

SQL Data Warehouse

Why using Infrastructure as Code for developing Cloud-based Data Warehouse Systems?

Data Science Blog

SEPTEMBER 19, 2023

In the contemporary age of Big Data, Data Warehouse Systems and Data Science Analytics Infrastructures have become an essential component for organizations to store, analyze, and make data-driven decisions. So why using IaC for Cloud Data Infrastructures?

Data Warehouse

Data Warehouse Azure SQL Database

Webinars

Maximizing Profit and Productivity: The New Era of AI-Powered Accounting

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

What’s new with Databricks SQL?

databricks

AUGUST 10, 2023

At this year's Data+AI Summit, Databricks SQL continued to push the boundaries of what a data warehouse can be, leveraging AI across the.

SQL

SQL Data Warehouse AI AI

Data lakes vs. data warehouses: Decoding the data storage debate

Data Science Dojo

JANUARY 12, 2023

When it comes to data, there are two main types: data lakes and data warehouses. What is a data lake? An enormous amount of raw data is stored in its original format in a data lake until it is required for analytics applications. Which one is right for your business? Let’s take a closer look.

Data Lakes

Data Lakes Data Warehouse Hadoop Machine Learning

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Built into Data Wrangler, is the Chat for data prep option, which allows you to use natural language to explore, visualize, and transform your data in a conversational interface. Amazon QuickSight powers data-driven organizations with unified (BI) at hyperscale. A provisioned or serverless Amazon Redshift data warehouse.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

How To Migrate Your Oracle PL/SQL Code to Databricks Lakehouse Platform

databricks

FEBRUARY 12, 2023

Oracle is a well-known technology for hosting Enterprise Data Warehouse solutions. However, many customers like Optum and the U.S. Citizenship and Immigration Services.

Data Warehouse

Data Warehouse SQL

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

ETL

ETL Data Warehouse Analytics Analytics

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

Enter AnalyticsCreator AnalyticsCreator, a powerful tool for data management, brings a new level of efficiency and reliability to the CI/CD process. It offers full BI-Stack Automation, from source to data warehouse through to frontend. It supports a holistic data model, allowing for rapid prototyping of various models.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Understanding Caching in Databricks SQL: UI, Result, and Disk Caches

databricks

MAY 3, 2023

Caching is an essential technique for improving the performance of data warehouse systems by avoiding the need to recompute or fetch the same.

Data Warehouse

Data Warehouse SQL

Databricks SQL Year in Review (Part III): User Experience

databricks

MARCH 6, 2024

This blog continues our series looking at advancements from 2023 to the serverless data warehouse Databricks SQL. The best data warehouse is.

SQL

SQL Data Warehouse

Dedicated SQL pools in Azure Synapse analytics: How to optimize performance and cut costs

Data Science Dojo

FEBRUARY 1, 2023

Introduction Dedicated SQL pools offer fast and reliable data import and analysis, allowing businesses to access accurate insights while optimizing performance and reducing costs. DWUs (Data Warehouse Units) can customize resources and optimize performance and costs.

Azure

Azure SQL Analytics Analytics

A guide to Databricks SQL and Data Warehousing talks at Data + AI Summit 2023

databricks

JUNE 14, 2023

It's been only 18 months since we announced Databricks SQL general availability - the serverless data warehouse on the Lakehouse - and we.

SQL

SQL Data Warehouse AI AI

Mastering Data Normalization: A Comprehensive Guide

Data Science Dojo

MARCH 27, 2025

Thats where data normalization comes in. Its a structured process that organizes data to reduce redundancy and improve efficiency. Whether you’re working with relational databases, data warehouses , or machine learning pipelines, normalization helps maintain clean, accurate, and optimized datasets. Simple, right?

Database

Database Data Warehouse Machine Learning Machine Learning

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them. They then use SQL to explore, analyze, visualize, and integrate data from various sources before using it in their ML training and inference.

SQL

SQL AWS Database Data Scientist

CI/CD für Datenpipelines – Ein Game-Changer mit AnalyticsCreator

Data Science Blog

JULY 20, 2024

Es bietet vollständige Automatisierung des BI-Stacks und unterstützt ein breites Spektrum an Data Warehouses, analytischen Datenbanken und Frontends. Automatisierung: Erstellt SQL-Code, DACPAC-Dateien, SSIS-Pakete, Data Factory-ARM-Vorlagen und XMLA-Dateien. Data Lakes: Unterstützt MS Azure Blob Storage.

Azure

Azure SQL Power BI Data Lakes

Was ist ein Data Lakehouse?

Data Science Blog

MAY 15, 2023

tl;dr Ein Data Lakehouse ist eine moderne Datenarchitektur, die die Vorteile eines Data Lake und eines Data Warehouse kombiniert. Organisationen können je nach ihren spezifischen Bedürfnissen und Anforderungen zwischen einem Data Warehouse und einem Data Lakehouse wählen.

Data Warehouse

Data Warehouse Data Lakes Azure AWS

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

In this blog post, we will be discussing 7 tips that will help you become a successful data engineer and take your career to the next level. Learn SQL: As a data engineer, you will be working with large amounts of data, and SQL is the most commonly used language for interacting with databases.

Data Engineering

Data Engineering Data Engineering Data Engineering Data Engineer

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. The following screenshot shows an example of the unified notebook page.

SQL

SQL AWS Data Lakes AI

Why companies need to accelerate data warehousing solution modernization

IBM Journey to AI blog

APRIL 24, 2023

Data is reported from one central repository, enabling management to draw more meaningful business insights and make faster, better decisions. By running reports on historical data, a data warehouse can clarify what systems and processes are working and what methods need improvement.

Data Warehouse

Data Warehouse Data Lakes Database Big Data

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

Pickl AI

NOVEMBER 15, 2023

Discover the nuanced dissimilarities between Data Lakes and Data Warehouses. Data management in the digital age has become a crucial aspect of businesses, and two prominent concepts in this realm are Data Lakes and Data Warehouses. It acts as a repository for storing all the data.

Data Lakes

Data Lakes Data Warehouse Database ETL

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights. A standout application is the SQL-to-natural language capability, which translates complex SQL queries into plain English and vice versa, bridging the gap between technical and business teams.

AWS

AWS Data Governance Data Silos SQL

How to Use Custom SQL and CSVs in Sigma Computing

phData

JULY 10, 2024

Sigma Computing , a cloud-based analytics platform, helps data analysts and business professionals maximize their data with collaborative and scalable analytics. One of Sigma’s key features is its support for custom SQL queries and CSV file uploads. These tools allow users to handle more advanced data tasks and analyses.

SQL

SQL Data Warehouse Analytics Analytics

Understanding ETL Tools as a Data-Centric Organization

Smart Data Collective

SEPTEMBER 8, 2021

The ETL process is defined as the movement of data from its source to destination storage (typically a Data Warehouse) for future use in reports and analyzes. The data is initially extracted from a vast array of sources before transforming and converting it to a specific format based on business requirements. Conclusion.

ETL

ETL Hadoop Data Warehouse Data Pipeline

Is Google BigQuery The Future Of Big Data Analytics?

Smart Data Collective

JUNE 6, 2021

Google BigQuery is a service (within the Google Cloud platform (GCP)) implemented to collect and analyze big data (also known as a data warehouse). It was released in 2011 and praised for its serverless architecture that enables highly scalable and fast-provided structured query language (SQL) analytics. What is Big Data?”

Big Data Analytics

Big Data Analytics Big Data Analytics Big Data Big Data

Connecting Amazon Redshift and RStudio on Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 29, 2022

Many of the RStudio on SageMaker users are also users of Amazon Redshift , a fully managed, petabyte-scale, massively parallel data warehouse for data storage and analytical workloads. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools.

AWS

AWS Machine Learning Machine Learning Natural Language Processing

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

AWS Machine Learning Blog

AUGUST 20, 2024

Natural language is ambiguous and imprecise, whereas data adheres to rigid schemas. For example, SQL queries can be complex and unintuitive for non-technical users. Handling complex queries involving multiple tables, joins, and aggregations makes it difficult to interpret user intent and translate it into correct SQL operations.

SQL

SQL AWS Database Natural Language Processing

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

Snowflake excels in efficient data storage and governance, while Dataiku provides the tooling to operationalize advanced analytics and machine learning models. Together they create a powerful, flexible, and scalable foundation for modern data applications.

Machine Learning

Machine Learning Machine Learning Data Science ML

AWS re:Invent 2024 Highlights: Top takeaways from Swami Sivasubramanian to help customers manage generative AI at scale

AWS Machine Learning Blog

DECEMBER 16, 2024

Now they can access databases and data warehouses, as well as unstructured business data, like emails, reports, charts, graphs, and images. Access all your data whether its stored in data lakes, data warehouses, third-party or federated data sources.

AWS

AWS AI AI Data Warehouse

Process Mining – Ist Celonis wirklich so gut? Ein Praxisbericht.

Data Science Blog

SEPTEMBER 3, 2024

Während vor zehn Jahren ich für Celonis noch eine Installation erst einer MS SQL Server Datenbank, etwas später dann bevorzugt eine SAP Hana Datenbank auf einem on-prem Server beim Kunden voraussetzend installieren musste, bevor ich dann zur Installation der Celonis ServerAnwendung selbst kam, ist es heute eine 100% externe Cloud-Lösung.

Data Science

Data Science Power BI Azure Data Warehouse

Serverless High Volume ETL data processing on Code Engine

IBM Data Science in Practice

JANUARY 13, 2025

The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code-Engine to improve, refine, and scale the data pipelines. Background One of the Analytics teams tasks is to load data from multiple sources and unify it into a data warehouse. Thus, it has only a minimal footprint.

ETL

ETL Data Pipeline Database Data Warehouse

Celebrating 40 years of Db2: Running the world’s mission critical workloads

IBM Journey to AI blog

SEPTEMBER 11, 2023

Codd published his famous paper “ A Relational Model of Data for Large Shared Data Banks.” Boyce to create Structured Query Language (SQL). Developers can leverage features like REST APIs, JSON support and enhanced SQL compatibility to easily build cloud-native applications. Chamberlin and Raymond F.

Database

Database SQL Data Warehouse Machine Learning

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. You can use query_string to filter your dataset by SQL and unload it to Amazon S3. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.

ML

ML ML AWS Data Warehouse

IBM to help businesses scale AI workloads, for all data, anywhere

IBM Journey to AI blog

MAY 9, 2023

Watsonx.data will allow users to access their data through a single point of entry and run multiple fit-for-purpose query engines across IT environments. Through workload optimization an organization can reduce data warehouse costs by up to 50 percent by augmenting with this solution. [1]

Data Warehouse

Data Warehouse AWS AI AI

What Is Fivetran and How Much Does It Cost?

phData

MARCH 8, 2023

Fivetran, a cloud-based automated data integration platform, has emerged as a leading choice among businesses looking for an easy and cost-effective way to unify their data from various sources. Fivetran is used by businesses to centralize data from various sources into a single, comprehensive data warehouse.

Data Warehouse

Data Warehouse Data Engineering Data Engineering Data Engineering

Top 5 Fivetran Connectors for Healthcare

phData

APRIL 29, 2024

In our previous blog, Top 5 Fivetran Connectors for Financial Services , we explored Fivetran’s capabilities that address the data integration needs of the finance industry. Now, let’s cover the healthcare industry, which also has a surging demand for data and analytics, along with the underlying processes to make it happen.

SQL

SQL Data Warehouse Azure Cloud Data

Learn the Differences Between ETL and ELT

Pickl AI

OCTOBER 6, 2024

Summary: This blog explores the key differences between ETL and ELT, detailing their processes, advantages, and disadvantages. Understanding these methods helps organizations optimize their data workflows for better decision-making. It ensures the data is accurate and reliable, leading to better decision-making.

ETL

ETL Data Warehouse Data Quality Data Lakes

Reinventing the data experience: Use generative AI and modern data architecture to unlock insights

AWS Machine Learning Blog

JUNE 13, 2023

The natural language capabilities allow non-technical users to query data through conversational English rather than complex SQL. The AI and language models must identify the appropriate data sources, generate effective SQL queries, and produce coherent responses with embedded results at scale.

Database

Database SQL AWS AI

Why Open Table Format Architecture is Essential for Modern Data Systems

phData

NOVEMBER 8, 2024

Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. In this blog, we will discuss: What is the Open Table format (OTF)? Delta Lake became popular for making data lakes more reliable and easy to manage.

Data Lakes

Data Lakes Data Warehouse Database Azure

Where Does Fivetran Fit into The Modern Data Stack?

phData

JULY 17, 2023

Over the past few decades, the corporate data landscape has changed significantly. The shift from on-premise databases and spreadsheets to the modern era of cloud data warehouses and AI/ LLMs has transformed what businesses can do with data. Designed to cheaply and efficiently process large quantities of data.

Data Warehouse

Data Warehouse Data Pipeline Cloud Data ETL

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Hacker News

AUGUST 27, 2024

Policy Zones has been built into different Meta systems, including: Function-based systems that load, process, and propagate data through stacks of function calls in different programming languages. Batch-processing systems that process data rows in batch (mainly via SQL ). When data flows across different systems (e.g.,

Data Warehouse

Data Warehouse Database SQL Machine Learning

Unleashing the power of Presto: The Uber case study

IBM Journey to AI blog

SEPTEMBER 25, 2023

Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success. What is Presto?

Data Lakes

Data Lakes Analytics Analytics Clustering

Best Practices When Developing Matillion Jobs

phData

SEPTEMBER 2, 2024

In this blog, we will cover the best practices for developing jobs in Matillion, an ETL/ELT tool built specifically for cloud database platforms. The blog will be divided into three broad sections: Design, SDLC, and Security, each with its best practices. What Are Matillion Jobs and Why Do They Matter? Below are the best practices.

ETL

ETL Data Warehouse SQL Database

Databricks SQL Year in Review (Part II): SQL Programming Features

Introducing the New SQL Editor

Webinars

Trending Sources

Why using Infrastructure as Code for developing Cloud-based Data Warehouse Systems?

Webinars

What’s new with Databricks SQL?

Data lakes vs. data warehouses: Decoding the data storage debate

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

How To Migrate Your Oracle PL/SQL Code to Databricks Lakehouse Platform

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Understanding Caching in Databricks SQL: UI, Result, and Disk Caches

Top 20 Data Warehouse Interview Questions You Must Know in 2025

Databricks SQL Year in Review (Part III): User Experience

Dedicated SQL pools in Azure Synapse analytics: How to optimize performance and cut costs

A guide to Databricks SQL and Data Warehousing talks at Data + AI Summit 2023

Mastering Data Normalization: A Comprehensive Guide

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

CI/CD für Datenpipelines – Ein Game-Changer mit AnalyticsCreator

Was ist ein Data Lakehouse?

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Why companies need to accelerate data warehousing solution modernization

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

Shaping the future: OMRON’s data-driven journey with AWS

How to Use Custom SQL and CSVs in Sigma Computing

Understanding ETL Tools as a Data-Centric Organization

Is Google BigQuery The Future Of Big Data Analytics?

Connecting Amazon Redshift and RStudio on Amazon SageMaker

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

How Dataiku and Snowflake Strengthen the Modern Data Stack

AWS re:Invent 2024 Highlights: Top takeaways from Swami Sivasubramanian to help customers manage generative AI at scale

Process Mining – Ist Celonis wirklich so gut? Ein Praxisbericht.

Serverless High Volume ETL data processing on Code Engine

Celebrating 40 years of Db2: Running the world’s mission critical workloads

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

IBM to help businesses scale AI workloads, for all data, anywhere

What Is Fivetran and How Much Does It Cost?

Top 5 Fivetran Connectors for Healthcare

Learn the Differences Between ETL and ELT

Reinventing the data experience: Use generative AI and modern data architecture to unlock insights

Why Open Table Format Architecture is Essential for Modern Data Systems

Where Does Fivetran Fit into The Modern Data Stack?

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Unleashing the power of Presto: The Uber case study

Best Practices When Developing Matillion Jobs

Stay Connected