In the contemporary age of Big Data, data warehouse systems and data science analytics infrastructures have become essential components for organizations that need to store and analyze data and make data-driven decisions. So why use IaC for cloud data infrastructures?
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.
This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights. A standout application is the SQL-to-natural language capability, which translates complex SQL queries into plain English and vice versa, bridging the gap between technical and business teams.
There’s not much value in holding on to raw data without putting it to good use, yet as the cost of storage continues to decrease, organizations find it useful to collect raw data for additional processing. The raw data can be fed into a database or data warehouse. The central concept is the idea of a document.
ETL covers the extraction of raw data, its transformation into a format suited to business needs, and its loading into a data warehouse. Data transformation: this step turns raw data into clean data that can be analysed and aggregated. Data analytics and visualisation.
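To make the extract-transform-load steps concrete, here is a minimal sketch in Python using pandas and SQLAlchemy; the source file, connection string, and target table are hypothetical placeholders, not details from the excerpted article.

```python
# Minimal ETL sketch: extract a raw CSV, transform it, and load it
# into a warehouse table. All names below are illustrative placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read raw data from a source file
raw = pd.read_csv("raw_orders.csv")

# Transform: normalise column names, drop incomplete rows, aggregate
raw.columns = [c.strip().lower() for c in raw.columns]
clean = raw.dropna(subset=["order_id", "amount"])
daily = clean.groupby("order_date", as_index=False)["amount"].sum()

# Load: write the clean, aggregated data into the warehouse
engine = create_engine("postgresql://user:password@warehouse-host/analytics")
daily.to_sql("daily_order_totals", engine, if_exists="replace", index=False)
```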
Other uses may include: maintenance checks; guides, resources, training, and tutorials (all available in the BigQuery documentation); employee efficiency reviews; machine learning; and innovation advancements through the examination of trends.
A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system, such as a cloud data warehouse like Snowflake. Intuitive workflow design: workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code.
As a standalone product, this software provides professionals with rich sets of spreadsheets, charts, and documents. The Quip integration tool allows teams to improve collaboration, export and import live data, and benefit from enhanced visibility and outstanding device support. This tool helps you sync and store data from multiple sources quickly.
The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: one of the Analytics team’s tasks is to load data from multiple sources and unify it into a data warehouse. Thus, it has only a minimal footprint.
For our hypothetical car company, we will use Dataiku’s Answers application to create a personalized customer service chatbot that can pull data from warranty contracts, car spec manuals, and customer history to respond to inquiries. Dataiku and Snowflake: A Good Combo?
Examples of data sources and destinations include Shopify, Google Analytics, Snowflake Data Cloud, Oracle, and Salesforce. Fivetran’s mission is to “make access to data as easy as electricity,” so for the last 10 years they have developed their platform into a leader in the cloud-based ELT market. What is Fivetran used for?
Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing, documentation, and profiling. With Great Expectations, data teams can express what they “expect” from their data using simple assertions.
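As a sketch of what those assertions can look like, here is an example using the classic (pre-1.0) Great Expectations pandas API; the DataFrame and column names are made up, and newer GX releases expose a different, context-based API.

```python
# Quality assertions with the classic Great Expectations pandas API
# (ge.from_pandas was removed in GX 1.x; this assumes an older release).
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2, 3],
                   "order_amount": [20.0, 35.5, 12.25]})

gdf = ge.from_pandas(df)

# Express what we "expect" from the data as simple assertions
gdf.expect_column_values_to_not_be_null("customer_id")
gdf.expect_column_values_to_be_between("order_amount",
                                       min_value=0, max_value=10_000)

# Validate every registered expectation and report overall success
print(gdf.validate().success)
```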
“Vector databases are completely different from your cloud data warehouse.” You might have heard that statement if you are involved in creating vector embeddings for your RAG-based Gen AI applications. When documents are split into smaller chunks, search systems can find relevant sections more precisely and quickly.
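A simple way to produce such chunks is fixed-size splitting with overlap; the sketch below is a generic example (the chunk sizes are arbitrary), not code from the excerpted article.

```python
# Fixed-size character chunking with overlap, a common preprocessing
# step before computing vector embeddings for retrieval.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

document = "lorem ipsum " * 500  # stand-in for a long document
for i, chunk in enumerate(chunk_text(document)):
    # Each chunk would be embedded and stored in the vector index
    print(i, len(chunk))
```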
To start using OpenSearch for anomaly detection, you must first index your data into OpenSearch; from there, you can enable anomaly detection in OpenSearch Dashboards. To learn more, see the documentation.
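For the initial indexing step, a sketch with the opensearch-py client might look like the following; the host, credentials, index name, and document fields are placeholders.

```python
# Indexing a metrics document into OpenSearch with opensearch-py.
# Anomaly detection would then be configured on this index from
# OpenSearch Dashboards.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),   # placeholder credentials
    use_ssl=True,
    verify_certs=False,             # for local experimentation only
)

doc = {"timestamp": "2024-01-01T00:00:00Z", "requests_per_minute": 1200}
client.index(index="service-metrics", body=doc)
```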
By 2025, global data volumes are expected to reach 181 zettabytes, according to IDC. To harness this data effectively, businesses rely on ETL (Extract, Transform, Load) tools to extract, transform, and load data into centralized systems like data warehouses.
Introduction: ETL plays a crucial role in data management. This process enables organisations to gather data from various sources, transform it into a usable format, and load it into data warehouses or databases for analysis. Loading: the transformed data is loaded into the target destination, such as a data warehouse.
The ultimate need for vast storage spaces manifests in data warehouses: specialized systems that aggregate data coming from numerous sources for centralized management and consistency. In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently.
It comes with rather lightweight IntelliSense and highlighting for both SQL and Jinja. The real power is the ability to run your models and view the outputs, or even have your SQL compiled to verify that your Jinja or SQL compiles into the correct model. Our team of data experts is happy to assist. Reach out today!
Amazon Redshift has announced a feature called Amazon Redshift ML that makes it straightforward for data analysts and database developers to create, train, and apply machine learning (ML) models using familiar SQL commands in Redshift data warehouses.
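As a rough sketch of what that looks like, the snippet below submits a Redshift ML CREATE MODEL statement through the redshift_connector Python driver; the cluster endpoint, credentials, IAM role, and table/column names are all hypothetical.

```python
# Training a model in Redshift ML with plain SQL, submitted via the
# redshift_connector driver. Every identifier below is a placeholder.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="<password>",
)

cur = conn.cursor()
cur.execute("""
    CREATE MODEL customer_churn_model
    FROM (SELECT age, tenure, monthly_charges, churned
          FROM customer_activity)
    TARGET churned
    FUNCTION predict_customer_churn
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
    SETTINGS (S3_BUCKET 'my-ml-artifacts-bucket')
""")
conn.commit()
```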
By incorporating metadata into the data model, users can easily discover, understand, and interpret the data stored in the lake. With the amounts of data involved, this can be crucial to utilizing a data lake effectively. Avro and Parquet file formats: Avro and Parquet are file formats commonly used in data lakes.
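To illustrate the Parquet side, here is a small pyarrow sketch (the columns are made up); Parquet’s columnar layout lets analytical engines read only the columns a query needs, while Avro is a row-oriented format better suited to record-at-a-time pipelines.

```python
# Writing and reading a Parquet file with pyarrow; columnar storage
# means a reader can fetch just the columns it needs.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "event_id": [1, 2, 3],
    "event_type": ["click", "view", "click"],
})

pq.write_table(table, "events.parquet")

# Read back only one column instead of the whole file
print(pq.read_table("events.parquet", columns=["event_type"]))
```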
The modern data stack is a combination of various software tools used to collect, process, and store data on a well-integrated cloud-based data platform. It is known for its robustness, speed, and scalability in handling data. A typical modern data stack consists of the following: a data warehouse.
In addition, well-known products boast many implementations and use cases that are comprehensively reflected in their documentation. Nowadays, DAM systems scarcely cover the segment of SQL databases that is widely represented in microservices architectures. Stopping insiders in their tracks.
Oracle – The Oracle connector, a database-type connector, enables real-time transfer of large volumes of data from on-premises or cloud sources to the destination of choice, such as a cloud data lake or data warehouse. File – Fivetran offers several options to sync files to your destination.
Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced. It promotes a disciplined approach to data modeling, making it easier to ensure data quality and consistency across the ML pipelines.
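As a sketch of how reuse works in Feast, a feature view is defined once and can then be referenced by any pipeline in the project; the entity, source file, and feature names below are hypothetical and assume a recent (roughly 0.20+) Feast API.

```python
# Defining a reusable Feast feature view (all names are illustrative).
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver = Entity(name="driver", join_keys=["driver_id"])

source = FileSource(
    path="driver_stats.parquet",        # placeholder offline source
    timestamp_field="event_timestamp",
)

# Any ML pipeline in the project can now reuse these features
driver_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_trips", dtype=Float32),
        Field(name="total_trips", dtype=Int64),
    ],
    source=source,
)
```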
To date, the company’s data warehousing solutions are largely built from the same template used in 1979. In short, they are still the model of multiple processors and massive disk storage with data warehouse software on the top layer managing it all.
Data Vault data lifecycle: architecturally, let’s understand the data lifecycle in the data vault in terms of the following layers, which play a key role in choosing the right pattern and tools for implementation. Data acquisition: extracting data from source systems and making it accessible.
References: links to internal or external documentation with background information or specific information used within the analysis presented in the notebook. Data to explore: outline the tables or datasets you’re exploring or analyzing, and reference their sources or link their data catalog entries.
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let’s dive into the foundation of every Modern Data Stack: a cloud-based data warehouse.
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
Amazon Redshift is a fully managed, fast, secure, and scalable cloud data warehouse. Organizations often want to use SageMaker Studio to get predictions from data stored in a data warehouse such as Amazon Redshift. This should return the records successfully for further data processing and analysis.
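A hedged sketch of that pattern, querying Redshift from a notebook through the Redshift Data API via boto3 (the cluster name, database, and query are placeholders):

```python
# Querying Redshift from a notebook with the Redshift Data API.
# All identifiers below are placeholders.
import time
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

resp = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT customer_id, total_spend FROM analytics.customers LIMIT 100",
)

# Wait for the statement to finish, then fetch rows for processing
while client.describe_statement(Id=resp["Id"])["Status"] not in (
    "FINISHED", "FAILED", "ABORTED"
):
    time.sleep(1)

rows = client.get_statement_result(Id=resp["Id"])["Records"]
print(len(rows), "records returned")
```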
A common problem solved by phData is the migration from an existing data platform to the Snowflake Data Cloud, in the best possible manner. The necessary access is granted so data flows without issue (e.g., SQL Server Agent jobs). Either way, it’s important to understand what data is transformed, and how.
They are also designed to handle concurrent access by multiple users and applications, while ensuring data integrity and transactional consistency. Examples of OLTP databases include Oracle Database, Microsoft SQL Server, and MySQL. An OLAP database may also be organized as a data warehouse.
ETL is a process for moving and managing data from various sources to a central data warehouse. This process ensures that data is accurate, consistent, and usable for analysis and reporting. It also helps organisations manage large volumes of data efficiently.
Document hierarchy structures: maintain thorough documentation of hierarchy designs, including definitions, relationships, and data sources. This documentation is invaluable for future reference and modifications. Simplify hierarchies where possible and provide clear documentation to help users understand the structure.
Data integration is essentially the Extract and Load portion of the Extract, Load, and Transform (ELT) process. Data ingestion involves connecting your data sources, including databases, flat files, streaming data, etc., to your data warehouse. Snowflake provides native ways for data ingestion.
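One such native path is staging a file and bulk-loading it with COPY INTO; the sketch below uses the Snowflake Python connector, and the account details, file, and table names are placeholders.

```python
# Staging a local CSV to a Snowflake table stage and loading it with
# COPY INTO. Account, credentials, and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="LOADER",
    password="<password>",
    warehouse="LOAD_WH",
    database="RAW",
    schema="PUBLIC",
)
cur = conn.cursor()

# Upload the file to the table's internal stage
cur.execute("PUT file:///tmp/orders.csv @%orders")

# Bulk-load the staged file into the table
cur.execute("""
    COPY INTO orders
    FROM @%orders
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
```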
It wouldn’t be until 2013 that the topic of data lineage would surface again, this time while working on a data warehouse project. Data warehouses obfuscate data’s origin: in 2013, I was a Business Intelligence Engineer at a financial services company.
For example, a new data scientist who is curious about which customers are most likely to be repeat buyers might search for customer data, only to discover an article documenting a previous project that answered their exact question. Query editors embedded directly into data catalogs have a few advantages for data scientists.
One of the easiest ways for Snowflake to achieve this is to have analytics solutions query their data warehouse in real time (also known as DirectQuery). The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode. This ensures the maximum amount of Snowflake consumption possible.
Some of the databases supported by Fivetran are Snowflake Data Cloud (BETA), MySQL, PostgreSQL, SAP ERP, SQL Server, and Oracle. In this blog, we will review how to pull data from on-premises systems using Fivetran to a specific target or destination. You can find more information about them in their official documentation.
Prime examples of this in the data catalog include: trust flags, which allow the data community to endorse, warn, and deprecate data to signal whether data can or can’t be used; and data profiling, where statistics such as min, max, mean, and null counts can be applied to certain columns to understand their shape.
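Statistics like those in the profiling example are straightforward to compute; the sketch below uses pandas on a hypothetical dataset.

```python
# Simple column profiling with pandas: min, max, mean, and null counts.
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Min/max/mean for the numeric columns
print(df.select_dtypes("number").agg(["min", "max", "mean"]))

# Null counts per column, to understand each column's shape
print(df.isnull().sum())
```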
Few actors in the modern data stack have inspired the enthusiasm and fervent support that dbt has. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. This graph is an example of one analysis, documented in our internal catalog.
dbt offers a SQL-first transformation workflow that lets teams build data transformation pipelines while following software engineering best practices like CI/CD, modularity, and documentation. The entire Toolkit is free for any phData customer in perpetuity, which is why these next few tools are (basically) free.
Here are steps you can follow to pursue a career as a BI Developer: Acquire a solid foundation in data and analytics: Start by building a strong understanding of data concepts, relational databases, SQL (Structured Query Language), and data modeling.
They provide loose coupling between the business logic that processes your data and the platform and data that it is executed upon. With Fivetran, you can quickly and easily switch between different data warehouse technologies in which to land your data, as well as popular open-source lake formats such as Apache Iceberg.