In the contemporary age of Big Data, data warehouse systems and data science analytics infrastructures have become essential components for organizations to store data, analyze it, and make data-driven decisions. So why use IaC for cloud data infrastructures?
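To make the IaC idea concrete, here is a minimal sketch using Pulumi's Python SDK to declare one piece of a cloud data platform; the provider, resource type, and names are illustrative assumptions, not anything from the original article.

```python
import pulumi
import pulumi_aws as aws

# Declare a raw-data landing bucket as code: versioned in git,
# reviewable in PRs, and reproducible across environments.
raw_bucket = aws.s3.Bucket(
    "raw-data",
    acl="private",
    tags={"team": "analytics", "layer": "raw"},
)

pulumi.export("raw_bucket_name", raw_bucket.id)
```

Running `pulumi up` reconciles the declared state with the cloud account, which is the core appeal of IaC for data infrastructure: changes become auditable and repeatable rather than click-driven.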
Organizations can search for PII using methods such as keyword searches, pattern matching, data loss prevention tools, machine learning (ML), metadata analysis, data classification software, optical character recognition (OCR), document fingerprinting, and encryption.
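Of these, pattern matching is the simplest to sketch. The hypothetical Python example below scans text with regular expressions; the two patterns are deliberately simplified, and production PII detection needs much broader coverage and validation.

```python
import re

# Simplified illustrative patterns; real detectors cover many more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text: str) -> dict:
    """Return the matches found for each PII pattern."""
    hits = {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
    return {name: matches for name, matches in hits.items() if matches}

print(find_pii("Reach Jane at jane@example.com, SSN 123-45-6789."))
# {'email': ['jane@example.com'], 'us_ssn': ['123-45-6789']}
```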
When needed, the system can access an ODAP data warehouse to retrieve additional information. Document management: Documents are securely stored in Amazon S3, and when new documents are added, a Lambda function processes them into chunks.
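A chunking Lambda of that shape might look like the following sketch, assuming an S3 "object created" trigger and plain-text documents; the fixed-size chunking and every name here are illustrative, not the article's actual implementation.

```python
import boto3

s3 = boto3.client("s3")
CHUNK_SIZE = 1000  # characters per chunk; tuned to the downstream embedding model

def handler(event, context):
    # S3 event notifications carry the bucket and key of the new object
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    chunks = [body[i:i + CHUNK_SIZE] for i in range(0, len(body), CHUNK_SIZE)]

    # Downstream steps (embedding and indexing the chunks) omitted here
    return {"source": f"s3://{bucket}/{key}", "chunk_count": len(chunks)}
```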
Fivetran is used by businesses to centralize data from various sources into a single, comprehensive data warehouse. It allows organizations to easily connect their disparate data sources without having to manage any infrastructure. This frees up our data engineers to do what they do best.
The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: One of the Analytics team’s tasks is to load data from multiple sources and unify it into a data warehouse.
Introduction: ETL plays a crucial role in data management. This process enables organisations to gather data from various sources, transform it into a usable format, and load it into data warehouses or databases for analysis. Loading: The transformed data is loaded into the target destination, such as a data warehouse.
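As a toy illustration of those three steps in Python, with pandas and SQLite standing in for a real source and warehouse (all paths and column names are assumptions):

```python
import sqlite3
import pandas as pd

# Extract: pull raw data from a source system
orders = pd.read_csv("orders.csv")

# Transform: coerce types and derive a column analysts actually need
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["revenue"] = orders["quantity"] * orders["unit_price"]

# Load: write the result into the target destination
with sqlite3.connect("warehouse.db") as conn:
    orders.to_sql("fact_orders", conn, if_exists="append", index=False)
```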
To start using OpenSearch for anomaly detection, you first must index your data into OpenSearch; from there, you can enable anomaly detection in OpenSearch Dashboards. To learn more, see the documentation.
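The indexing step might look like this sketch with the official opensearch-py client; the host, index name, and document fields are placeholders. Anomaly detectors are then configured against that index from OpenSearch Dashboards.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Index a time-stamped metric document; an anomaly detector watches this index
client.index(
    index="server-metrics",
    body={"timestamp": "2024-01-01T00:00:00Z", "cpu_utilization": 87.5},
)
```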
By 2025, global data volumes are expected to reach 181 zettabytes, according to IDC. To harness this data effectively, businesses rely on ETL (Extract, Transform, Load) tools to extract, transform, and load data into centralized systems like data warehouses.
It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. How to scale AI and ML with built-in governance: A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.
The ultimate need for vast storage spaces manifests in data warehouses: specialized systems that aggregate data coming from numerous sources for centralized management and consistency. In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently.
The modern data stack is a combination of various software tools used to collect, process, and store data on a well-integrated cloud-based data platform. It is known to have benefits in handling data due to its robustness, speed, and scalability. A typical modern data stack consists of the following: A data warehouse.
Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly. It promotes a disciplined approach to data modeling, making it easier to ensure data quality and consistency across the ML pipelines. Saurabh Gupta is a Principal Engineer at Zeta Global.
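As a sketch of what that reuse looks like with Feast, the snippet below fetches already-defined features at serving time instead of re-deriving them; the repo path, feature view, feature names, and entity key are all hypothetical.

```python
from feast import FeatureStore

# Assumes a Feast repo with a "customer_stats" feature view already defined
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "customer_stats:avg_order_value",
        "customer_stats:order_count",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

print(features)  # feature values keyed by feature name, ready for a model
```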
By incorporating metadata into the data model, users can easily discover, understand, and interpret the data stored in the lake. With the amounts of data involved, this can be crucial to utilizing a data lake effectively. However, managing that metadata manually can be time-consuming and prone to human error, leading to misinformation.
Precisely conducted a study that found that within enterprises, data scientists spend 80% of their time cleaning, integrating, and preparing data, dealing with many formats, including documents, images, and videos. Overall, the study places emphasis on establishing a trusted and integrated data platform for AI.
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
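For a flavor of such a non-generative use case, here is a hedged sketch of serving an encoder-only classifier through the Hugging Face pipeline API; the checkpoint shown is a common public sentiment model used purely for illustration.

```python
from transformers import pipeline

# Encoder-only model fine-tuned for sentiment classification
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new dashboard is much faster, great work!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```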
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let’s dive into the foundation of every Modern Data Stack, a cloud-based data warehouse.
On the other hand, OLAP systems use a multidimensional database, which is created from multiple relational databases and enables complex queries involving multiple data facts from current and historical data. An OLAP database may also be organized as a data warehouse.
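To make the multidimensional idea concrete, this small pandas sketch slices one fact (revenue) across two dimensions (region and year), the kind of rollup an OLAP cube serves at scale; the data is fabricated for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US"],
    "year": [2023, 2024, 2023, 2024],
    "revenue": [120, 150, 200, 260],
})

# Aggregate the revenue fact across the region x year dimensions
cube = sales.pivot_table(values="revenue", index="region", columns="year", aggfunc="sum")
print(cube)
```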
Hosted Doc Site for Documentation: One of the most powerful features of dbt is the documentation you can generate. This documentation can give different users insight into where data came from, what the profile of the data is, what the SQL looked like, and the DAG showing where the data is being used.
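Producing that site is a two-command affair with the dbt CLI. The sketch below drives it from Python for convenience, assuming dbt is installed and the script runs inside a dbt project; calling the commands directly in a shell works just as well.

```python
import subprocess

# Build catalog.json and manifest.json from the project and warehouse metadata
subprocess.run(["dbt", "docs", "generate"], check=True)

# Serve the browsable documentation site locally (blocks until stopped)
subprocess.run(["dbt", "docs", "serve"], check=True)
```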
Data integration is essentially the Extract and Load portion of the Extract, Load, and Transform (ELT) process. Data ingestion involves connecting your data sources, including databases, flat files, streaming data, etc., to your data warehouse. Snowflake provides native ways for data ingestion.
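One such native path is COPY INTO from a stage. A minimal sketch with the Snowflake Python connector follows; the account, credentials, stage, table, and file format are placeholder assumptions.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

# Bulk-load staged CSV files into a raw table
conn.cursor().execute("""
    COPY INTO raw.orders
    FROM @raw_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
```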
How to Get Started with Matillion Data Productivity Cloud: It may sound unbelievable, but trust me, you can get started with Matillion Data Productivity Cloud from zero to your first job in around 5 minutes. Creating Your Account: First things first, let’s create your Matillion account in order to deploy your Data Productivity Cloud.
Typically, this data is scattered across Excel files on business users’ desktops. They usually operate outside any data governance structure; often, no documentation exists outside the user’s mind.
Alation is pleased to be named a dbt Metrics Partner and to announce the start of a partnership with dbt, which will bring dbt data into the Alation data catalog. In the modern data stack, dbt is a key tool to make data ready for analysis. Data Transformation in the Modern Data Stack.
For example, a new data scientist who is curious about which customers are most likely to be repeat buyers might search for customer data, only to discover an article documenting a previous project that answered their exact question. Modern data catalogs also facilitate data quality checks.
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. But what does this mean from a practitioner’s perspective?
Understanding Fivetran: Fivetran is a user-friendly, code-free platform enabling customers to easily synchronize their data by automating extraction, transformation, and loading from many sources. Fivetran automates the time-consuming steps of the ELT process so your data engineers can focus on more impactful projects.
Through Impact Analysis, users can determine if a problem occurred with data upstream, and locate the impacted data downstream. With robust data lineage, data engineers can find and fix issues fast and prevent them from recurring. Similarly, analysts gain a clear view of how data is created.
ETL is a process for moving and managing data from various sources to a central data warehouse. It ensures that data is accurate, consistent, and usable for analysis and reporting, and helps organisations manage large volumes of data efficiently.
Tableau Architecture: Let’s understand a bit more about Tableau architecture, which will help clarify where Tableau data is stored. Data Server: These are basically the databases, files, and data warehouses to which any dashboard connects for the rendering of visuals. Hyper is a compiling query engine.
With the Open Data Quality Initiative, Alation introduces an Open Data Quality Framework (ODQF), which includes a starter kit for data quality partners. This kit offers an open DQ API, developer documentation, onboarding, integration best practices, and co-marketing support.
Founded in 2014 by three leading cloud engineers, phData focuses on solving real-world data engineering, operations, and advanced analytics problems with the best cloud platforms and products. Over the years, one of our primary focuses became Snowflake and migrating customers to this leading cloud data platform.
Data Vault - Data Lifecycle: Architecturally, let’s understand the data lifecycle in the data vault through the following layers, which play a key role in choosing the right pattern and tools for implementation. Data Acquisition: Extracting data from source systems and making it accessible.
Text Data Labeling Techniques: Text data labeling is a nuanced process, where success lies in finding the right balance between human expertise and automated efficiency for each specific use case.
A data mesh is a conceptual architectural approach for managing data in large organizations. Traditional data management approaches often involve centralizing data in a data warehouse or data lake, leading to challenges like data silos, data ownership issues, and data access and processing bottlenecks.
Alation: And you likely had plenty of data even before the acquisition. The challenge wasn’t that we had nothing to document. Data from any other brands we acquired or built were based on the same schema, so an engineering team looking for data would say, “We know about where it is, but we don’t know exactly where it is.”
The approach has shown exceptional results in specialized domains such as legal document classification and medical text analysis, where nuanced interpretation is crucial [reference]. Documentation Requirements: Documentation serves both as a historical record and a living guide.
With Snowflake, data stewards have a choice to leverage Snowflake’s governance policies. First, stewards are dependent on data warehouse admins to provide information and to create and edit enforcement policies in Snowflake. Alation’s data lineage helps organizations to secure their data in the Snowflake Data Cloud.
Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. This blog focuses on governing spreadsheets that contain data, information, and metadata, and must themselves be governed. There are others that consider spreadsheets to be trouble.
One of the easiest ways for Snowflake to achieve this is to have analytics solutions query their data warehouse in real time (also known as DirectQuery). For more information on composite models, check out Microsoft’s official documentation. This ensures the maximum amount of Snowflake consumption possible.
Without partitioning, daily data activities will cost your company a fortune, and a moment will come when the cost advantage of GCP BigQuery becomes questionable. There are other options you can set, and as usual, I suggest you reference the official documentation to learn more.
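For reference, creating a day-partitioned table with the google-cloud-bigquery client might look like this sketch; the project, dataset, and schema are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("user_id", "STRING"),
    ],
)

# Partition by day so queries filtered on event_ts scan only matching partitions
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)

client.create_table(table)
```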
Data Preparation: Cleaning, transforming, and preparing data for analysis and modelling. Collaborating with Teams: Working with data engineers, analysts, and stakeholders to ensure data solutions meet business needs. Start by setting up your own Azure account and experimenting with various services.
Data Modeling: dbt has gradually emerged as a powerful tool that largely simplifies the process of building and handling data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in one single hub, following the best practices of software engineering.
Data Quality Monitoring implements quality checks in operational data processes to ensure that the data meets pre-defined standards and business rules. Without such checks, credibility and data consistency erode over time, leading businesses to mistrust the data pipelines and processes.
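A bare-bones illustration of such pre-defined rules in Python with pandas follows; the rules and column names are hypothetical, and production setups would typically use a dedicated data quality framework instead.

```python
import pandas as pd

def check_orders(df: pd.DataFrame) -> list:
    """Apply pre-defined rules and return a list of failed checks."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if df["amount"].lt(0).any():
        failures.append("negative order amounts")
    if df["customer_id"].isna().any():
        failures.append("missing customer_id")
    return failures

orders = pd.read_csv("orders.csv")  # path is illustrative
problems = check_orders(orders)
if problems:
    raise ValueError(f"Data quality checks failed: {problems}")
```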
Another benefit of deterministic matching is that the process to build these identities is relatively simple, and tools your teams might already use, like SQL and dbt, can efficiently manage this process within your cloud data warehouse. However, targeted web advertising may only require linkage to a browser or device ID.
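In code, deterministic matching reduces to an exact join on a normalized shared identifier. This pandas sketch hashes emails into a match key and joins two hypothetical sources; all names and data are fabricated for illustration.

```python
import hashlib
import pandas as pd

def match_key(email: str) -> str:
    # Normalize, then hash so raw PII never travels through the join
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

crm = pd.DataFrame({"email": ["Jane@Example.com"], "crm_id": [1]})
web = pd.DataFrame({"email": ["jane@example.com"], "device_id": ["abc-123"]})

for df in (crm, web):
    df["match_key"] = df["email"].map(match_key)

# Deterministic match: exact equality on the normalized key
identity = crm.merge(web, on="match_key", suffixes=("_crm", "_web"))
print(identity[["crm_id", "device_id"]])
```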