While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.
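For illustration, here is a minimal sketch of such a batch ETL job in Python, with SQLite files standing in for both the operational database and the warehouse; the table and column names are hypothetical:

```python
import sqlite3

# Extract: read raw rows from the operational database. SQLite stands in for
# the source system here; any DB-API connection would look the same.
source = sqlite3.connect("operational.db")
rows = source.execute(
    "SELECT order_id, amount_cents, created_at FROM orders"
).fetchall()

# Transform: convert cents to dollars and drop incomplete records.
transformed = [
    (order_id, amount_cents / 100.0, created_at)
    for order_id, amount_cents, created_at in rows
    if amount_cents is not None
]

# Load: write into a staging table, with a second SQLite file standing in
# for the warehouse.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS stg_orders "
    "(order_id INTEGER, amount_usd REAL, created_at TEXT)"
)
warehouse.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", transformed)
warehouse.commit()
```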
In the contemporary age of Big Data, data warehouse systems and data science analytics infrastructures have become an essential component for organizations to store, analyze, and make data-driven decisions. So why use IaC for cloud data infrastructures?
By Santhosh Kumar Neerumalla, Niels Korschinsky & Christian Hoeboer. Introduction: This blog post describes how to manage and orchestrate high-volume Extract-Transform-Load (ETL) loads using a serverless process based on Code Engine. The source data is unstructured JSON, while the target is a structured, relational database.
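As a rough illustration of that shape of problem, the sketch below flattens a hypothetical unstructured JSON event into a fixed relational row, with SQLite standing in for the relational target; the record layout and table name are assumptions, not the authors' schema:

```python
import json
import sqlite3

# A hypothetical unstructured JSON event, as it might arrive from the source.
raw = '{"id": "evt-1", "user": {"name": "Ada", "country": "DE"}, "tags": ["a", "b"]}'
event = json.loads(raw)

# Flatten the nested document into a fixed relational shape.
row = (
    event["id"],
    event.get("user", {}).get("name"),
    event.get("user", {}).get("country"),
    ",".join(event.get("tags", [])),
)

# SQLite stands in for the structured, relational target.
conn = sqlite3.connect("target.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS events "
    "(id TEXT PRIMARY KEY, user_name TEXT, country TEXT, tags TEXT)"
)
conn.execute("INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?)", row)
conn.commit()
```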
Summary: This guide explores the top ETL tools, highlighting their features and use cases. It provides insights into considerations for choosing the right tool, ensuring businesses can optimize their data integration processes for better analytics and decision-making. What is ETL? What are ETL Tools?
Summary: This article explores the significance of ETL data in Data Management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.
Summary: Choosing the right ETL tool is crucial for seamless data integration and smooth data management. Top contenders like Apache Airflow and AWS Glue offer unique features, empowering businesses with efficient workflows, high data quality, and informed decision-making capabilities.
Top Big Data CRM Integration Tools in 2021: #1 MuleSoft: MuleSoft is a data integration platform owned by Salesforce that accelerates digital customer transformations. This tool is designed to connect various data sources and enterprise applications, and to perform analytics and ETL processes.
Extract, Transform, Load (ETL): the extraction of raw data, transforming it into a suitable format for business needs, and loading it into a data warehouse. Data transformation: this process helps to transform raw data into clean data that can be analysed and aggregated. Data analytics and visualisation.
A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system, such as a cloud data warehouse like Snowflake. Document business rules and assumptions directly within the workflow, along with the data tables used and their role in the workflow.
Text analytics: Text analytics, also known as text mining, deals with unstructured text data, such as customer reviews, social media comments, or documents. It uses natural language processing (NLP) techniques to extract valuable insights from textual data. Poor data integration can lead to inaccurate insights.
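A toy example of the first step of such text mining, using only the Python standard library to count the most frequent terms in customer reviews (the sample reviews and stopword list are illustrative):

```python
import re
from collections import Counter

# Sample reviews and stopword list are illustrative.
reviews = [
    "Great product, fast delivery!",
    "Delivery was slow but the product is great.",
]
STOPWORDS = {"the", "is", "was", "but", "a", "and"}

def top_terms(texts, n=5):
    """Return the n most frequent non-stopword terms across the texts."""
    tokens = []
    for text in texts:
        tokens += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return Counter(tokens).most_common(n)

print(top_terms(reviews))  # e.g. [('product', 2), ('delivery', 2), ('great', 2), ...]
```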
Fivetran is used by businesses to centralize data from various sources into a single, comprehensive data warehouse. It allows organizations to easily connect their disparate data sources without having to manage any infrastructure. How Much Does Fivetran Cost? The answer to that question is, it depends.
It is known to have benefits in handling data due to its robustness, speed, and scalability. A typical modern data stack consists of the following: A data warehouse. Data ingestion/integration services. Reverse ETL tools. Data orchestration tools. A Note on the Shift from ETL to ELT.
To start using OpenSearch for anomaly detection, you must first index your data into OpenSearch; from there you can enable anomaly detection in OpenSearch Dashboards. To learn more, see the documentation.
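As a sketch, indexing a document with the official opensearch-py client might look like the following; the host, credentials, index name, and document fields are placeholders for your own cluster and data:

```python
from opensearchpy import OpenSearch  # pip install opensearch-py

# Host, port, and credentials are placeholders for your own cluster.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# Index a time-series document; an anomaly detector would then be configured
# over an index like "server-metrics" in OpenSearch Dashboards.
doc = {"timestamp": "2024-01-01T00:00:00Z", "cpu_utilization": 87.5, "host": "web-1"}
client.index(index="server-metrics", body=doc)
```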
Typically, this data is scattered across Excel files on business users’ desktops. They usually operate outside any data governance structure; often, no documentation exists outside the user’s mind.
An example directed acyclic graph (DAG) might automate data ingestion, processing, model training, and deployment tasks, ensuring that each step is run in the correct order and at the right time. Though it’s worth mentioning that Airflow isn’t used at runtime, as is usual for extract, transform, and load (ETL) tasks.
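A minimal Airflow 2.x DAG along those lines might look like this; the task callables are empty placeholders, and the DAG id and schedule are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task callables; in a real pipeline these would call out to
# ingestion jobs, processing scripts, training code, and a deploy step.
def ingest():
    print("ingesting raw data")

def process():
    print("processing data")

def train():
    print("training model")

def deploy():
    print("deploying model")

with DAG(
    dag_id="ml_pipeline",            # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,
) as dag:
    ingest_t = PythonOperator(task_id="ingest", python_callable=ingest)
    process_t = PythonOperator(task_id="process", python_callable=process)
    train_t = PythonOperator(task_id="train", python_callable=train)
    deploy_t = PythonOperator(task_id="deploy", python_callable=deploy)

    # The >> operator encodes the DAG edges: each step runs in order.
    ingest_t >> process_t >> train_t >> deploy_t
```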
By incorporating metadata into the data model, users can easily discover, understand, and interpret the data stored in the lake. With the amounts of data involved, this can be crucial to utilizing a data lake effectively. However, this can be time-consuming and prone to human error, leading to misinformation.
As a result, Matillion is an excellent choice for businesses wishing to optimize their data operations in a scalable and user-friendly environment. Matillion’s Data Productivity Cloud is a pivotal tool for modern data teams, designed to accelerate data delivery and transform the ETL process.
How can an organization enable flexible digital modernization that brings together information from multiple data sources, while still maintaining trust in the integrity of that data? To speed analytics, data scientists implemented pre-processing functions to aggregate, sort, and manage the most important elements of the data.
Data can be structured (e.g., database tables) or unstructured (e.g., documents and images). The diversity of data sources allows organizations to create a comprehensive view of their operations and market conditions. Data Integration: Once data is collected from various sources, it needs to be integrated into a cohesive format.
When the automated content processing steps are complete, you can use the output for downstream tasks, such as invoking different components in a customer service backend application, or inserting the generated tags into the metadata of each document for product recommendation. The stored data is visualized in a BI dashboard using QuickSight.
Document Hierarchy Structures Maintain thorough documentation of hierarchy designs, including definitions, relationships, and data sources. This documentation is invaluable for future reference and modifications. Simplify hierarchies where possible and provide clear documentation to help users understand the structure.
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. This typically results in long-running ETL pipelines that cause decisions to be made on stale or old data.
The Lineage & Dataflow API is a good example, enabling customers to add ETL transformation logic to the lineage graph. The Open Connector Framework SDK enables engineers to custom-build data source connectors, which are indexed by Alation. Open Data Quality Initiative.
It wouldn’t be until 2013 that the topic of data lineage would surface again – this time while working on a data warehouse project. Data warehouses obfuscate data’s origin. In 2013, I was a Business Intelligence Engineer at a financial services company.
Data integration is essentially the Extract and Load portion of the Extract, Load, and Transform (ELT) process. Data ingestion involves connecting your data sources, including databases, flat files, streaming data, etc., to your data warehouse. Snowflake provides native ways for data ingestion.
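For example, one native ingestion path is a COPY INTO statement issued through the Snowflake Python connector; the connection parameters, stage, and table below are placeholders:

```python
import snowflake.connector  # pip install snowflake-connector-python

# All connection parameters and object names below are placeholders.
conn = snowflake.connector.connect(
    user="LOADER",
    password="********",
    account="my_account",
    warehouse="LOAD_WH",
    database="RAW",
    schema="PUBLIC",
)

# COPY INTO is one of Snowflake's native ingestion paths: it bulk-loads
# staged flat files into a target table.
conn.cursor().execute(
    """
    COPY INTO orders
    FROM @orders_stage
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """
)
conn.close()
```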
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases.
Understanding Fivetran Fivetran is a popular Software-as-a-Service platform that enables users to automate the movement of data and ETL processes across diverse sources to a target destination. For a longer overview, along with insights and best practices, please feel free to jump back to the previous blog.
Few tools in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. This graph is an example of one analysis, documented in our internal catalog.
Data Training and Awareness: Invest in training for your staff. Ensure that everyone handling data understands its importance and the role it plays in maintaining data quality. Data Documentation: Comprehensive data documentation is essential. Identify anomalies, inconsistencies, and missing values.
Documentation: Keep detailed documentation of the deployed model, including its architecture, training data, and performance metrics, so that it can be understood and managed effectively. If you aren’t aware already, let’s introduce the concept of ETL. We primarily used ETL services offered by AWS.
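As an illustration of triggering one such AWS ETL service programmatically, the sketch below starts an AWS Glue job run with boto3; the job name, region, and arguments are hypothetical:

```python
import boto3

# Job name, region, and arguments are hypothetical.
glue = boto3.client("glue", region_name="us-east-1")

run = glue.start_job_run(
    JobName="nightly-orders-etl",
    Arguments={"--target_database": "analytics"},
)

# Check the run state; a production setup would poll or use event-driven
# orchestration (e.g., Step Functions) instead of a single call.
status = glue.get_job_run(JobName="nightly-orders-etl", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])
```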
With the “Data Productivity Cloud” launch, Matillion has achieved a balance of simplifying source control, collaboration, and dataops by elevating Git integration to a “first-class citizen” within the framework. In Matillion ETL, the Git integration enables an organization to connect to any Git offering (e.g.,
Consider factors such as data volume, query patterns, and hardware constraints. Document and Communicate Maintain thorough documentation of fact table designs, including definitions, calculations, and relationships. Use indexing and partitioning strategies to improve query performance.
A common problem solved by phData is the migration from an existing data platform to the Snowflake Data Cloud, in the best possible manner. Data flows from the current data platform to the destination. Either way, it’s important to understand what data is transformed, and how so.
Apache Airflow Airflow is an open-source ETL software that is very useful when paired with Snowflake. dbt offers a SQL-first transformation workflow that lets teams build data transformation pipelines while following software engineering best practices like CI/CD, modularity, and documentation.
The on-premise agent is responsible for sending data to Fivetran, which is then processed and loaded into the destination. You can find more information about them in their official documentation. Extra Points: Data Warehouses as a Source. Currently it is in beta, but you can use BigQuery and Snowflake as data sources in Fivetran.
As data types and applications evolve, you might need specialized NoSQL databases to handle diverse data structures and specific application requirements. With an open data lakehouse, you can access a single copy of data wherever your data resides.
Data Vault - Data Lifecycle: Architecturally, let’s break the data lifecycle in the data vault into the following layers, which play a key role in choosing the right pattern and tools to implement. Data Acquisition: Extracting data from source systems and making it accessible.
Gain hands-on experience with data integration: Learn about data integration techniques to combine data from various sources, such as databases, spreadsheets, and APIs. BI Developer Skills Required To excel in this role, BI Developers need to possess a range of technical and soft skills.
References: Links to internal or external documentation with background information or specific information used within the analysis presented in the notebook. Data to explore: Outline the tables or datasets you’re exploring/analyzing and reference their sources or link their data catalog entries.
So, we must understand the different unstructured data types and effectively process them to uncover hidden patterns. Textual Data Textual data is one of the most common forms of unstructured data and can be in the format of documents, social media posts, emails, web pages, customer reviews, or conversation logs.
Using SQL-centric transformations to model data to be deployed. dbt is also great for data lineage and documentation to empower business analysts to make informed decisions on their data. Data Ingestion with Fivetran Fivetran is used to move your source(s) into a centralized space for storage.
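One lightweight way to run such dbt transformations from Python is to shell out to the dbt CLI; the model selector and project directory below are placeholders:

```python
import subprocess

# Shell out to the dbt CLI; "staging" and "./analytics" are placeholder
# values for the model selector and project directory.
result = subprocess.run(
    ["dbt", "run", "--select", "staging", "--project-dir", "./analytics"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError("dbt run failed:\n" + result.stderr)
```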
KNIME and Power BI: The Power of Integration The data analytics process invariably involves a crucial phase: data preparation. This phase demands meticulous customization to optimize data for analysis. Consider a scenario: a data repository residing within a cloud-based data warehouse. Execute the workflow.
Check the API documentation to discover what parameters must be passed into the API call and configured in this wizard. Check out the API documentation for our sample. Aside from that, you will choose where the data will be stored in your datawarehouse and the staging location.
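As a sketch of such an API call, the snippet below pulls records from a hypothetical REST endpoint with requests; the URL, parameters, auth header, and response envelope are all assumptions to be replaced per the API documentation:

```python
import requests

# Endpoint, parameters, and auth are all hypothetical; substitute the values
# the API documentation actually requires.
response = requests.get(
    "https://api.example.com/v1/orders",
    params={"updated_since": "2024-01-01", "page_size": 100},
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
response.raise_for_status()

# Assumed response envelope: {"data": [...]}. The records would then be
# written to the staging location configured in the wizard.
records = response.json()["data"]
print(f"fetched {len(records)} records for staging")
```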
In my 7 years of Data Science journey, I’ve been exposed to a number of different databases including but not limited to Oracle Database, MS SQL, MySQL, EDW, and Apache Hadoop. A lot of you who are already in the data science field must be familiar with BigQuery and its advantages.
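For instance, a simple aggregation against BigQuery with the official Python client might look like this; the project, dataset, and table names are illustrative:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Assumes application-default credentials; the project, dataset, and table
# names are illustrative.
client = bigquery.Client(project="my-project")

query = """
    SELECT country, COUNT(*) AS orders
    FROM `my-project.sales.orders`
    GROUP BY country
    ORDER BY orders DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row["country"], row["orders"])
```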