Accurate and secure data can help to streamline software engineering processes and lead to the creation of more powerful AI tools, but it has become a challenge to maintain the quality of the expansive volumes of data needed by the most advanced AI models.
These tools provide data engineers with the capabilities to efficiently extract, transform, and load (ETL) data, build data pipelines, and prepare data for analysis and consumption by other applications.
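As a rough illustration of the ETL pattern itself, here is a minimal sketch in Python, assuming an in-memory source list and a SQLite table as the destination; the record fields and table name are hypothetical.

```python
import sqlite3

# Hypothetical source records; a real pipeline would pull these from an API or file.
RAW_ORDERS = [
    {"id": 1, "amount": "19.99", "country": "us"},
    {"id": 2, "amount": "5.00", "country": "DE"},
]

def extract():
    """Extract: return raw records from the source."""
    return RAW_ORDERS

def transform(rows):
    """Transform: normalize types and values."""
    return [
        {"id": r["id"], "amount": float(r["amount"]), "country": r["country"].upper()}
        for r in rows
    ]

def load(rows, conn):
    """Load: write the cleaned rows to a destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (:id, :amount, :country)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM orders").fetchall())
```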
Organizations require reliable data for robust AI models and accurate insights, yet the current technology landscape presents unparalleled data quality challenges. ETL/ELT tools typically have two components: a design time (to design data integration jobs) and a runtime (to execute data integration jobs).
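To make that design-time/runtime split concrete, here is a small sketch, assuming a hypothetical declarative job spec and a generic executor; real ETL/ELT tools implement the same separation with far richer metadata.

```python
# Design time: a declarative job definition (what to do), kept separate from execution.
job_spec = {
    "name": "daily_orders",
    "steps": ["extract", "transform", "load"],
}

# Runtime: a generic engine that executes any job spec (how to do it).
def run_job(spec, step_impls):
    for step in spec["steps"]:
        print(f"[{spec['name']}] running {step}")
        step_impls[step]()

run_job(job_spec, {
    "extract": lambda: print("  pulling rows"),
    "transform": lambda: print("  cleaning rows"),
    "load": lambda: print("  writing rows"),
})
```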
Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.
This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights. A standout application is the SQL-to-natural language capability, which translates complex SQL queries into plain English and vice versa, bridging the gap between technical and business teams.
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions, misinformed business processes, missed revenue opportunities, failed business initiatives, and complex data systems can all stem from data quality issues.
As such, the quality of their data can make or break the success of the company. This article will guide you through the concept of a data quality framework, its essential components, and how to implement it effectively within your organization. What is a data quality framework?
In a sea of questionable data, how do you know what to trust? Data quality tells you the answer. It signals what data is trustworthy, reliable, and safe to use. It empowers engineers to oversee data pipelines that deliver trusted data to the wider organization. Today, as part of its 2022.2
“Quality over Quantity” is a phrase we hear regularly in life, but when it comes to the world of data, we often fail to adhere to this rule. Data Quality Monitoring implements quality checks in operational data processes to ensure that the data meets pre-defined standards and business rules.
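A minimal sketch of what such monitoring can look like in Python, assuming hypothetical records and two illustrative business rules; a production monitor would persist results and alert on violations.

```python
# Each rule pairs a name with a predicate; a record fails monitoring if any rule fails.
RULES = [
    ("amount_non_negative", lambda r: r["amount"] >= 0),
    ("email_present", lambda r: bool(r.get("email"))),
]

def check(records):
    """Return (record_index, rule_name) for every rule violation found."""
    violations = []
    for i, record in enumerate(records):
        for name, predicate in RULES:
            if not predicate(record):
                violations.append((i, name))
    return violations

records = [
    {"amount": 10.0, "email": "a@example.com"},
    {"amount": -3.0, "email": ""},
]
print(check(records))  # [(1, 'amount_non_negative'), (1, 'email_present')]
```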
Data quality control: Robust dataset labeling and annotation tools incorporate quality control mechanisms such as inter-annotator agreement analysis, review workflows, and data validation checks to ensure the accuracy and reliability of annotations. Data monitoring tools then track that quality on an ongoing basis.
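Inter-annotator agreement is often quantified with Cohen's kappa, which corrects raw agreement for chance. A self-contained sketch in Python, with hypothetical labels from two annotators:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["cat", "cat", "dog", "dog", "cat"]
ann2 = ["cat", "dog", "dog", "dog", "cat"]
print(round(cohens_kappa(ann1, ann2), 3))  # ~0.615: substantial agreement
```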
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. ETL is vital for ensuring data quality and integrity.
AWS data engineering pipeline: The adaptable approach detailed in this post starts with an automated data engineering pipeline to make data stored in Splunk available to a wide range of personas, including business intelligence (BI) analysts, data scientists, and ML practitioners, through a SQL interface.
The raw data can be fed into a database or data warehouse. An analyst can examine the data using business intelligence tools to derive useful information. To arrange your data and keep it raw, you need to: Make sure the data pipeline is simple so you can easily move data from point A to point B.
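One common way to keep data raw is a landing table that stores each payload untouched alongside ingestion metadata. A minimal sketch using SQLite, with hypothetical table and field names:

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# The landing table keeps the payload untouched, plus when and where it came from.
conn.execute("CREATE TABLE raw_events (ingested_at TEXT, source TEXT, payload TEXT)")

def ingest(source, event):
    """Move data from point A (the source) to point B (the landing table), unmodified."""
    conn.execute(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), source, json.dumps(event)),
    )

ingest("web", {"user": 42, "action": "click"})
# Downstream transformations read from raw_events; the original data stays intact.
print(conn.execute("SELECT * FROM raw_events").fetchall())
```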
Real-World Example: Healthcare systems manage a huge variety of data: structured patient demographics, semi-structured lab reports, and unstructured doctor’s notes, medical images (X-rays, MRIs), and even data from wearable health monitors. Ensuring data quality and accuracy is a major challenge.
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.
Data Quality: Now that you’ve learned more about your data and cleaned it up, it’s time to ensure its quality is up to par. With these data exploration tools, you can determine whether your data is accurate, consistent, and reliable.
That said, dbt provides the ability to generate data vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way. The most important reason for using dbt in Data Vault 2.0
The first one we want to talk about is the Toolkit SQL analyze command. When customers are looking to perform a migration, one of the first things that needs to occur is an assessment of the level of effort to migrate existing data definition language (DDL) and data manipulation language (DML).
Key skills and qualifications for machine learning engineers include: Strong programming skills: Proficiency in programming languages such as Python, R, or Java is essential for implementing machine learning algorithms and building data pipelines.
Address common challenges in managing SAP master data by using AI tools to automate SAP processes and ensure data quality. Create an AI-driven data and process improvement loop to continuously enhance your business operations. Think about material master data, for example. Data creation and management processes.
Top contenders like Apache Airflow and AWS Glue offer unique features, empowering businesses with efficient workflows, high data quality, and informed decision-making capabilities. In today’s business landscape, data integration is vital. Scalability: Designed to handle large volumes of data efficiently.
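For a sense of how Airflow expresses such workflows, here is a minimal DAG sketch, assuming Airflow 2.x; the dag_id, task names, and task bodies are illustrative.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")

def load():
    print("writing to the warehouse")

# A two-step daily workflow; Airflow handles scheduling, retries, and ordering.
with DAG(
    dag_id="simple_integration",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```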
We had sessions ranging from getting started with machine learning or SQL up to advanced topics in NLP, and of course plenty related to large language models and generative AI. You can see our photos from the event here, and be sure to follow our YouTube for virtual highlights from the conference as well.
Setting up the Information Architecture: Setting up an information architecture during migration to Snowflake poses challenges due to the need to align existing data structures, types, and sources with Snowflake’s multi-cluster, multi-tier architecture. Moving historical data from a legacy system to Snowflake also poses several challenges.
This automation includes things like SQL translation during a data platform migration (SQLMorph), making changes to your Snowflake information architecture (Tram), and checking for parity and data quality between platforms (Data Source Automation). Let’s dive in and take a deeper look at these.
Consider a data pipeline that detects its own failures, diagnoses the issue, and recommends the fix—all automatically. This is the potential of self-healing pipelines, and this blog explores how to implement them using dbt, Snowflake Cortex , and GitHub Actions.
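A highly simplified sketch of the detect-diagnose-fix loop, with every hook left as a hypothetical stub; in the stack this blog describes, dbt would run and test the pipeline, Snowflake Cortex would help diagnose failures, and GitHub Actions would orchestrate the whole cycle.

```python
import time

# Hypothetical hooks standing in for dbt runs, LLM-based diagnosis, and automated fixes.
def run_pipeline():
    ...

def diagnose(error):
    ...

def apply_fix(suggestion):
    ...

def self_healing_run(max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            run_pipeline()
            return "success"
        except Exception as err:
            suggestion = diagnose(err)  # e.g. ask an LLM to classify the failure
            apply_fix(suggestion)       # e.g. open a PR with the proposed change
            time.sleep(2 ** attempt)    # back off before retrying
    return "needs human review"
```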
Best Practices for ETL Efficiency: Maximising efficiency in ETL (Extract, Transform, Load) processes is crucial for organisations seeking to harness the power of data. Implementing best practices can boost performance, reduce costs, and improve data quality.
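One widely used efficiency practice is incremental (watermark-based) extraction, which avoids re-reading the full source on every run. A minimal sketch, with hypothetical rows and timestamps:

```python
# Incremental extraction: process only rows newer than the last high-water mark.
state = {"watermark": "2023-01-01T00:00:00"}

rows = [
    {"id": 1, "updated_at": "2022-12-31T09:00:00"},
    {"id": 2, "updated_at": "2023-01-02T08:30:00"},
]

def incremental_extract(rows, state):
    """Return only fresh rows and advance the watermark past them."""
    fresh = [r for r in rows if r["updated_at"] > state["watermark"]]
    if fresh:
        state["watermark"] = max(r["updated_at"] for r in fresh)
    return fresh

print(incremental_extract(rows, state))  # only row 2 is processed
print(state["watermark"])                # advanced to 2023-01-02T08:30:00
```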
ETL facilitates Data Analytics by transforming raw data into meaningful insights, empowering businesses to uncover trends, track performance, and make strategic decisions. ETL also enhances data quality and consistency by performing necessary data cleansing and validation during the transformation stage.
The primary goal of Data Engineering is to transform raw data into a structured and usable format that can be easily accessed, analyzed, and interpreted by Data Scientists, analysts, and other stakeholders. Future of Data Engineering: The Data Engineering market will expand from $18.2
dbt offers a SQL-first transformation workflow that lets teams build data transformation pipelines while following software engineering best practices like CI/CD, modularity, and documentation. Aside from migrations, Data Source is also great for data quality checks and can generate data pipelines.
It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced.
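For illustration, a sketch of online feature retrieval with Feast's Python SDK, assuming a feature repository has already been configured in the current directory and that a feature view named driver_stats exists (both assumptions, not part of the excerpt above):

```python
from feast import FeatureStore

# Point at an existing, already-applied Feast repository (hypothetical here).
store = FeatureStore(repo_path=".")

# Fetch the latest feature values for one entity, ready for model inference.
features = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```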
It supports complex data transformations and offers advanced features like data quality management and metadata management. PowerCenter is particularly favored by large organizations with extensive data integration needs. It allows users to define complex workflows as code and provides scheduling capabilities.
Horizon addresses key aspects of data governance, including compliance, security, access, privacy, and interoperability. Throughout the remainder of this blog, we will dive deeper into each of the above components and take a look at the ways in which Horizon can help. We will begin with compliance.
Data science and machine learning teams use Snorkel Flow’s programmatic labeling to intelligently capture knowledge from various sources such as previously labeled data (even when imperfect), heuristics from subject matter experts, business logic, and even the latest foundation models, then scale this knowledge to label large quantities of data.
In this blog post, we’ll dive into the amazing advantages of using Fivetran, a powerful data integration platform that will revolutionize the way you handle your data pipelines. This enabled the client to centralize their data, improve data quality and consistency, and empower business units with self-service analytics.
Data fabric: Data fabric architectures are designed to connect data platforms with the applications where users interact with information, simplifying data access across an organization and enabling self-service data consumption. This lets users across the organization treat the data like a product with widespread access.
Hive is a data warehouse tool built on Hadoop that enables SQL-like querying to analyse large datasets. What is the Difference Between Structured and Unstructured Data? Structured data is organised in tabular formats like databases, while unstructured data, such as images or videos, lacks a predefined format.
This cuts into time that can be spent delivering new data/features – and often results in leadership wondering why it is taking so long for new products to arrive (which leads to projects being cut). Additionally, frequent trust issues arise as these pipelines break or data quality suffers.
This often leads to shadow IT processes and duplicated data pipelines. Data is siloed, and there is no single source of truth, only fragmented data spread across the organization. Establishing a data culture changes this paradigm. Promoting Data Literacy: Snowflake is an accessible platform.
One of the more common practices when developing a data pipeline is rebuilding your data to test changes. As one of the leaders in the industry, dbt provides several options for executing your pipelines to increase efficiency and run exactly what you need. What is dbt Run + dbt Test?
A 2019 survey by McKinsey on global data transformation revealed that 30 percent of total time spent by enterprise IT teams was spent on non-value-added tasks related to poor data quality and availability. It truly is an all-in-one data lake solution. It’s not a widely known programming language like Java, Python, or SQL.
Though scripting languages such as R and Python are at the top of the list of required skills for a data analyst, Excel is still one of the most important tools to be used. Because they are the most likely to communicate data insights, analysts will also need to know SQL and visualization tools such as Power BI and Tableau.