Introduction: Data pipelines play a critical role in the processing and management of data in modern organizations. A well-designed data pipeline can help organizations extract valuable insights from their data, automate tedious manual processes, and ensure the accuracy of data processing.
These tools provide data engineers with the capabilities they need to efficiently extract, transform, and load (ETL) data, build data pipelines, and prepare data for analysis and consumption by other applications. Essential data engineering tools for 2023: the top 10 data engineering tools to watch out for.
Great Expectations provides support for different data backends, such as flat file formats, SQL databases, Pandas dataframes, and Spark, and comes with built-in notification and data documentation functionality. At ODSC East 2023, we have a number of sessions related to data visualization and data exploration tools.
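As a minimal sketch of the Pandas backend (using the classic Great Expectations dataset API; newer releases restructure this around Data Contexts, and the column names here are illustrative):

```python
import great_expectations as ge
import pandas as pd

# Wrap an ordinary Pandas dataframe so it gains expect_* validation methods.
df = ge.from_pandas(pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, None]}))

# Declare expectations; each call validates immediately and returns a result.
df.expect_column_values_to_not_be_null("order_id")
result = df.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)
print(result.success)
```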
Imagine you wanted to build a dbt project for your existing source data warehouse in your migration to Snowflake. You could leverage the data source tool to profile your source, apply a template against the generated metadata, and automatically create a dbt project with models for each table!
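As a rough sketch of that idea (the metadata dictionary, source name, and file layout below are hypothetical, not the tool's actual output), generating one dbt staging model per profiled table can be as simple as templating SQL files:

```python
from pathlib import Path

# Hypothetical profiled metadata: table name -> list of columns.
source_metadata = {
    "customers": ["id", "name", "email"],
    "orders": ["id", "customer_id", "amount", "ordered_at"],
}

models_dir = Path("models/staging")
models_dir.mkdir(parents=True, exist_ok=True)

# Emit a stg_<table>.sql model selecting every profiled column.
for table, columns in source_metadata.items():
    sql = "select\n    " + ",\n    ".join(columns) + f"\nfrom {{{{ source('legacy_dw', '{table}') }}}}\n"
    (models_dir / f"stg_{table}.sql").write_text(sql)
```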
Run pandas at scale on your data warehouse. Most enterprise data teams store their data in a database or data warehouse, such as Snowflake, BigQuery, or DuckDB. Ponder solves this problem by translating your pandas code to SQL that can be understood by your data warehouse.
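To illustrate the translation concept (the dataframe and query are hypothetical, and Ponder's actual API may differ; this sketch uses plain pandas with the warehouse-side SQL shown in a comment):

```python
import pandas as pd

# With a pandas-on-warehouse layer such as Ponder, code like this stays unchanged...
df = pd.DataFrame({"state": ["MN", "WI", "MN"], "sales": [100, 250, 175]})
totals = df.groupby("state")["sales"].sum()

# ...but instead of running in local memory, it is pushed down to the warehouse
# as SQL, roughly: SELECT state, SUM(sales) FROM df GROUP BY state
print(totals)
```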
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. ETL is vital for ensuring data quality and integrity.
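A minimal end-to-end ETL sketch (the column names are illustrative, and the inline CSV stands in for a real extract so the example is self-contained):

```python
import io
import sqlite3
import pandas as pd

# Extract: in practice this would be pd.read_csv("orders_raw.csv") or a database read.
raw = pd.read_csv(io.StringIO("order_id,amount\n1,19.991\n2,\n3,5.5\n"))

# Transform: drop rows missing an amount and round to cents.
clean = raw.dropna(subset=["amount"]).assign(amount=lambda d: d["amount"].round(2))

# Load: write the cleaned table into a local SQLite database.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("orders", conn, if_exists="replace", index=False)
```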
In July 2023, Matillion launched their fully SaaS platform called Data Productivity Cloud, aiming to create a future-ready, everyone-ready, and AI-ready environment that companies can easily adopt to start automating their data pipelines with code, low-code, or even no-code at all.
The ultimate need for vast storage spaces manifests in data warehouses: specialized systems that aggregate data coming from numerous sources for centralized management and consistency. In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently.
Salesforce Sync Out is a crucial tool that enables businesses to transfer data from their Salesforce platform to external systems like Snowflake, AWS S3, and Azure ADLS. You will also need a warehouse for loading the data (start with XSMALL or SMALL warehouses).
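As a small sketch of that setup step (assumes the snowflake-connector-python package; the account, user, and warehouse names are placeholders), creating a right-sized loading warehouse is one statement:

```python
import snowflake.connector

# Connection parameters are placeholders; use your account's values.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", role="SYSADMIN"
)

# Start small; AUTO_SUSPEND keeps an idle loading warehouse from burning credits.
conn.cursor().execute(
    "CREATE WAREHOUSE IF NOT EXISTS sync_out_wh "
    "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
)
```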
Managing data pipelines efficiently is paramount for any organization. The Snowflake Data Cloud has introduced a groundbreaking feature that promises to simplify and supercharge this process: Snowflake Dynamic Tables. If you’re looking to leverage the full power of the Snowflake Data Cloud, let phData be your guide.
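A minimal sketch of a dynamic table definition (assumes the snowflake-connector-python package; the table, warehouse, and connection details are hypothetical):

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")

# A dynamic table declaratively maintains the result of a query;
# Snowflake keeps it refreshed within the TARGET_LAG you specify.
conn.cursor().execute("""
    CREATE OR REPLACE DYNAMIC TABLE daily_sales
      TARGET_LAG = '5 minutes'
      WAREHOUSE = transform_wh
    AS
      SELECT order_date, SUM(amount) AS total
      FROM raw_orders
      GROUP BY order_date
""")
```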
product_id | product_name | category | price | update_date
1 | MacBook Air (15-inch, M2, 2023) | Computers & Accessories | $999 | June 13, 2023
In the product table, we have a product, MacBook Air (15-inch, M2, 2023), and the price is $999 as of June 13, 2023. SCD Type 1: In this type, changes overwrite the existing data.
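As a sketch of Type 1 handling (the staging table name product_updates is hypothetical), the overwrite is typically a MERGE that updates matched rows in place, discarding the previous value:

```python
# Standard SQL MERGE implementing SCD Type 1: matched rows are overwritten,
# so no price history is retained.
scd_type_1_merge = """
MERGE INTO product AS tgt
USING product_updates AS src
  ON tgt.product_id = src.product_id
WHEN MATCHED THEN UPDATE SET
  tgt.price = src.price,
  tgt.update_date = src.update_date
WHEN NOT MATCHED THEN INSERT
  (product_id, product_name, category, price, update_date)
VALUES
  (src.product_id, src.product_name, src.category, src.price, src.update_date)
"""
# Execute with your warehouse connection, e.g. cursor.execute(scd_type_1_merge)
```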
Allison (Ally) Witherspoon Johnston, Senior Vice President, Product Marketing, Tableau | December 7, 2022. In the quest to become a customer-focused company, the ability to quickly act on insights and deliver personalized customer experiences has never been more important.
The Ultimate Modern Data Stack Migration Guide (phData Marketing, July 18, 2023). This guide was co-written by a team of data experts, including Dakota Kelley, Ahmad Aburia, Sam Hall, and Sunny Yan. Imagine a world where all of your data is organized, easily accessible, and routinely leveraged to drive impactful outcomes.
Context: In early 2023, Zeta’s machine learning (ML) teams shifted from traditional vertical teams to a more dynamic horizontal structure, introducing the concept of pods comprising diverse skill sets. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.
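As a minimal sketch of that feature-reuse point (assumes a configured Feast repository in the working directory; the feature names and entity are hypothetical), any pod can fetch the same registered features instead of re-deriving them:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch previously registered features for online inference; no re-computation.
features = store.get_online_features(
    features=["user_stats:purchase_count_30d", "user_stats:avg_order_value"],
    entity_rows=[{"user_id": 1234}],
).to_dict()
print(features)
```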
It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent.
Data modeling: dbt has gradually emerged as a powerful tool that greatly simplifies the process of building and managing data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in a single hub, following software engineering best practices.
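A short sketch of that transform-and-test loop (assumes dbt-core 1.5+ and an existing dbt project in the working directory), using dbt's programmatic entry point, which runs the same commands as the CLI:

```python
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Build the models, then run the tests defined alongside them.
run_result = runner.invoke(["run"])
test_result = runner.invoke(["test"])
print(run_result.success, test_result.success)
```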
You can watch the full talk this blog post is based on, which took place at ODSC West 2023, here. Feedback: collect production data, metadata, and metrics to tune the model and application further, and to enable governance and explainability. The importance of data pipelines lies in their ability to improve data quality.
Cleaning and preparing the data: raw data typically shouldn’t be used in machine learning models, as it will throw off the predictions. This can be achieved by, you guessed it, analyzing the data. What if you could know what drives customers to buy your products and could use that to bring in more customers like them?
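A minimal cleaning sketch (the columns and fill strategies are illustrative, not a universal recipe) showing the kind of preparation raw data usually needs before modeling:

```python
import pandas as pd

# Hypothetical raw customer data with common quality problems:
# missing values, inconsistent casing, and duplicate rows.
raw = pd.DataFrame({
    "age": [34, None, 29, 29],
    "country": ["US", "us", "DE", "DE"],
    "spend": [120.0, 85.5, None, 42.0],
})

clean = (
    raw.drop_duplicates()
       .assign(country=lambda d: d["country"].str.upper())
       .fillna({"age": raw["age"].median(), "spend": 0.0})
)
print(clean)
```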
Traditionally, answering this question would involve multiple data exports, complex extract, transform, and load (ETL) processes, and careful data synchronization across systems. The existing Data Catalog becomes the Default catalog (identified by the AWS account number) and is readily available in SageMaker Lakehouse.
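As a small sketch of inspecting that default catalog (assumes AWS credentials are configured and the boto3 package; whether your databases surface here depends on your Lakehouse setup), the Glue client can list the databases under the account-numbered catalog:

```python
import boto3

glue = boto3.client("glue")

# The default catalog is identified by your AWS account number.
account_id = boto3.client("sts").get_caller_identity()["Account"]
for db in glue.get_databases(CatalogId=account_id)["DatabaseList"]:
    print(db["Name"])
```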
Reference table for which technologies to use for your FTI pipelines for each ML system. Related article: How to Build ETL Data Pipelines for ML. See also: MLOps and FTI pipelines. Testing: once you have built an ML system, you have to operate, maintain, and update it, and the online path typically serves features from a key-value store or a low-latency relational database.
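A minimal sketch of that online serving path (assumes a local Redis instance and the redis-py package; the feature keys are hypothetical), where precomputed features are read back with a single low-latency lookup:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# The feature pipeline writes precomputed features keyed by entity id...
r.hset("features:user:1234", mapping={"purchase_count_30d": 7, "avg_order_value": 54.2})

# ...and the inference service reads them back at request time.
print(r.hgetall("features:user:1234"))
```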
Introduction: Big Data continues transforming industries, making it a vital asset in 2025. The global Big Data Analytics market, valued at $307.51 billion in 2023, is projected to grow to $348.21 billion. Hive is a data warehouse tool built on Hadoop that enables SQL-like querying to analyse large datasets.
How to Optimize Power BI and Snowflake for Advanced Analytics (Spencer Baucke, May 25, 2023). The world of business intelligence and data modernization has never been more competitive than it is today. This ensures you get the most value possible out of your Snowflake consumption.
Dolt, LakeFS, Delta Lake, and Pachyderm are compared along dimensions such as Git-like versioning, database tooling, data lake support, data pipelines, experiment tracking, integration with cloud platforms, and integrations with ML tools. Examples of data version control tools in ML: DVC (Data Version Control) is a version control system for data and machine learning teams.
One of the more common practices when developing a data pipeline is rebuilding your data for testing changes. As one of the leaders in the industry, dbt provides several options on how to execute your pipelines to increase efficiency and specifically execute what you need.
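For instance (a sketch assuming dbt-core 1.5+ and a saved production manifest for state comparison; the artifact path is hypothetical), dbt's node selection lets you rebuild only what changed rather than the whole project:

```python
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Rebuild only models changed since the last production run (state comparison),
# plus everything downstream of them.
result = runner.invoke([
    "run",
    "--select", "state:modified+",
    "--state", "prod_artifacts/",  # path to the previous run's manifest (assumption)
])
print(result.success)
```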
Consider a data pipeline that detects its own failures, diagnoses the issue, and recommends the fix, all automatically. This is the potential of self-healing pipelines, and this blog explores how to implement them using dbt, Snowflake Cortex, and GitHub Actions.
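An illustrative sketch of the diagnosis step (assumes the snowflake-connector-python package and a Snowflake account with Cortex access; the error text and model choice are hypothetical), passing a failing dbt model's error into an LLM function running inside the warehouse:

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")

# Hypothetical error captured from a failed dbt run.
error_log = "Database Error in model stg_orders: column 'AMOUNT' does not exist"

# SNOWFLAKE.CORTEX.COMPLETE runs an LLM inside Snowflake; here it drafts a diagnosis
# that a CI job (e.g. GitHub Actions) could post back to the team.
cur = conn.cursor()
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', %s)",
    (f"Explain this dbt failure and suggest a fix:\n{error_log}",),
)
print(cur.fetchone()[0])
```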
What’s really important in the before part is having production-grade machine learning data pipelines that can feed your model training and inference processes. And that’s really key for taking data science experiments into production. And so that’s where we got started as a cloud data warehouse.
In “The modern data stack is dead, long live the modern data stack!” the presenters elaborated on the common pain points of the cloud data warehouse today and predicted what it may look like in the future. So, how can a data catalog support the critical project of building data pipelines?
In transitional modeling, we’d add a new atom: Subject: Customer#1234; Predicate: hasEmailAddress; Object: "john.new@example.com"; Timestamp: 2023-07-24T10:00:00Z. The old email address atoms are still there, giving us a complete history of how to contact John. Both persistent staging and data lakes involve storing large amounts of raw data.
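A tiny sketch of that append-only idea (an illustrative structure, not any particular tool's schema; the old email value is hypothetical): atoms are never updated, and the current value is simply the latest-timestamped atom:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    subject: str
    predicate: str
    obj: str
    timestamp: str  # ISO 8601

# Append-only log: the older email atom is hypothetical, kept to show history.
atoms = [
    Atom("Customer#1234", "hasEmailAddress", "john.old@example.com", "2021-03-01T09:00:00Z"),
    Atom("Customer#1234", "hasEmailAddress", "john.new@example.com", "2023-07-24T10:00:00Z"),
]

# Current email = the latest-timestamped matching atom; nothing is overwritten.
current = max(
    (a for a in atoms if a.subject == "Customer#1234" and a.predicate == "hasEmailAddress"),
    key=lambda a: a.timestamp,
)
print(current.obj)
```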