We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic.
Conventional ML development cycles take weeks to many months and require data science expertise and ML development skills that are in short supply. Business analysts' ideas for using ML models often sit in prolonged backlogs because of data engineering and data science teams' limited bandwidth and lengthy data preparation activities.
The fusion of data in a central platform enables smooth analysis to optimize processes and increase business efficiency in the world of Industry 4.0, using methods from business intelligence, process mining, and data science. Cloud Data Platform for shop floor management and data sources such as MES, ERP, PLM, and machine data.
These experiences help professionals go from ingesting data from different sources into a unified environment, and pipelining its ingestion, transformation, and processing, to developing predictive models and analyzing the data through visualization in interactive BI reports.
Data engineers build data pipelines, also called data integration tasks or jobs, as incremental steps to perform data operations, and orchestrate these data pipelines in an overall workflow. Organizations can harness the full potential of their data while reducing risk and lowering costs.
As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
This article explores the importance of ETL pipelines in machine learning, walks through a hands-on example of building an ETL pipeline with a popular tool, and suggests the best ways for data engineers to enhance and sustain their pipelines. What is an ETL data pipeline in ML?
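To make the idea concrete, here is a minimal sketch of an ETL pipeline feeding an ML feature table, written in plain Python with pandas. The file paths, column names, and derived feature are hypothetical, not taken from the article's tool:

```python
import numpy as np
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Extract: read raw events from a CSV source (path is hypothetical)."""
    return pd.read_csv(path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: clean records and derive model-ready features."""
    df = raw.dropna(subset=["user_id", "amount"])            # drop incomplete rows
    df = df.drop_duplicates(subset=["event_id"])             # de-duplicate events
    df["amount_log"] = np.log1p(df["amount"].clip(lower=0))  # tame skewed amounts
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Load: persist the feature table where the training job can read it."""
    df.to_parquet(path, index=False)

if __name__ == "__main__":
    load(transform(extract("raw_events.csv")), "features.parquet")
```

Keeping extract, transform, and load as separate functions makes each stage independently testable, which is part of what sustains a pipeline over time.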
We couldn’t be more excited to announce two events that will be co-located with ODSC East in Boston this April: the Data Engineering Summit and the Ai X Innovation Summit. Data Engineering Summit: Our second annual Data Engineering Summit will be in person for the first time! Learn more about them below.
Data engineering has become an integral part of the modern tech landscape, driving advancements and efficiencies across industries. So let’s explore the world of open-source tools for data engineers, shedding light on how these resources are shaping the future of data handling, processing, and visualization.
When data leaders move to the cloud, it’s easy to get caught up in the features and capabilities of various cloud services without thinking about the day-to-day workflow of data scientists and data engineers. One common misstep: failing to make production data accessible in the cloud.
In recent years, data engineering teams working with the Snowflake Data Cloud platform have embraced the continuous integration/continuous delivery (CI/CD) software development process to develop data products and manage ETL/ELT workloads more efficiently. What Are the Benefits of a CI/CD Pipeline for Snowflake?
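As a rough illustration of the kind of step such a pipeline automates, here is a toy CI script that applies versioned SQL migrations to Snowflake using the snowflake-connector-python package. The file layout, warehouse name, and environment variables are assumptions for the example; real teams often reach for a dedicated migration tool instead of a hand-rolled script:

```python
"""Toy CI step: apply versioned SQL migration files to Snowflake in order."""
import os
from pathlib import Path

import snowflake.connector

# Credentials come from CI secrets; names here are placeholders
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="CI_WH",  # hypothetical warehouse reserved for CI runs
)

# Apply migrations in version order: V001__create.sql, V002__alter.sql, ...
for script in sorted(Path("migrations").glob("V*.sql")):
    print(f"applying {script.name}")
    for statement in script.read_text().split(";"):
        if statement.strip():
            conn.cursor().execute(statement)  # connector runs one statement at a time
conn.close()
```

Running this on every merge is what turns ad hoc warehouse changes into a repeatable, reviewable deployment step.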
Fivetran is an automated data integration platform that offers a convenient solution for businesses to consolidate and sync data from disparate data sources. With over 160 data connectors available, Fivetran makes it easy to move data out of, into, and across any cloud data platform in the market.
Engineering teams, in particular, can quickly get overwhelmed by the abundance of information pertaining to competition data, new product and service releases, market developments, and industry trends, resulting in information anxiety. Explosive data growth can be too much to handle. Can’t get to the data.
Over the past few decades, the corporate data landscape has changed significantly. The shift from on-premise databases and spreadsheets to the modern era of cloud data warehouses and AI/LLMs has transformed what businesses can do with data. This is where Fivetran and the Modern Data Stack come in.
With a traditional on-prem data warehouse, an organization will face more substantial Capital Expenditures (CapEx), or one-time costs, such as infrastructure setup, network configuration, and investments in servers and storage devices. When investing in a cloud data warehouse, the Operational Expenditures (OpEx) will be larger.
JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS), and on-premises databases. He works closely with enterprise customers to design data platforms and build advanced analytics and ML use cases.
Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.
Python is the top programming language used by data engineers in almost every industry. Python has proven effective for setting up pipelines, maintaining data flows, and transforming data, thanks to its simple syntax and strength in automation. Truly a must-have tool in your data engineering arsenal!
Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. The journey to becoming a successful data engineer […].
Fivetran: Fivetran is a leading automated data integration service, providing businesses with an efficient way to move and centralize data from all their sources. Boasting nearly 500 pre-built data connectors, Fivetran simplifies transferring data to, from, and within any cloud data platform available today.
Accenture calls it the Intelligent Data Foundation (IDF), and it’s used by dozens of enterprises with very complex data landscapes and analytic requirements. Simply put, IDF standardizes data engineering processes. IDF works natively on cloud platforms like AWS. How the IDF Supports a Smarter Data Pipeline.
Why start with a data source and build a visualization, if you can just find a visualization that already exists, complete with metadata about it? Data scientists went beyond database tables to data lakes and cloud data stores. Data scientists want to catalog not just information sources, but models.
Cleaning and preparing the data: Raw data typically shouldn’t be used in machine learning models, as it’ll throw off the prediction. Data engineers can prepare the data by removing duplicates, dealing with outliers, standardizing data types and precision between data sets, and joining data sets together.
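As a rough pandas illustration of those preparation steps, shown below: the tables, columns, and outlier rule are made up for the example:

```python
import pandas as pd

# Two hypothetical source tables sharing a customer_id key
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": ["10.5", "10.5", "250000.0", "42.0"],  # amounts stored as strings
})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["EU", "US", "US"]})

orders = orders.drop_duplicates()                  # remove duplicate rows
orders["amount"] = orders["amount"].astype(float)  # standardize data types

# Deal with outliers using a simple IQR rule (the 1.5x factor is a common default)
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
orders = orders[orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Join the cleaned data sets together on the shared key
prepared = orders.merge(customers, on="customer_id", how="left")
print(prepared)
```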
However, the race to the cloud has also created challenges for data users everywhere, including: cloud migration is expensive, migrating sensitive data is risky, and navigating between on-prem sources is often confusing for users. To build effective data pipelines, they need context (or metadata) on every source.
In July 2023, Matillion launched their fully SaaS platform called Data Productivity Cloud, aiming to create a future-ready, everyone-ready, and AI-ready environment that companies can easily adopt to start automating their data pipelines with coding, low-code, or even no-code approaches.
Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. Below are the best practices.
Many data engineering consulting companies could also answer these questions for you, or maybe you think you have the talent on your team to do it in-house.
However, if there’s one thing we’ve learned from years of successful cloud data implementations here at phData, it’s the importance of defining and implementing processes, building automation, and performing configuration, even before you create the first user account. Use with caution, and test before committing to using them.
Founded in 2014 by three leading cloud engineers, phData focuses on solving real-world data engineering, operations, and advanced analytics problems with the best cloud platforms and products. Over the years, one of our primary focuses became Snowflake and migrating customers to this leading cloud data platform.
Understanding Fivetran: Fivetran is a user-friendly, code-free platform enabling customers to easily synchronize their data by automating extraction, transformation, and loading from many sources. Fivetran automates the time-consuming steps of the ELT process so your data engineers can focus on more impactful projects.
These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started? The rise of cloud computing and cloud data warehousing has catalyzed the growth of the modern data stack. Data scientists.
The Snowflake Data Cloud is a leading cloud data platform that provides various features and services for data storage, processing, and analysis. A new feature that Snowflake offers is called Snowpark, which provides an intuitive library for querying and processing data at scale in Snowflake.
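A minimal Snowpark sketch, assuming an already-provisioned account and a hypothetical ORDERS table; the connection parameters and column names are placeholders:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Connection parameters are placeholders; fill in your own account details
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Build the query lazily against a hypothetical ORDERS table; Snowpark
# pushes the filter and aggregation down to Snowflake's engine.
orders = session.table("ORDERS")
revenue = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(sum_(col("AMOUNT")).alias("TOTAL_AMOUNT"))
)
revenue.show()  # triggers execution and prints a sample of the result
```

The DataFrame-style API is the draw here: transformations are expressed in Python but executed inside Snowflake, so data never has to leave the warehouse.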
Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector: StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination. The biggest reason is the ease of use.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Furthermore, a shared-data approach stems from this efficient combination. Simplify and Win: Experienced data engineers value simplicity.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities. Savings may vary depending on configurations, workloads and vendor.
One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. Gateways are used as another layer of security between Snowflake or another cloud data source and Power BI users.
What’s really important in the before part is having production-grade machine learning data pipelines that can feed your model training and inference processes. And that’s really key for taking data science experiments into production. And so that’s where we got started as a cloud data warehouse.
Both persistent staging and data lakes involve storing large amounts of raw data. But persistent staging is typically more structured and integrated into your overall customer data pipeline. It’s not just a dumping ground for data, but a crucial step in your customer data processing workflow.
Tayo Olajide is a seasoned Cloud Data Engineering generalist with over a decade of experience in architecting and implementing data solutions in cloud environments. Outside of work, he loves watching Formula 1, playing badminton, and racing go-karts.
Modern low-code/no-code ETL tools allow data engineers and analysts to build pipelines seamlessly using a drag-and-drop, configuration-based approach with minimal coding. Matillion ETL for Snowflake is an ELT/ETL tool that allows for the ingestion, transformation, and building of analytics for data in the Snowflake AI Data Cloud.
Snowflake’s Data Cloud has emerged as a leader in cloud data warehousing. As a fundamental piece of the modern data stack, Snowflake is helping thousands of businesses store, transform, and derive insights from their data easier, faster, and more efficiently than ever before.
In our previous blog, we discussed how Fivetran and dbt scale for any data volume and workload, both small and large. Now, you might be wondering what these tools can do for your data team and the efficiency of your organization as a whole. Can these tools help reduce the time our data engineers spend fixing things?
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let’s dive into the foundation of every Modern Data Stack, a cloud-based data warehouse.