Prerequisites: a provisioned or serverless Amazon Redshift data warehouse, a SageMaker domain, and basic knowledge of a SQL query editor. For this post we’ll use a provisioned Amazon Redshift cluster. Implementation steps: load data into the Amazon Redshift cluster by connecting to it with Query Editor v2.
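As a minimal sketch of that load step (not the post's actual commands), the snippet below uses the redshift_connector Python driver and a bulk COPY from S3; the cluster endpoint, credentials, table, bucket, and IAM role are all placeholders.

```python
# Minimal sketch: bulk-load a CSV file from S3 into a Redshift table.
# Endpoint, credentials, bucket, and IAM role below are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="***",
)
cur = conn.cursor()

# Create a staging table and load it with COPY (preferred over row-by-row INSERTs).
cur.execute(
    "CREATE TABLE IF NOT EXISTS sales_staging "
    "(order_id INT, amount DECIMAL(10,2), sold_at TIMESTAMP);"
)
cur.execute("""
    COPY sales_staging
    FROM 's3://example-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS CSV IGNOREHEADER 1;
""")
conn.commit()
```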
Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.
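The excerpt does not name an orchestrator; purely as an illustration of chaining incremental steps into a workflow, here is a minimal Apache Airflow DAG with stubbed task bodies.

```python
# Illustrative only: a minimal Airflow DAG that orchestrates two incremental
# pipeline steps (extract, then transform). Task logic is stubbed out.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract incremental batch")

def transform():
    print("apply transformations")

with DAG(
    dag_id="example_incremental_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # run transform after extract
```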
Let’s explore each of these components and its application in the sales domain: Synapse Data Engineering: Synapse Data Engineering provides a powerful Spark platform designed for large-scale data transformations through Lakehouse. Here, we changed the data types of columns and dealt with missing values.
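A minimal PySpark sketch of the kind of transformation described (casting column types and handling missing values); the table and column names are assumptions, not taken from the original post.

```python
# Minimal PySpark sketch: cast column types and handle missing values.
# The source table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("sales-cleanup").getOrCreate()

df = spark.read.table("sales_raw")
df_clean = (
    df.withColumn("quantity", col("quantity").cast("int"))
      .withColumn("unit_price", col("unit_price").cast("double"))
      .fillna({"quantity": 0, "region": "UNKNOWN"})
)
df_clean.write.mode("overwrite").saveAsTable("sales_clean")
```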
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. Choose Delete stack.
As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
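As a hedged sketch of what a small Snowpark pipeline step can look like (connection parameters and table names are placeholders, not values from the article), the snippet below reads a table, filters and aggregates it, and writes the result back inside Snowflake.

```python
# Minimal Snowpark sketch: read, filter, aggregate, and write back inside Snowflake.
# Connection parameters and table names are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "your_account",
    "user": "your_user",
    "password": "***",
    "warehouse": "COMPUTE_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
}).create()

orders = session.table("RAW_ORDERS")
daily_revenue = (
    orders.filter(col("STATUS") == "COMPLETE")
          .group_by(col("ORDER_DATE"))
          .agg(sum_(col("AMOUNT")).alias("REVENUE"))
)
# Persist the aggregate as a table without pulling data out of Snowflake.
daily_revenue.write.save_as_table("DAILY_REVENUE", mode="overwrite")
```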
which play a crucial role in building end-to-end data pipelines, to be included in your CI/CD pipelines. Each migration SQL script is assigned a unique sequence number to facilitate the correct order of application. Additionally, we need to incorporate Flyway variables into the Flyway configuration file.
Domain experts, for example, feel they are still overly reliant on core IT to access the data assets they need to make effective business decisions. In all of these conversations there is a sense of inertia: data warehouses and data lakes feel cumbersome and data pipelines just aren't agile enough.
Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.
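A short pandas/NumPy sketch of the routine cleaning and manipulation steps that paragraph refers to; the CSV path and column names are invented for illustration.

```python
# Minimal pandas/NumPy sketch of routine data cleaning; the file and
# column names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("transactions.csv")

# Drop exact duplicates, normalize types, and fill missing numeric values.
df = df.drop_duplicates()
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df["amount"] = df["amount"].fillna(df["amount"].median())
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Simple derived feature with NumPy.
df["log_amount"] = np.log1p(df["amount"])
print(df.describe())
```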
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Fivetran enables healthcare organizations to ingest data securely and effectively from a variety of sources into their target destinations, such as Snowflake or other cloud data platforms, for further analytics or curation for sharing data with external providers or customers.
Over the past few decades, the corporate data landscape has changed significantly. The shift from on-premises databases and spreadsheets to the modern era of cloud data warehouses and AI/LLMs has transformed what businesses can do with data. This is where Fivetran and the Modern Data Stack come in.
Fivetran is an automated data integration platform that offers a convenient solution for businesses to consolidate and sync data from disparate data sources. With over 160 data connectors available, Fivetran makes it easy to move data out of, into, and across any cloud data platform in the market.
Amazon Redshift is the most popular cloud data warehouse and is used by tens of thousands of customers to analyze exabytes of data every day. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development.
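To make the Glue side concrete, here is a hedged skeleton of a Glue job that reads a cataloged table, renames a field, and writes Parquet back to S3; the database, table, field, and bucket names are placeholders, not from the article.

```python
# Illustrative AWS Glue job skeleton: read a cataloged table, rename a field,
# and write Parquet back to S3. Database, table, and bucket names are placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)
renamed = source.rename_field("ord_id", "order_id")

# Write the curated output as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=renamed,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```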
Our continued investments in connectivity with Google technologies help ensure your data is secure, governed, and scalable. Tableau’s lightning-fast Google BigQuery connector allows customers to engineer optimized data pipelines with direct connections that power business-critical reporting. Direct connection to Google BigQuery.
However, if there’s one thing we’ve learned from years of successful cloud data implementations here at phData, it’s the importance of defining and implementing processes, building automation, and performing configuration, even before you create the first user account. And once again, for loading data, do not use SQL Inserts.
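A minimal sketch of that "no SQL Inserts for bulk loads" advice, assuming a Snowflake target: stage a file and bulk-load it with COPY INTO via the Snowflake Python connector. Connection details, the local file path, and the table name are placeholders.

```python
# Minimal sketch: bulk-load with PUT + COPY INTO instead of row-by-row INSERTs.
# Connection details, file path, and table name are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# PUT uploads the local file to the table's internal stage; COPY INTO loads it in bulk.
cur.execute("PUT file:///tmp/orders.csv @%ORDERS AUTO_COMPRESS=TRUE")
cur.execute("""
    COPY INTO ORDERS
    FROM @%ORDERS
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.commit()
```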
Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. Below are the best practices.
One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode.
The tool converts the templated configuration into a set of SQL commands that are executed against the target Snowflake environment. Replicate can interact with a wide variety of databases, data warehouses, and data lakes (on-premise or based in the cloud). It is also a helpful tool for learning a new SQL dialect.
It provides businesses with an efficient way to move and centralize data from all their sources. Boasting over 500 pre-built data connectors, Fivetran simplifies transferring data to, from, and within any cloud data platform available today. What Does Fivetran’s Coalesce Orchestration Integration Do?
However, the race to the cloud has also created challenges for data users everywhere, including: Cloud migration is expensive, migrating sensitive data is risky, and navigating between on-prem sources is often confusing for users. This empowers users to judge data’s quality and fitness for purpose quickly.
When the data or pipeline configuration needs to be changed, tools like Fivetran and dbt reduce the time required to make the change and increase your team’s confidence in the change. These allow you to scale your pipelines quickly. Governance doesn’t have to be scary or preventative to your cloud data warehouse.
These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started? The rise of cloud computing and cloud data warehousing has catalyzed the growth of the modern data stack.
The Snowflake Data Cloud is a leading cloud data platform that provides various features and services for data storage, processing, and analysis. A new feature that Snowflake offers is called Snowpark, which provides an intuitive library for querying and processing data at scale in Snowflake.
Cleaning and preparing the data: Raw data typically shouldn’t be used in machine learning models as it’ll throw off the prediction. This can be achieved by, you guessed it, analyzing the data. phData Retail Case Study: phData helps many retail businesses answer these questions and more by utilizing their data to the fullest.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Data warehousing is a vital constituent of any business intelligence operation. Simplify and Win: Experienced data engineers value simplicity.
Snowflake 2024 Partner of the Year, Fivetran 2024 Partner of the Year, dbt 2023 Partner of the Year, Alation 2024 SI Partner of the Year. Automation: With all our experience with projects, we have created in-house automation tools for many DE tasks, especially when using Snowflake as your cloud data provider.
This two-part series will explore how data discovery, fragmented data governance, ongoing data drift, and the need for ML explainability can all be overcome with a data catalog for accurate data and metadata record keeping. The Cloud Data Migration Challenge. Data pipeline orchestration.
Founded in 2014 by three leading cloud engineers, phData focuses on solving real-world data engineering, operations, and advanced analytics problems with the best cloud platforms and products. Over the years, one of our primary focuses became Snowflake and migrating customers to this leading cloud data platform.
Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector: StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination.
Real-time analytics and BI: Combine data from existing sources with new data to unlock new, faster insights without the cost and complexity of duplicating and moving data across different environments.
Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. In other words, job security is guaranteed. The journey to becoming a successful data engineer […].
Some modern CDPs are starting to incorporate these concepts, allowing for more flexible and evolving customer data models. It also requires a shift in how we query our customer data. Instead of simple SQL queries, we often need to use more complex temporal query languages or rely on derived views for simpler querying.
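As one hedged illustration of "derived views for simpler querying" (the excerpt doesn't specify a query language, and the column names here are invented), a current-state view can be derived from an event-style customer history in pandas:

```python
# Illustrative sketch: derive a "current state" view from an event-style
# customer table instead of querying raw history directly. Columns are placeholders.
import pandas as pd

events = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "email": ["a@old.example", "a@new.example", "b@site.example"],
    "updated_at": pd.to_datetime(["2024-01-01", "2024-03-15", "2024-02-10"]),
})

# Latest record per customer = a simple derived view for downstream querying.
current_state = (
    events.sort_values("updated_at")
          .groupby("customer_id", as_index=False)
          .last()
)
print(current_state)
```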
However, if the tool provides an option where we can write our own custom code to implement features that cannot be achieved using the drag-and-drop components, it broadens the horizon of what we can do with our data pipelines. JV_STAGING_TBL} Here is what the outline of the pipeline looks like.
In this blog post, we’ll dive into the amazing advantages of using Fivetran, a powerful data integration platform that will revolutionize the way you handle your data pipelines. They established an Information Architecture for the Snowflake Data Cloud, enabling automated database and role creation.
Snowflake’s cloud-agnosticism, separation of storage and compute resources, and ability to handle semi-structured data have established Snowflake as a best-in-class cloud data warehousing solution. Snowflake supports data sharing and collaboration across organizations without the need for complex data pipelines.
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let’s dive into the foundation of every Modern Data Stack, a cloud-based data warehouse.
Fivetran also takes care of the manual, non-business-specific elements of building and maintaining a data pipeline so that data teams don’t have to. With dbt, transforming the data according to business logic becomes easy. dbt allows you to write templated SQL using Jinja to create macros.