Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction: Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.
Because it runs Snowflake SQL from an easy-to-use, code-first GUI, it can take advantage of everything Snowflake offers, even if the feature is brand new. This blog will cover creating customized nodes in Coalesce, which new advanced features can already be used as nodes, and how to create them as part of your data pipeline.
Your data scientists develop models on this component, which stores all parameters, feature definitions, artifacts, and other experiment-related information they care about for every experiment they run. I have worked with customers where R and SQL were the first-class languages of their data science community.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.
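Under the hood, that kind of filtered export maps to Redshift's UNLOAD command. A minimal sketch, where the table, S3 path, and IAM role are hypothetical placeholders:

```sql
-- Unload a filtered Redshift result set to Amazon S3.
-- Table, bucket path, and IAM role are hypothetical placeholders.
UNLOAD ('SELECT order_id, customer_id, order_total
         FROM sales.orders
         WHERE order_date >= ''2023-01-01''')
TO 's3://my-example-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
FORMAT AS PARQUET;
```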
With their technical expertise and proficiency in programming and engineering, they bridge the gap between data science and software engineering. Programming skills: Data scientists should be proficient in programming languages such as Python, R, or SQL to manipulate and analyze data, automate processes, and develop statistical models.
To get a better grip on those changes, we reviewed over 25,000 data scientist job descriptions from the past year to find out what employers are looking for in 2023. Much of what we found was to be expected, though there were definitely a few surprises. While knowing Python, R, and SQL is expected, you’ll need to go beyond that.
With its LookML modeling language, Looker provides a unique, modern approach to define governed and reusable data models to build a trusted foundation for analytics. Connecting directly to this semantic layer will help give customers access to critical business data in a safe, governed manner. Direct connection to Google BigQuery.
Snowflake AI Data Cloud is one of the most powerful data platforms, with storage services that support complex data. Integrating Snowflake with dbt adds another layer of automation and control to the data pipeline. Snowflake stored procedures and dbt hooks are essential to modern data engineering and analytics workflows.
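As a rough illustration of how the two fit together, a Snowflake stored procedure can be invoked from a dbt hook (for example, config(post_hook="CALL log_model_run('my_model')") on a model). The procedure and audit table below are hypothetical, not taken from the blog:

```sql
-- A minimal sketch of a Snowflake stored procedure a dbt post-hook might call.
-- The audit table and procedure name are hypothetical.
CREATE OR REPLACE PROCEDURE log_model_run(model_name STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  -- Record which model ran and when.
  INSERT INTO audit.model_run_log (model_name, run_at)
  VALUES (:model_name, CURRENT_TIMESTAMP());
  RETURN 'logged ' || model_name;
END;
$$;
```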
The first one we want to talk about is the Toolkit SQL analyze command. When customers are looking to perform a migration, one of the first things that needs to occur is an assessment of the level of effort to migrate existing data definition language (DDL) and data manipulation language (DML).
It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced. The following figure shows the schema definition and the model that references it.
Over the last month, we’ve been heavily focused on adding additional support for SQL translations to our SQL Translations tool. Specifically, we’ve been introducing fixes and features for our Microsoft SQL Server to Snowflake translation. Let’s dive in and take a look at some of the more interesting translations.
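To give a flavor of the kinds of rewrites involved (illustrative only, not output from the tool itself):

```sql
-- SQL Server (T-SQL):
--   SELECT TOP 10 name, GETDATE() AS load_ts FROM dbo.customers;

-- Snowflake equivalent:
SELECT name, CURRENT_TIMESTAMP() AS load_ts
FROM customers
LIMIT 10;
```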
While many of our customers leverage our UI for tools like our SQL Translation or Privilege Audit tooling, there are limitations when it comes to using a UI. You wouldn’t want to pay someone (or perform yourself) to manually copy/paste each file into a browser window and copy/paste the translated SQL back.
Consider a data pipeline that detects its own failures, diagnoses the issue, and recommends the fix, all automatically. This is the potential of self-healing pipelines, and this blog explores how to implement them using dbt, Snowflake Cortex, and GitHub Actions.
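The diagnosis step can be sketched as a single Cortex call; the model choice and error text below are placeholders, and the surrounding dbt and GitHub Actions orchestration is omitted:

```sql
-- Ask Snowflake Cortex to diagnose a failed run's error message.
-- Model name and error text are placeholders for illustration.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large',
  'This dbt model failed with the following error. Suggest a likely cause and fix: '
    || 'Numeric value ''N/A'' is not recognized in column ORDER_TOTAL.'
) AS suggested_fix;
```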
The primary goal of Data Engineering is to transform raw data into a structured and usable format that can be easily accessed, analyzed, and interpreted by Data Scientists, analysts, and other stakeholders. Future of Data Engineering: The Data Engineering market will expand from $18.2
ETL is a process for moving and managing data from various sources to a central data warehouse. This process ensures that data is accurate, consistent, and usable for analysis and reporting. Definition and Explanation of the ETL Process: ETL is a data integration method that combines data from multiple sources.
An optional CloudFormation stack to deploy a data pipeline to enable a conversation analytics dashboard. Choose an option for allowing unredacted logs for the Lambda function in the data pipeline. This allows you to control which IAM principals are allowed to decrypt the data and view it. For testing, choose yes.
It’s common to have terabytes of data in most data warehouses, and data quality monitoring is often challenging and cost-intensive due to dependencies on multiple tools, so it is eventually ignored. This results in poor credibility and data consistency over time, leading businesses to mistrust their data pipelines and processes.
The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode. While the loss of certain DAX functions is definitely a shortcoming that we hope Microsoft will address in the near future, the impact of these lost DAX functions is not necessarily as big as you would expect.
Generative AI can be used to automate the data modeling process by generating entity-relationship diagrams or other types of data models, and to assist in the UI design process by generating wireframes or high-fidelity mockups. GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream’s public Bitcoin API.
Sample CSV files (download files here). Step 1: Load Sample CSV Files Into the Internal Stage Location. Open the SQL worksheet and create a stage if it doesn’t exist. From the homepage: Data > Databases > select your database/schema and select Stages. Go back to the SQL worksheet and verify that the files exist.
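A minimal sketch of the SQL side of that step, with a hypothetical stage name (the file upload itself is typically done via SnowSQL's PUT command or the Snowsight UI):

```sql
-- Create an internal stage to hold the sample CSV files (name is hypothetical).
CREATE STAGE IF NOT EXISTS my_csv_stage;

-- After uploading the CSVs, verify that the files are present.
LIST @my_csv_stage;
```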
Let’s briefly look at the key components and their roles in this process: Azure Data Factory (ADF): ADF will serve as our data orchestration and integration platform. It enables us to create, schedule, and monitor the data pipeline, ensuring seamless movement of data between the various sources and destinations.
For complex data pipelines, a combination of Materialized Views, Stored Procedures, and Scheduled Queries can be a better choice than relying on Scheduled Queries alone. To create a Scheduled Query, the first step is to ensure your SQL is entered accurately in the Query Editor.
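For example, a Materialized View can pre-aggregate the data that a Scheduled Query then builds on. A minimal sketch, assuming BigQuery (given the mention of Scheduled Queries and the Query Editor) and hypothetical dataset and table names:

```sql
-- Pre-aggregate daily order totals so downstream scheduled queries stay cheap.
-- Dataset and table names are hypothetical.
CREATE MATERIALIZED VIEW my_dataset.daily_order_totals AS
SELECT
  order_date,
  SUM(order_total) AS total_revenue
FROM my_dataset.orders
GROUP BY order_date;
```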
Image generated with Midjourney In today’s fast-paced world of data science, building impactful machine learning models relies on much more than selecting the best algorithm for the job. Data scientists and machine learning engineers need to collaborate to make sure that, together with the model, they develop robust data pipelines.
Here’s the structured equivalent of this same data in tabular form. With structured data, you can use query languages like SQL to extract and interpret information. In contrast, such traditional query languages struggle to interpret unstructured data. This text has a lot of information, but it is not structured.
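For instance, a simple query over a hypothetical orders table pulls structure straight out of the rows:

```sql
-- Count orders per customer; table and column names are hypothetical.
SELECT customer_name, COUNT(*) AS order_count
FROM orders
GROUP BY customer_name
ORDER BY order_count DESC;
```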
It’s not a simple definition. Migration leaders would be wise to enable all the enhancements a cloud environment offers, including: special requirements for AI/ML; data pipeline orchestration; support for languages and SQL; and moving/integrating data in the cloud, data exploration, and quality assessment.
A legacy data stack usually refers to a traditional relational database management system (RDBMS), which uses Structured Query Language (SQL) to store and process data. While an RDBMS can still be used in a modern data stack, it is less common because it is not as well suited to managing big data.
To establish trust between the data producers and data consumers, SageMaker Catalog also integrates the data quality metrics and data lineage events to track and drive transparency in data pipelines. Create a SageMaker Unified Studio domain and three projects using the SQL analytics project profile.
Support for Numerous Data Sources: Fivetran supports over 200 data sources, including popular databases, applications, and cloud platforms like Salesforce, Google Analytics, SQL Server, Snowflake, and many more. Additionally, unsupported data sources can be integrated using Fivetran’s cloud function connectors.
Data pipeline orchestration tools are designed to automate and manage the execution of data pipelines. These tools help streamline and schedule data movement and processing tasks, ensuring efficient and reliable data flow. What are orchestration tools?
Some modern CDPs are starting to incorporate these concepts, allowing for more flexible and evolving customer data models. It also requires a shift in how we query our customer data. Instead of simple SQL queries, we often need to use more complex temporal query languages or rely on derived views for simpler querying.
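One common pattern is a derived view that keeps the full history table but exposes only each customer's latest row; a minimal sketch with hypothetical table and column names:

```sql
-- Expose the current state of each customer while retaining full temporal history.
CREATE VIEW customer_current AS
SELECT customer_id, email, segment, valid_from
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY valid_from DESC) AS rn
  FROM customer_profile_history
) h
WHERE rn = 1;
```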
The agent can generate SQL queries from natural-language questions using a database schema DDL (data definition language) and execute them against a database instance for the database tier. Make sure to add a semicolon at the end of the generated SQL statement. Generate UI and backend code with LLMs.
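An illustrative pairing of the schema DDL the agent might receive and the kind of query it might generate (hypothetical names, not actual agent output):

```sql
-- DDL provided to the agent as schema context.
CREATE TABLE bookings (
  booking_id   INT PRIMARY KEY,
  customer_id  INT,
  room_type    VARCHAR(50),
  check_in     DATE
);

-- Question: "How many bookings are there for each room type?"
-- Generated query (note the trailing semicolon, as required).
SELECT room_type, COUNT(*) AS booking_count
FROM bookings
GROUP BY room_type;
```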
Data science is an interdisciplinary field that utilizes advanced analytics techniques to extract meaningful insights from vast amounts of data. This helps facilitate data-driven decision-making for businesses, enabling them to operate more efficiently and identify new opportunities.