Blog, Data Pipeline and SQL - Data Science Current

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

ETL

ETL Data Warehouse Analytics

The power of remote engine execution for ETL/ELT data pipelines

IBM Journey to AI blog

MAY 15, 2024

Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.

Data Pipeline

Data Pipeline ETL SQL Database

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. This tool converts questions from data analysts asked in natural language (such as “Which table contains customer address information?”)

SQL

SQL Data Lakes Data Analyst AWS

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

AWS Machine Learning Blog

DECEMBER 6, 2023

To overcome these limitations, we propose a solution that combines RAG with metadata and entity extraction, SQL querying, and LLM agents, as described in the following sections. Typically, these analytical operations are done on structured data, using tools such as pandas or SQL engines.

SQL

SQL AWS Analytics

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.

Data Engineer

Data Engineer Data Engineering

Serverless High Volume ETL data processing on Code Engine

IBM Data Science in Practice

JANUARY 13, 2025

The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: One of the Analytics team’s tasks is to load data from multiple sources and unify it into a data warehouse. Thus, it has only a minimal footprint.

ETL

ETL Data Pipeline Database Data Warehouse

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction: Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineer Data Engineering

Understanding ETL Tools as a Data-Centric Organization

Smart Data Collective

SEPTEMBER 8, 2021

A lot of Open-Source ETL tools house a graphical interface for executing and designing Data Pipelines. It can be used to manipulate, store, and analyze data of any structure. It generates Java code for the Data Pipelines instead of running Pipeline configurations through an ETL Engine.

ETL

ETL Hadoop Data Warehouse Data Pipeline

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

A provisioned or serverless Amazon Redshift data warehouse. Basic knowledge of a SQL query editor. Implementation steps: Load data to the Amazon Redshift cluster. Connect to your Amazon Redshift cluster using Query Editor v2. For this post we’ll use a provisioned Amazon Redshift cluster. A SageMaker domain.

Data Warehouse

Data Warehouse Machine Learning Cloud Data

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

AWS Machine Learning Blog

MARCH 14, 2024

Using structured data to answer questions requires a way to effectively extract data that’s relevant to a user’s query. We formulated a text-to-SQL approach whereby a user’s natural language query is converted to a SQL statement using an LLM. The SQL is run by Amazon Athena to return the relevant data.

SQL

SQL AWS AI

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

phData

JUNE 14, 2023

Ultimately, the goal of a CI/CD pipeline is to ensure the safe deployment of new changes to both Snowflake’s non-production and production environments. In this blog, we will explore the benefits of enabling the CI/CD pipeline for database platforms.

Data Pipeline

Data Pipeline Database SQL Data Engineer

How to Translate SQL Scripts Into Matillion Jobs

phData

JULY 12, 2023

Unlike traditional methods that rely on complex SQL queries for orchestration, Matillion Jobs provide a more streamlined approach. By converting SQL scripts into Matillion Jobs, users can take advantage of the platform’s advanced features for job orchestration, scheduling, and sharing. With that, let’s dive in!

SQL

SQL ETL Database Data Pipeline

How to Translate SQL Scripts Into Matillion Jobs

phData

APRIL 21, 2023

Unlike traditional methods that rely on complex SQL queries for orchestration, Matillion Jobs provide a more streamlined approach. By converting SQL scripts into Matillion Jobs, users can take advantage of the platform’s advanced features for job orchestration, scheduling, and sharing. In our case, this table is “orders.”

SQL

SQL ETL Database Data Pipeline

A Guide to Choose the Best Data Science Bootcamp

Data Science Dojo

JULY 3, 2024

In this blog, we will explore the arena of data science bootcamps and lay down a guide for you to choose the best data science bootcamp. What do Data Science Bootcamps Offer? Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as pandas and NumPy in Python.

Data Science

Data Science Machine Learning Data Visualization

Advanced Snowflake Features in Coalesce

phData

JULY 4, 2024

Because it runs Snowflake SQL from an easy-to-use, code-first GUI, it can take advantage of everything Snowflake offers, even if the feature is brand new. This blog will cover creating customized nodes in Coalesce, what new advanced features can already be used as nodes, and how to create them as part of your data pipeline.

SQL

SQL Data Pipeline Data Engineer Data Engineering

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.

Data Pipeline

Data Pipeline ETL SQL Data Quality

Top 5 Fivetran Connectors for Healthcare

phData

APRIL 29, 2024

In our previous blog, Top 5 Fivetran Connectors for Financial Services, we explored Fivetran’s capabilities that address the data integration needs of the finance industry. Now, let’s cover the healthcare industry, which also has a surging demand for data and analytics, along with the underlying processes to make it happen.

SQL

SQL Data Warehouse Azure Cloud Data

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

To train a model using data stored outside of the three supported storage services, the data first needs to be ingested into one of these services (typically Amazon S3). This requires building a data pipeline (using tools such as Amazon SageMaker Data Wrangler) to move data into Amazon S3.

ML

ML AWS Python

What Are Snowflake’s Best Features for Data Transformation?

phData

AUGUST 8, 2024

Putting the T for Transformation in ELT (ETL) is essential to any data pipeline. After extracting and loading your data into the Snowflake AI Data Cloud, you may wonder how best to transform it. Luckily, Snowflake answers this question with many features designed to transform your data for all your analytic use cases.

SQL

SQL Data Pipeline Python ETL

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.

ML

ML AWS Data Warehouse

What Is Fivetran and How Much Does It Cost?

phData

MARCH 8, 2023

Fivetran, a cloud-based automated data integration platform, has emerged as a leading choice among businesses looking for an easy and cost-effective way to unify their data from various sources. Building data pipelines manually is an expensive and time-consuming process. Why Use Fivetran?

Data Warehouse

Data Warehouse Data Engineer Data Engineering

The Data Dilemma: Exploring the Key Differences Between Data Science and Data Engineering

Pickl AI

JULY 25, 2023

Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.

Data Engineer

Data Engineer Data Engineering

How Do You Call Snowflake Stored Procedures Using dbt Hooks?

phData

AUGUST 2, 2024

Snowflake AI Data Cloud is one of the most powerful platforms, including storage services supporting complex data. Integrating Snowflake with dbt adds another layer of automation and control to the data pipeline. In this blog, we’ll explore: Overview of Snowflake Stored Procedures & dbt Hooks.

Data Pipeline

Data Pipeline Python Database SQL

phData Toolkit August 2023 Update

phData

SEPTEMBER 7, 2023

Hello, and welcome to our August update of the phData Toolkit blog series! Over the last month, we’ve been heavily focused on adding additional support for SQL translations to our SQL Translations tool.

SQL

SQL Data Profiling Data Pipeline Database

phData Toolkit March 2023 Update

phData

MARCH 31, 2023

Hello and welcome to the next monthly installment of the phData Toolkit blog series! We’re excited to talk through the changes we’ve brought into the platform and how it has enabled our customers to build data products with confidence. Added a default name (the folder or file name) when generating SQL reports.

SQL

SQL Data Profiling Data Pipeline Database

How to Setup a Project in Snowpark Using a Python IDE

phData

JULY 2, 2024

Snowpark, offered by the Snowflake AI Data Cloud, consists of libraries and runtimes that enable secure deployment and processing of non-SQL code, such as Python, Java, and Scala. In this blog, we’ll cover the steps to get started, including: How to set up an existing Snowpark project on your local system using a Python IDE.

Python

Python SQL Data Pipeline ML

Upcoming Snowflake Features

phData

JULY 1, 2024

The recent Snowflake Summit 2024 brought plenty of exciting upcoming features, GA announcements, strategic partnerships, and many more opportunities for customers on the Snowflake AI Data Cloud to innovate. If you are new to Snowflake Cortex AI, check out this introductory blog. schemas["my_schema"].tables.create(my_table)

Python

Python Database Data Pipeline SQL

phData Toolkit June 2023 Update

phData

JUNE 26, 2023

Welcome to the latest installment of the phData Toolkit blog series! in this June episode of the blog. While many of our customers leverage our UI for tools like our SQL Translation or Privilege Audit tooling, there are limitations when it comes to using a UI. We’re already halfway through the year (crazy, right?!)

SQL

SQL Data Profiling Data Pipeline Data Governance

What are Snowflake Dynamic Tables?

phData

NOVEMBER 2, 2023

Managing data pipelines efficiently is paramount for any organization. The Snowflake Data Cloud has introduced a groundbreaking feature that promises to simplify and supercharge this process: Snowflake Dynamic Tables. What are Snowflake Dynamic Tables?

Data Pipeline

Data Pipeline SQL Data Warehouse Data Engineer

phData Toolkit July 2023 Update

phData

JULY 29, 2023

Hello, and welcome to the latest installment of the phData Toolkit blog series. We’ve been focusing on two key areas: Microsoft SQL Server to Snowflake Data Cloud SQL translations and our new Advisor tool within the phData Toolkit. We hope you had a great time being with family and friends. Let’s dive in.

SQL

SQL Database Data Pipeline

Getting Started With Snowflake: Best Practices For Launching

phData

DECEMBER 4, 2023

This blog was originally written by Erik Hyrkas and updated for 2024 by Justin Delisi. This isn’t meant to be a technical how-to guide (most of those details are readily available via a quick Google search) but rather an opinionated review of key processes and potential approaches. And once again, for loading data, do not use SQL Inserts.

Clustering

Clustering Database SQL Data Pipeline

Maximize the Power of dbt and Snowflake to Achieve Efficient and Scalable Data Vault Solutions

phData

AUGUST 10, 2023

In this blog, our focus will be on exploring the data lifecycle along with several Design Patterns, delving into their benefits and constraints. Data architects can leverage these patterns as starting points or reference models when designing and implementing data vault architectures.

SQL

SQL Data Observability Data Quality Data Pipeline

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

It could help you detect and prevent data pipeline failures, data drift, and anomalies. Monte Carlo offers data quality checks, profiling, and monitoring capabilities to ensure high-quality and reliable data for machine learning and analytics. Flyte is a platform for orchestrating ML pipelines at scale.

Machine Learning

Machine Learning ML

Best 8 Data Version Control Tools for Machine Learning 2024

DagsHub

DECEMBER 11, 2023

With business needs changing constantly and the growing size and structure of datasets, it becomes challenging to efficiently keep track of the changes made to the data, which leads to unfortunate scenarios such as inconsistencies and errors in data.

Machine Learning

Machine Learning Data Lakes Database

Where Does Fivetran Fit into The Modern Data Stack?

phData

JULY 17, 2023

In order to fully leverage this vast quantity of collected data, companies need a robust and scalable data infrastructure to manage it. This is where Fivetran and the Modern Data Stack come in. The modern data stack is important because its suite of tools is designed to solve all of the core data challenges companies face.

Data Warehouse

Data Warehouse Data Pipeline Cloud Data ETL

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.

Data Science

Data Science Analytics Data Scientist

What are the Biggest Challenges with Migrating to Snowflake?

phData

FEBRUARY 5, 2024

In this blog, we’re going to answer these questions and more, walking you through the biggest challenges we have found when migrating our customers’ data from a legacy system to Snowflake. You’re in luck because this blog is for anyone ready to move or thinking about moving to Snowflake who wants to know what’s in store for them.

SQL

SQL Database Data Quality Data Warehouse

Apache Kafka and Apache Flink: An open-source match made in heaven

IBM Journey to AI blog

NOVEMBER 3, 2023

In this spirit, IBM introduced IBM Event Automation with an intuitive, easy-to-use, no-code format that enables users with little to no training in SQL, Java, or Python to leverage events, no matter their role. Explore Apache Flink today.

Apache Kafka

Apache Kafka Data Warehouse Data Pipeline Big Data

Best Practices When Developing Matillion Jobs

phData

SEPTEMBER 2, 2024

Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. What Are Matillion Jobs and Why Do They Matter?

ETL

ETL Data Warehouse SQL Database

How to Build an End-to-End Energy Price Forecasting Solution with Snowflake

phData

JANUARY 31, 2024

In this blog, we’ll show you how to build a robust energy price forecasting solution within the Snowflake Data Cloud ecosystem. We’ll cover how to get the data via the Snowflake Marketplace, how to apply machine learning with Snowpark, and then bring it all together to create an automated ML model to forecast energy prices.

Machine Learning

Machine Learning Python Data Scientist

Snowflake ETL Face-Off: Alteryx Designer vs. Matillion ETL

phData

MARCH 14, 2024

Matillion ETL is purpose-built for the cloud, operating smoothly on top of your chosen data warehouse. In this blog, we’ll conduct a comparative analysis of these platforms, emphasizing usability, performance, and the unique strengths each brings to the table when working with Snowflake.

ETL

ETL SQL Data Warehouse Data Pipeline

Top ETL Tools: Unveiling the Best Solutions for Data Integration

Pickl AI

JUNE 7, 2024

At the heart of this process lie ETL Tools—Extract, Transform, Load—a trio that extracts data, tweaks it, and loads it into a destination. Choosing the right ETL tool is crucial for smooth data management. This blog will delve into ETL Tools, exploring the top contenders and their roles in modern data integration.

ETL

ETL Data Quality Data Pipeline Data Warehouse
