Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.
Data engineering tools are software applications or frameworks specifically designed to facilitate the process of managing, processing, and transforming large volumes of data. Essential data engineering tools for 2023: the top 10 data engineering tools to watch out for in 2023.
This article was published as a part of the Data Science Blogathon. Introduction: When creating data pipelines, Software Engineers and Data Engineers frequently work with databases using Database Management Systems like PostgreSQL.
Data engineering startup Prophecy is giving a new turn to data pipeline creation. Known for its low-code SQL tooling, the California-based company today announced data copilot, a generative AI assistant that can create trusted data pipelines from natural language prompts and improve pipeline quality …
Data Science Dojo is offering Airbyte for FREE on Azure Marketplace, packaged with a pre-configured web environment that enables you to quickly start the ELT process rather than spending time setting up the environment. Free to use. Conclusion: There are a ton of small services that aren’t supported on traditional data pipeline platforms.
Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.
These experiences help professionals move from ingesting data from different sources into a unified environment, through pipelining the ingestion, transformation, and processing of data, to developing predictive models and analyzing the data through visualization in interactive BI reports.
Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.
Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. This tool converts questions that data analysts ask in natural language (such as “Which table contains customer address information?”)
The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: One of the Analytics team’s tasks is to load data from multiple sources and unify it into a data warehouse. Thus, it has only a minimal footprint.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
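As an illustration, a minimal Snowpark pipeline might look like the following sketch; the connection parameters and table names are assumptions for the example, not from the article:

```python
# A minimal sketch of a Snowpark data pipeline: filter and aggregate a
# table, then persist the result. All names here are illustrative.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "...",
    "database": "my_db", "schema": "my_schema", "warehouse": "my_wh",
}).create()

# Transformations are lazy: they compile to SQL pushed down to Snowflake.
(session.table("raw_orders")
        .filter(col("status") == "shipped")
        .group_by("customer_id")
        .agg(sum_(col("amount")).alias("lifetime_value"))
        .write.save_as_table("customer_ltv", mode="overwrite"))
```

Because the DataFrame operations compile to SQL, the heavy lifting stays in the warehouse rather than on the client.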
Unfolding the difference between data engineer, data scientist, and data analyst. Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. Data Visualization: Matplotlib, Seaborn, Tableau, etc.
Conventional ML development cycles take weeks to many months and require scarce data science expertise and ML development skills. Business analysts’ ideas for using ML models often sit in prolonged backlogs because of the data engineering and data science teams’ bandwidth and data preparation activities.
Aspiring and experienced Data Engineers alike can benefit from a curated list of books covering essential concepts and practical techniques. These 10 Best Data Engineering Books for beginners encompass a range of topics, from foundational principles to advanced data processing methods. What is Data Engineering?
Data engineering is a rapidly growing field, and there is a high demand for skilled data engineers. If you are a data scientist, you may be wondering if you can transition into data engineering. In this blog post, we will discuss how you can become a data engineer if you are a data scientist.
Automation: Automating data pipelines and models. Team: Building the right data science team is complex. With a range of role types available, how do you find the perfect balance of Data Scientists, Data Engineers, and Data Analysts to include in your team? Big Ideas: What to look out for in 2022.
In recent years, data engineering teams working with the Snowflake Data Cloud platform have embraced the continuous integration/continuous delivery (CI/CD) software development process to develop data products and manage ETL/ELT workloads more efficiently. What Are the Benefits of a CI/CD Pipeline for Snowflake?
Cloud Computing, APIs, and Data Engineering: NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms: Spark is still the leader for data pipelines, but other platforms are gaining ground. Knowing some SQL is also essential.
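For a flavor of Spark in a pipeline, here is a minimal PySpark sketch; the input path and column names are assumptions for the example:

```python
# A minimal PySpark batch job: read raw review events, bucket them by
# day and sentiment, and write the counts back out. Paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sentiment_counts").getOrCreate()

reviews = spark.read.json("/data/reviews.json")   # hypothetical source
daily = (reviews
         .withColumn("day", F.to_date("created_at"))
         .groupBy("day", "sentiment")
         .count())
daily.write.mode("overwrite").parquet("/data/daily_sentiment")
spark.stop()
```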
Enrich data engineering skills by building problem-solving ability with real-world projects, teaming with peers, participating in coding challenges, and more. Globally, several organizations are hiring data engineers to extract, process, and analyze the information available in vast volumes of data sets.
It allows organizations to easily connect their disparate data sources without having to manage any infrastructure. Fivetran’s automated data movement platform simplifies the ETL (extract, transform, load) process by automating most of the time-consuming tasks of ETL that data engineers would typically do.
Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python. Databases and SQL: Managing and querying relational databases using SQL, as well as working with NoSQL databases like MongoDB.
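A minimal illustration of that cleaning-and-analysis loop with Pandas and NumPy; the file and column names are assumptions:

```python
# Clean a hypothetical sales file, derive a NumPy-based feature, and run
# a quick grouped summary.
import numpy as np
import pandas as pd

df = pd.read_csv("sales.csv")                      # hypothetical input
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df = df.dropna(subset=["price"])                   # drop unparseable rows
df["log_price"] = np.log1p(df["price"])            # NumPy transform
print(df.groupby("region")["price"].describe())    # quick analysis
```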
Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Big Data: As datasets become larger and more complex, knowing how to work with them will be key.
Unlike traditional methods that rely on complex SQL queries for orchestration, Matillion Jobs provides a more streamlined approach. By converting SQL scripts into Matillion Jobs, users can take advantage of the platform’s advanced features for job orchestration, scheduling, and sharing. In our case, this table is “orders.”
Because it runs Snowflake SQL from an easy-to-use, code-first GUI, it can take advantage of everything Snowflake offers, even if the feature is brand new. This blog will cover creating customized nodes in Coalesce, what new advanced features can already be used as nodes, and how to create them as part of your data pipeline.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Introduction: Azure Data Factory (ADF) is a cloud-based data ingestion and ETL (Extract, Transform, Load) tool. The data-driven workflow in ADF orchestrates and automates data movement and data transformation.
This is where Fivetran and the Modern Data Stack come in. What is Fivetran? Fivetran is a fully automated, zero-maintenance data pipeline tool that automates the ETL process from data sources to your cloud warehouse, centralizing many different data sources into a single cloud-based target.
Snowpark, offered by the Snowflake AI Data Cloud, consists of libraries and runtimes that enable secure deployment and processing of non-SQL code, such as Python, Java, and Scala. Developers can seamlessly build data pipelines, ML models, and data applications with User-Defined Functions and Stored Procedures.
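As a sketch of what that looks like in practice, here is a minimal Snowpark Python UDF used in a pipeline; the connection parameters, table name, and tax rate are illustrative assumptions:

```python
# Register plain Python as a UDF that executes inside Snowflake, then use
# it in a DataFrame pipeline. All names and values are illustrative.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import FloatType

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "...",
    "database": "my_db", "schema": "my_schema", "warehouse": "my_wh",
}).create()

@udf(return_type=FloatType(), input_types=[FloatType()])
def with_tax(amount: float) -> float:
    return amount * 1.08  # illustrative tax rate

(session.table("orders")
        .with_column("amount_with_tax", with_tax(col("amount")))
        .write.save_as_table("orders_enriched", mode="overwrite"))
```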
Managing data pipelines efficiently is paramount for any organization. The Snowflake Data Cloud has introduced a groundbreaking feature that promises to simplify and supercharge this process: Snowflake Dynamic Tables. Flexibility: Dynamic tables allow batch and streaming pipelines to be specified in the same way.
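For illustration, a Dynamic Table definition might look like the following, issued here from Python; the warehouse name, target lag, and query are assumptions:

```python
# Declare a Dynamic Table: Snowflake keeps the aggregate refreshed within
# the stated lag, replacing hand-built incremental pipelines.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    database="my_db", schema="my_schema", warehouse="my_wh",
)
conn.cursor().execute("""
    CREATE OR REPLACE DYNAMIC TABLE daily_order_totals
      TARGET_LAG = '1 hour'            -- how stale results may get
      WAREHOUSE = my_wh                -- compute used for refreshes
    AS
      SELECT order_date, SUM(amount) AS total
      FROM orders
      GROUP BY order_date
""")
conn.close()
```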
The Snowflake AI Data Cloud is one of the most powerful platforms, including storage services that support complex data. Integrating Snowflake with dbt adds another layer of automation and control to the data pipeline. Snowflake stored procedures and dbt hooks are essential to modern data engineering and analytics workflows.
COPY INTO: When loading data into Snowflake, the very first and most important rule to follow is: do not load data with SQL INSERTs! Loading small amounts of data that way is cumbersome and costly: each insert is slow, and time is credits. And once again, for loading data, do not use SQL INSERTs.
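As a minimal sketch of the alternative, a bulk load with COPY INTO via the Snowflake Python connector might look like this; the connection parameters, stage, and table names are assumptions, not from the article:

```python
# One set-based COPY INTO load of staged files replaces thousands of
# slow, per-row INSERT statements. All names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    database="my_db", schema="my_schema", warehouse="my_wh",
)
cur = conn.cursor()
cur.execute("""
    COPY INTO orders
    FROM @my_csv_stage
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
print(cur.fetchall())  # per-file load status
conn.close()
```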
Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. For example, neptune.ai
Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. The journey to becoming a successful data engineer […].
However, the race to the cloud has also created challenges for data users everywhere: cloud migration is expensive, migrating sensitive data is risky, and navigating between on-prem sources is often confusing for users. This empowers users to judge data’s quality and fitness for purpose quickly.
Key Players in AI Development: Enterprises increasingly rely on AI to automate and enhance their data engineering workflows, making data more ready for building, training, and deploying AI applications. Data engineers focus on tasks like cleansing and managing data, ensuring its quality and readiness for AI applications.
Below are several of the upcoming features announced during the Summit: Cortex Analyst: this upcoming feature, built with Meta’s Llama 3 and Mistral Large models, allows business users to chat with their data on Snowflake. Snowflake Copilot, soon to be GA, allows technical users to convert questions into SQL. schemas["my_schema"].tables.create(my_table)
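The trailing fragment above appears to come from the Snowflake Python API (the snowflake.core package); as a hedged sketch, the surrounding calls might look like this, with the database, schema, and column definitions as illustrative assumptions:

```python
# Expand the schemas[...].tables.create(...) fragment into a runnable
# sketch using the Snowflake Python API. All names are illustrative.
from snowflake.core import Root
from snowflake.core.table import Table, TableColumn
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "...",
}).create()
root = Root(session)

my_table = Table(
    name="my_table",
    columns=[TableColumn(name="id", datatype="int"),
             TableColumn(name="name", datatype="string")],
)
root.databases["my_db"].schemas["my_schema"].tables.create(my_table)
```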
That said, dbt provides the ability to generate data vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way. The most important reason for using dbt in Data Vault 2.0
Integration: Airflow integrates seamlessly with other data engineering and Data Science tools like Apache Spark and Pandas. IBM Infosphere DataStage: IBM Infosphere DataStage is an enterprise-level ETL tool that enables users to design, develop, and run data pipelines.
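To illustrate the Airflow-plus-Pandas integration mentioned above, here is a minimal DAG sketch; the DAG id, schedule, and file paths are assumptions, and the `schedule` argument assumes Airflow 2.4 or later:

```python
# A minimal Airflow DAG that runs one pandas transform daily.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

def transform_orders():
    # Hypothetical paths; replace with your own sources and targets.
    df = pd.read_csv("/tmp/orders_raw.csv")
    df = df.drop_duplicates(subset=["order_id"])
    df.to_parquet("/tmp/orders_clean.parquet")

with DAG(
    dag_id="orders_pipeline",       # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="transform_orders",
                   python_callable=transform_orders)
```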
And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.
Cleaning and preparing the data: Raw data typically shouldn’t be used in machine learning models, as it’ll throw off the prediction. Data engineers can prepare the data by removing duplicates, dealing with outliers, standardizing data types and precision between data sets, and joining data sets together.
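Those preparation steps translate naturally to pandas; a minimal sketch, with the file names, columns, and the IQR outlier rule as assumptions:

```python
# Dedupe, drop outliers, standardize a key's type, and join two data sets.
import pandas as pd

orders = pd.read_csv("orders.csv")        # hypothetical inputs
customers = pd.read_csv("customers.csv")

orders = orders.drop_duplicates(subset=["order_id"])           # remove duplicates
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
orders = orders[mask]                                          # drop outliers (IQR rule)
orders["customer_id"] = orders["customer_id"].astype("int64")  # standardize types
prepared = orders.merge(customers, on="customer_id", how="left")  # join data sets
```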
Sample CSV files (download files here). Step 1: Load sample CSV files into the internal stage location. Open the SQL worksheet and create a stage if it doesn’t exist. From the homepage: Data > Databases > select your database/schema and select Stages. Go back to the SQL worksheet and verify that the files exist.
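Step 1 can also be scripted; here is a hedged sketch with the Snowflake Python connector, where the credentials, stage name, and file paths are assumptions:

```python
# Create an internal stage, upload local CSVs with PUT, and verify with LIST.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    database="my_db", schema="my_schema", warehouse="my_wh",
)
cur = conn.cursor()
cur.execute("CREATE STAGE IF NOT EXISTS my_csv_stage")    # internal stage
cur.execute("PUT file:///tmp/sample*.csv @my_csv_stage")  # upload local files
cur.execute("LIST @my_csv_stage")                         # verify the upload
for row in cur.fetchall():
    print(row)
conn.close()
```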