These tools give data engineers the capabilities they need to efficiently extract, transform, and load (ETL) data, build data pipelines, and prepare data for analysis and consumption by other applications. They also allow data engineers to store, manage, and analyze large datasets efficiently.
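To make the ETL pattern concrete, here is a minimal Python sketch of an extract-transform-load step; the file names, column name, and cleaning rule are hypothetical placeholders, not taken from any specific tool.

import csv

def extract(path):
    # Extract: read raw rows from a CSV file (the path is a hypothetical example).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalize emails and drop rows without one (an assumed rule).
    return [
        {**row, "email": row["email"].strip().lower()}
        for row in rows
        if row.get("email")
    ]

def load(rows, path):
    # Load: write the cleaned rows to a destination file (assumes rows is non-empty).
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

load(transform(extract("raw_customers.csv")), "clean_customers.csv")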
Last Updated on March 21, 2023 by Editorial Team. Author(s): Data Science meets Cyber Security. Originally published on Towards AI. Navigating the World of Data Engineering: A Beginner's Guide. Data or data? What are ETL and data pipelines?
In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of data pipelines in use today. You might be curious how a seemingly simple tool like Apache Airflow can be powerful enough to manage complex data pipelines.
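As a hedged illustration of how Airflow expresses such a pipeline, here is a minimal DAG sketch in Python (assuming Airflow 2.4+, where DAG accepts a schedule argument; the DAG id, task names, and schedule are hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw data")  # placeholder task body

def transform():
    print("cleaning data")  # placeholder task body

# A hypothetical daily pipeline in which extract must finish before transform.
with DAG(dag_id="example_etl", start_date=datetime(2023, 1, 1),
         schedule="@daily", catchup=False) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2

The operator graph (t1 >> t2) is what lets Airflow track, retry, and visualize each step independently, which is where its power over a hand-rolled script comes from.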
Last week, the Alation team had the privilege of joining IT professionals, business leaders, and data analysts and scientists for the Modern Data Stack Conference in San Francisco. So, how can a data catalog support the critical project of building data pipelines?
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. ETL is vital for ensuring data quality and integrity.
On December 6–8, 2023, the non-profit organization Tech to the Rescue, in collaboration with AWS, organized the world's largest Air Quality Hackathon, aimed at tackling one of the world's most pressing health and environmental challenges: air pollution.
It is the preferred operating system for data-processing-heavy operations for many reasons (more on this below). Around 70 percent of embedded systems use this OS, and the RTOS market is expected to grow at a 23 percent CAGR over the 2023–2030 forecast period, reaching a market value of over $2.5
In July 2023, Matillion launched its fully SaaS platform, Data Productivity Cloud, aiming to create a future-ready, everyone-ready, and AI-ready environment that companies can easily adopt to start automating their data pipelines with coding, low-code, or even no-code at all. Why Does it Matter?
What is Apache Kafka, and How is it Used in Building Real-time Data Pipelines? Kafka is capable of handling high-volume, high-velocity data, delivering it with latency as low as two milliseconds. Its use cases range from real-time analytics and fraud detection to messaging and ETL pipelines.
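As a minimal sketch of this pattern in Python (assuming the kafka-python client and a broker at localhost:9092; the topic name and event fields are hypothetical):

from kafka import KafkaProducer, KafkaConsumer
import json

# Produce JSON events to a hypothetical "clicks" topic on a local broker.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clicks", {"user": "u123", "page": "/home"})
producer.flush()

# Consume the same topic from the beginning and print each event.
consumer = KafkaConsumer(
    "clicks",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)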
In this blog, we’ll explore how Matillion Jobs can simplify the data transformation process by allowing users to visualize the data flow of a job from start to finish. What is Matillion ETL? Whether you’re new to Matillion or just looking to improve your ETL skills, keep reading to learn more!
In this blog, we'll explore how Matillion Jobs can simplify the data transformation process by allowing users to visualize the data flow of a job from start to finish. With that, let's dive in. What is Matillion ETL? Read Components: these are the components that define the source of the data that is to be transformed.
Context: In early 2023, Zeta's machine learning (ML) teams shifted from traditional vertical teams to a more dynamic horizontal structure, introducing the concept of pods comprising diverse skill sets. It's worth mentioning, though, that Airflow isn't used at runtime, as it typically is for extract, transform, and load (ETL) tasks.
It truly is an all-in-one data lake solution. HPCC Systems and Spark also differ in that they work with distinct parts of the big data pipeline. Spark is more focused on data science, ingestion, and ETL, while HPCC Systems focuses on ETL, data delivery, and governance.
Examples of data version control tools in ML:
Dolt: a database tool with Git-like versioning
LakeFS: Git-like versioning for data lakes
Delta Lake: a versioned storage layer for data lakes
Pachyderm: versioned data pipelines
DVC (Data Version Control): a version control system for data and machine learning teams, offering experiment tracking, integration with cloud platforms, and integrations with ML tools
The next generation of Db2 Warehouse SaaS and Netezza SaaS on AWS fully supports open formats such as Parquet and the Iceberg table format, enabling the seamless combination and sharing of data in watsonx.data without the need for duplication or additional ETL.
Reference table: which technologies to use for your FTI pipelines for each ML system. Related article: How to Build ETL Data Pipelines for ML. See also: MLOps and FTI pipelines testing. Once you have built an ML system, you have to operate, maintain, and update it.
Traditionally, answering this question would involve multiple data exports, complex extract, transform, and load (ETL) processes, and careful data synchronization across systems. Users can write data to managed RMS tables using Iceberg APIs, Amazon Redshift, or Zero-ETL ingestion from supported data sources.
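As a hedged sketch of the Iceberg-API path through Spark (assuming a Spark session with an Iceberg-configured catalog and an existing table; the catalog, schema, and table names are hypothetical):

from pyspark.sql import SparkSession

# Assumes the "demo" catalog is already configured for Iceberg in this session.
spark = SparkSession.builder.appName("iceberg-write").getOrCreate()

df = spark.createDataFrame(
    [(1, "alice"), (2, "bob")],
    ["id", "name"],
)

# DataFrameWriterV2 append to a hypothetical Iceberg table.
df.writeTo("demo.sales.customers").append()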
In transitional modeling, we'd add new atoms:

Subject: Customer#1234
Predicate: hasEmailAddress
Object: "john.new@example.com"
Timestamp: 2023-07-24T10:00:00Z

The old email address atoms are still there, giving us a complete history of how to contact John. Both persistent staging and data lakes involve storing large amounts of raw data.
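A minimal Python sketch of the same idea, assuming an in-memory append-only store; the earlier email value is a hypothetical prior atom:

from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    # One immutable subject-predicate-object statement with a timestamp.
    subject: str
    predicate: str
    obj: str
    timestamp: str

# Append-only log: new facts are added, old atoms are never deleted.
store = [
    Atom("Customer#1234", "hasEmailAddress", "john.old@example.com",  # hypothetical prior value
         "2022-01-05T09:30:00Z"),
    Atom("Customer#1234", "hasEmailAddress", "john.new@example.com",
         "2023-07-24T10:00:00Z"),
]

# The full contact history for John is still recoverable.
history = [a for a in store
           if a.subject == "Customer#1234" and a.predicate == "hasEmailAddress"]
for atom in history:
    print(atom.timestamp, atom.obj)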
Consider a data pipeline that detects its own failures, diagnoses the issue, and recommends the fix, all automatically. This is the potential of self-healing pipelines, and this blog explores how to implement them using dbt, Snowflake Cortex, and GitHub Actions.
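As a hedged sketch of the detect-diagnose loop in Python (diagnose_with_llm is a hypothetical stand-in for a call to a model service such as Snowflake Cortex, and the dbt invocation assumes the dbt CLI is installed):

import subprocess

def diagnose_with_llm(log_text: str) -> str:
    # Hypothetical stand-in for an LLM call (e.g., to Snowflake Cortex)
    # that reads a failure log and suggests a fix.
    return "suggested fix based on: " + log_text[-200:]

# Detect: run the dbt build and capture its output.
result = subprocess.run(["dbt", "run"], capture_output=True, text=True)

if result.returncode != 0:
    # Diagnose and recommend: hand the failure log to the model.
    print(diagnose_with_llm(result.stdout + result.stderr))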
The Ultimate Modern Data Stack Migration Guide. phData Marketing, July 18, 2023. This guide was co-written by a team of data experts, including Dakota Kelley, Ahmad Aburia, Sam Hall, and Sunny Yan. Imagine a world where all of your data is organized, easily accessible, and routinely leveraged to drive impactful outcomes.
If you're like many modern organizations, you may be managing data across an increasingly complex landscape of on-premises platforms, cloud services, and legacy systems, and facing challenges in doing so. According to the 2023 Gartner Cloud End-User Behavior Survey, 81% of respondents use multiple cloud providers.
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0" # Load the prompt from a file (showed and explained later in the blog) with open('prompt.txt', 'r') as file: data = file.read() def callBedrock(body): # Format the request payload using the model's native structure. The same ETL workflows were running fine before the upgrade.