Data Warehouse, ETL and ML - Data Science Current

What Is a Lakebase?

databricks

JUNE 11, 2025

Deeply integrated with the lakehouse, Lakebase simplifies operational data workflows. It eliminates fragile ETL pipelines and complex infrastructure, enabling teams to move faster and deliver intelligent applications on a unified data platform In this blog, we propose a new architecture for OLTP databases called a lakebase.

Database

Database Data Lakes ETL Analytics

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

databricks

JUNE 11, 2025

" — James Lin, Head of AI ML Innovation, Experian The Path Forward: From Lab to Production in Days, Not Months Early customers are already experiencing the transformation Agent Bricks delivers – accuracy improvements that double performance benchmarks and reduce development timelines from weeks to a single day.

Analytics

Analytics Analytics AI AI

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Flipboard

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with newly launched AWS databases integrations with Amazon Redshift.

AWS

AWS Data Warehouse ETL SQL

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

MORE WEBINARS

Mosaic AI Announcements at Data + AI Summit 2025

databricks

JUNE 11, 2025

Bring your real-time online ML workloads to Databricks, and let us handle the infrastructure and reliability challenges so you can focus on the AI model development. Our enhanced Model Serving infrastructure now supports over 250,000 queries per second (QPS).

AI

AI AI SQL Data Science

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers , needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier.

ETL

ETL Data Pipeline ML ML

Exploring the Power of Data Warehouse Functionality

Pickl AI

JUNE 11, 2024

Summary: A data warehouse is a central information hub that stores and organizes vast amounts of data from different sources within an organization. Unlike operational databases focused on daily tasks, data warehouses are designed for analysis, enabling historical trend exploration and informed decision-making.

Data Warehouse

Data Warehouse ETL Data Mining Data Mining

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Pickl AI

OCTOBER 17, 2024

Summary: This article explores the significance of ETL Data in Data Management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.

ETL

ETL Data Warehouse Data Quality Data Governance

Introducing Databricks One

databricks

JUNE 12, 2025

Skip to main content Login Why Databricks Discover For Executives For Startups Lakehouse Architecture Mosaic Research Customers Customer Stories Partners Cloud Providers Databricks on AWS, Azure, GCP, and SAP Consulting & System Integrators Experts to build, deploy and migrate to Databricks Technology Partners Connect your existing tools to your (..)

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Choose Continue. Review the input, and choose Create project.

SQL

SQL AWS Data Lakes AI

AWS at Databricks Data + AI Summit 2025

databricks

JUNE 4, 2025

Skip to main content Login Why Databricks Discover For Executives For Startups Lakehouse Architecture Mosaic Research Customers Customer Stories Partners Cloud Providers Databricks on AWS, Azure, GCP, and SAP Consulting & System Integrators Experts to build, deploy and migrate to Databricks Technology Partners Connect your existing tools to your (..)

AWS

AWS AI AI Data Science

ETL Pipelines With Python Azure Functions

Mlearning.ai

JULY 8, 2023

In this article we’re going to check what is an Azure function and how we can employ it to create a basic extract, transform and load (ETL) pipeline with minimal code. Extract, transform and Load Before we begin, let’s shed some light on what an ETL pipeline essentially is. ELT stands for extract, load and transform.

ETL

ETL Azure Python Internet of Things

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

AWS Machine Learning Blog

JANUARY 6, 2023

The rules in this engine were predefined and written in SQL, which aside from posing a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data source. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs.

AWS

AWS Data Warehouse ML ML

Transitioning off Amazon Lookout for Metrics

AWS Machine Learning Blog

OCTOBER 9, 2024

Amazon Lookout for Metrics is a fully managed service that uses machine learning (ML) to detect anomalies in virtually any time-series business or operational metrics—such as revenue performance, purchase transactions, and customer acquisition and retention rates—with no ML experience required. To learn more, see the documentation.

AWS

AWS ML ML Data Quality

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS

AWS Machine Learning Machine Learning ML

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Db2 Warehouse fully supports open formats such as Parquet, Avro, ORC and Iceberg table format to share data and extract new insights across teams without duplication or additional extract, transform, load (ETL). This allows you to scale all analytics and AI workloads across the enterprise with trusted data. 

AWS

AWS Database ETL AI

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

AWS Machine Learning Blog

JUNE 25, 2024

Although these traditional machine learning (ML) approaches might perform decently in terms of accuracy, there are several significant advantages to adopting generative AI approaches. The following table compares the generative approach (generative AI) with the discriminative approach (traditional ML) across multiple aspects.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

JANUARY 26, 2024

Luckily, we have tried and trusted tools and architectural patterns that provide a blueprint for reliable ML systems. In this article, I’ll introduce you to a unified architecture for ML systems built around the idea of FTI pipelines and a feature store as the central component. But what is an ML pipeline?

Machine Learning

Machine Learning Machine Learning ML ML

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Mlearning.ai

MAY 16, 2023

This article discusses five commonly used architectural design patterns in data engineering and their use cases. ETL Design Pattern The ETL (Extract, Transform, Load) design pattern is a commonly used pattern in data engineering. Finally, the transformed data is loaded into the target system.

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

Data platform trinity: Competitive or complementary?

IBM Journey to AI blog

JANUARY 18, 2023

They defined it as : “ A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data. ”.

Data Lakes

Data Lakes Data Warehouse Azure Apache Hadoop

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

Dolt LakeFS Delta Lake Pachyderm Git-like versioning Database tool Data lake Data pipelines Experiment tracking Integration with cloud platforms Integrations with ML tools Examples of data version control tools in ML DVC Data Version Control DVC is a version control system for data and machine learning teams.

ML

ML ML Data Lakes Machine Learning

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

MARCH 15, 2023

This includes the tools and techniques we used to streamline the ML model development and deployment processes, as well as the measures taken to monitor and maintain models in a production environment. Costs: Oftentimes, cost is the most important aspect of any ML model deployment. This includes data quality, privacy, and compliance.

AWS

AWS ETL ML ML

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. IBM watsonx.ai

AI

AI AI Machine Learning Machine Learning

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

How to Learn Machine Learning

APRIL 26, 2025

Data Cleaning and Preparation The tasks of cleaning and preparing the data take place before the analysis. This includes duplicate removal, missing value treatment, variable transformation, and normalization of data. Data Architect Designs complex databases and blueprints for data management systems.

Data Science

Data Science Data Analyst Data Scientist Machine Learning

The Ultimate Modern Data Stack Migration Guide

phData

JULY 18, 2023

Why Migrate to a Modern Data Stack? With the birth of cloud data warehouses, data applications, and generative AI , processing large volumes of data faster and cheaper is more approachable and desired than ever. Legacy tools force users to manually build out processes that can be automated by the Modern Data Stack.

Data Warehouse

Data Warehouse Analytics Analytics SQL

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Journey to AI blog

AUGUST 4, 2023

Data fabric Data fabric architectures are designed to connect data platforms with the applications where users interact with information for simplified data access in an organization and self-service data consumption. Then, it applies these insights to automate and orchestrate the data lifecycle.

Data Lakes

Data Lakes AI AI Data Governance

Alation 2022.2: Open Data Quality Initiative and Enhanced Data Governance

Alation

MAY 24, 2022

Alation has been leading the evolution of the data catalog to a platform for data intelligence. Higher data intelligence drives higher confidence in everything related to analytics and AI/ML. The Lineage & Dataflow API is a good example enabling customers to add ETL transformation logic to the lineage graph.

Data Quality

Data Quality Data Governance ETL Data Observability

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.

SQL

SQL AWS Database Data Scientist

How to Maximize Time to Value with Fivetran and dbt

phData

OCTOBER 17, 2023

The story is all too common – a business user requests some data, the data team creates/prioritizes a ticket, and said ticket is completed after some number of months (or weeks if you’re lucky) – just to have the data be wrong, and the whole process starts again. Those are scary for data teams to change.

ETL

ETL Data Pipeline Data Engineer Data Engineering

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

Data pipeline stages But before delving deeper into the technical aspects of these tools, let’s quickly understand the core components of a data pipeline succinctly captured in the image below: Data pipeline stages | Source: Author What does a good data pipeline look like?

Data Pipeline

Data Pipeline ETL Data Quality SQL

How Investment Banks and Asset Managers Should Be Leveraging Data in Snowflake

phData

APRIL 18, 2023

Faced with these challenges, asset servicers have acquired numerous technologies over time to meet their risk management, fund analytics, and settlement needs, leading to data fragmentation and inheriting complex data flows. Data movements lead to high costs of ETL and rising data management TCO.

Data Silos

Data Silos ETL Clustering Analytics

Unlocking the 12 Ways to Improve Data Quality

Pickl AI

OCTOBER 19, 2023

Data Quality Assurance Team Establish a dedicated data quality assurance team. Their role is to oversee and enforce data quality standards, conduct audits, and drive continuous improvement. Here’s how: Data Profiling Start by analyzing your data to understand its quality.

Data Quality

Data Quality Data Governance Data Warehouse Machine Learning

How to Effectively Handle Unstructured Data Using AI

DagsHub

NOVEMBER 11, 2024

We use data-specific preprocessing and ML algorithms suited to each modality to filter out noise and inconsistencies in unstructured data. NLP cleans and refines content for text data, while audio data benefits from signal processing to remove background noise. Tools like Unstructured.io

AI

AI AI Data Lakes Database

AI that’s ready for business starts with data that’s ready for AI

IBM Journey to AI blog

JULY 3, 2024

To power AI and analytics workloads across your transactional and purpose-built databases, you must ensure they can seamlessly integrate with an open data lakehouse architecture without duplication or additional extract, transform, load (ETL) processes.

AI

AI Data Quality AI Database

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Tools such as Python’s Pandas library, Apache Spark, or specialised data cleaning software streamline these processes, ensuring data integrity before further transformation. Step 3: Data Transformation Data transformation focuses on converting cleaned data into a format suitable for analysis and storage.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

Azure Data Engineer Jobs

Pickl AI

APRIL 6, 2023

Having a solid understanding of ML principles and practical knowledge of statistics, algorithms, and mathematics. Which service would you use to create Data Warehouse in Azure? Answer : Azure Synapse is a service that offers limitless analytics that unifies Big Data Analytics and Enterprise Data Warehousing.

Azure

Azure Data Engineer Data Engineering Data Engineering

How to Use Exploratory Notebooks [Best Practices]

The MLOps Blog

OCTOBER 20, 2023

Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering. in a pandas DataFrame) but in the company’s data warehouse (e.g.,

SQL

SQL Database Data Scientist Python

5 Data Governance Mistakes to Avoid

Alation

APRIL 25, 2023

It’d be difficult to exaggerate the importance of data in today’s global marketplace, especially for firms which are going through digital transformation (DT). Using bad data, or the incorrect data can generate devastating results. This is where a reverse ETL process is needed.

Data Governance

Data Governance ETL Machine Learning Natural Language Processing

What Is a Data Fabric and How Does a Data Catalog Support It?

Alation

JANUARY 25, 2022

This “analysis” is made possible in large part through machine learning (ML); the patterns and connections ML detects are then served to the data catalog (and other tools), which these tools leverage to make people- and machine-facing recommendations about data management and data integrations.

DataOps

DataOps SQL ML ML

5 Data Governance Mistakes to Avoid

Alation

APRIL 25, 2023

It’d be difficult to exaggerate the importance of data in today’s global marketplace, especially for firms which are going through digital transformation (DT). Using bad data, or the incorrect data can generate devastating results. This is where a reverse ETL process is needed.

Data Governance

Data Governance ETL Machine Learning Natural Language Processing

Beginner’s Guide To GCP BigQuery (Part 1)

Mlearning.ai

JULY 10, 2023

In my 7 years of Data Science journey, I’ve been exposed to a number of different databases including but not limited to Oracle Database, MS SQL, MySQL, EDW, and Apache Hadoop. A lot of you who are already in the data science field must be familiar with BigQuery and its advantages.

SQL

SQL Database Apache Hadoop Data Science

Bringing Declarative Pipelines to the Apache Spark™ Open Source Project

databricks

JUNE 12, 2025

Sample Dataflow Graph Declarative APIs make ETL simpler and more maintainable Through years of working with real-world Spark users, we’ve seen common challenges emerge when building production pipelines: Too much time spent wiring together pipelines with “glue code” to handle incremental ingestion or deciding when to materialize datasets.

SQL

SQL Data Engineer Data Engineering Data Engineering

Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)

Flipboard

NOVEMBER 22, 2024

Organizations run millions of Apache Spark applications each month to prepare, move, and process their data for analytics and machine learning (ML). During development, data engineers often spend hours sifting through log files, analyzing execution plans, and making configuration changes to resolve issues. Choose your job.

AWS

AWS AI AI Data Engineer

Fivetran Modern Data Stack Conference 2023: Key Takeaways

Alation

APRIL 14, 2023

Last week, the Alation team had the privilege of joining IT professionals, business leaders, and data analysts and scientists for the Modern Data Stack Conference in San Francisco. In “The modern data stack is dead, long live the modern data stack!” Cloud costs are growing prohibitive.

Data Pipeline

Data Pipeline Data Warehouse Cloud Data ETL

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

If the event log is your customer’s diary, think of persistent staging as their scrapbook – a place where raw customer data is collected, organized, and kept for future reference. In traditional ETL (Extract, Transform, Load) processes in CDPs, staging areas were often temporary holding pens for data.

Data Models

Data Models Data Modeling Apache Kafka Data Lakes

What Is a Lakebase?

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

Webinars

Trending Sources

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Webinars

Mosaic AI Announcements at Data + AI Summit 2025

How to Build ETL Data Pipeline in ML

Exploring the Power of Data Warehouse Functionality

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Introducing Databricks One

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

AWS at Databricks Data + AI Summit 2025

ETL Pipelines With Python Azure Functions

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

Transitioning off Amazon Lookout for Metrics

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Tackling AI’s data challenges with IBM databases on AWS

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

How to Build Machine Learning Systems With a Feature Store

The Backbone of Data Engineering: 5 Key Architectural Patterns Explained

Data platform trinity: Competitive or complementary?

How to Version Control Data in ML for Various Data Sources

How to Build a CI/CD MLOps Pipeline [Case Study]

Exploring the AI and data capabilities of watsonx

Data Science Career Paths: Analyst, Scientist, Engineer – What’s Right for You?

The Ultimate Modern Data Stack Migration Guide

Data democratization: How data architecture can drive business decisions and AI initiatives

Alation 2022.2: Open Data Quality Initiative and Enhanced Data Governance

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

How to Maximize Time to Value with Fivetran and dbt

Comparing Tools For Data Processing Pipelines

How Investment Banks and Asset Managers Should Be Leveraging Data in Snowflake

Unlocking the 12 Ways to Improve Data Quality

How to Effectively Handle Unstructured Data Using AI

AI that’s ready for business starts with data that’s ready for AI

Build Data Pipelines: Comprehensive Step-by-Step Guide

Azure Data Engineer Jobs

How to Use Exploratory Notebooks [Best Practices]

5 Data Governance Mistakes to Avoid

What Is a Data Fabric and How Does a Data Catalog Support It?

5 Data Governance Mistakes to Avoid

Beginner’s Guide To GCP BigQuery (Part 1)

Bringing Declarative Pipelines to the Apache Spark™ Open Source Project

Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)

Fivetran Modern Data Stack Conference 2023: Key Takeaways

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

Stay Connected