While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis, for example by creating dbt models in dbt Cloud.
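As a minimal sketch of what that looks like, a dbt model is just a SELECT statement saved as a .sql file that dbt materializes in the warehouse; the model and source names below are hypothetical:

```sql
-- models/stg_orders.sql (hypothetical dbt model)
{{ config(materialized='view') }}

select
    order_id,
    customer_id,
    order_date,
    amount
from raw.orders          -- assumed raw source table
where order_date is not null
```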
By Santhosh Kumar Neerumalla, Niels Korschinsky & Christian Hoeboer. Introduction: This blog post describes how to manage and orchestrate high-volume Extract-Transform-Load (ETL) loads using a serverless process based on Code Engine; an ETL process is used to ingest the data.
Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. Create and load sample data: in this post, we use two sample datasets, a total sales dataset CSV file and a sales target document in PDF format.
This brings reliability to data ETL (Extract, Transform, Load) processes, query performance, and other critical data operations. Documentation and Disaster Recovery Made Easy: data is the lifeblood of any organization, and losing it can be catastrophic. So why use IaC for Cloud Data Infrastructures?
Summary: This guide explores the top ETL tools, highlighting their features and use cases. To harness this data effectively, businesses rely on ETL (Extract, Transform, Load) tools to extract, transform, and load data into centralized systems like data warehouses. What is ETL? What are ETL tools?
Summary: This article explores the significance of ETL Data in Data Management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.
A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system such as a cloud data warehouse (e.g., Snowflake). Intuitive Workflow Design: workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code.
Summary: Choosing the right ETL tool is crucial for seamless data integration and smooth data management. At the heart of this process lie ETL tools (Extract, Transform, Load), a trio of steps that extracts data, transforms it, and loads it into a destination. What is ETL?
This use case highlights how large language models (LLMs) can act as translators between human languages (English, Spanish, Arabic, and more) and machine-interpretable languages (Python, Java, Scala, SQL, and so on), along with sophisticated internal reasoning. Room for improvement!
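To make the idea concrete, here is the kind of translation an LLM might produce; the sales table and its columns are purely illustrative:

```sql
-- English prompt: "What were total sales per region last quarter?"
-- One plausible SQL translation (table and columns are hypothetical):
SELECT region,
       SUM(amount) AS total_sales
FROM sales
WHERE order_date >= DATE '2024-01-01'
  AND order_date <  DATE '2024-04-01'
GROUP BY region
ORDER BY total_sales DESC;
```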
This tool is designed to connect various data sources and enterprise applications and to perform analytics and ETL processes. This ETL integration software allows you to build integrations anytime and anywhere without requiring any coding. Moreover, it allows you to explore the data in SQL and view it in any analytics tool efficiently.
Extract, Transform, Load (ETL). Redshift is the product for data warehousing, and Athena provides SQL data analytics. It has useful features, such as an in-browser SQL editor for queries and data analysis, various data connectors for easy data ingestion, and automated data preprocessing and ingestion. Master data management.
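As a hedged sketch of what SQL analytics over files looks like in a service such as Athena, the external table below maps CSV files in S3 to columns so plain SQL can query them; the bucket, table, and columns are placeholders:

```sql
-- Hypothetical Athena external table over CSV files in S3:
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
    request_time string,
    status_code  int,
    url          string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://example-bucket/web-logs/';

-- Ordinary SQL then works against the files directly:
SELECT status_code, COUNT(*) AS hits
FROM web_logs
GROUP BY status_code;
```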
Fivetran’s automated data movement platform simplifies the ETL (extract, transform, load) process by automating most of the time-consuming tasks that data engineers would typically do. For more information and examples of the MAR calculation, see the official documentation.
Using Amazon Redshift ML for anomaly detection: Amazon Redshift ML makes it easy to create, train, and apply machine learning models using familiar SQL commands in Amazon Redshift data warehouses. To learn more, see the documentation.
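As a hedged sketch of that workflow, CREATE MODEL trains a model from the result of a SQL query and exposes it as a SQL function; the table names, IAM role, and S3 bucket below are all placeholders:

```sql
-- Train a model from labeled rows (all identifiers are hypothetical):
CREATE MODEL purchase_anomaly_model
FROM (SELECT order_total, item_count, is_anomaly
      FROM transactions_labeled)
TARGET is_anomaly
FUNCTION predict_purchase_anomaly
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'example-ml-bucket');

-- Apply the trained model with ordinary SQL:
SELECT order_id,
       predict_purchase_anomaly(order_total, item_count) AS is_anomaly
FROM new_transactions;
```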
With numerous approaches and patterns to consider, items and processes to document, and target states to plan and architect, all while keeping your current day-to-day processes and business decisions operating smoothly, we understand that migrating an entire data platform (including components such as SQL Server Agent jobs) is no small task.
Putting the T for Transformation in ELT (ETL) is essential to any data pipeline. Views let you create virtual tables from the results of a SQL query. Stored Procedures: in any data warehousing solution, stored procedures encapsulate SQL logic into repeatable routines, but Snowflake has some tricks up its sleeve.
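A minimal sketch of both building blocks in Snowflake-style SQL (the table and routine names are hypothetical):

```sql
-- A view: a virtual table defined by a query.
CREATE OR REPLACE VIEW monthly_revenue AS
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1;

-- A Snowflake SQL stored procedure: repeatable logic in one routine.
CREATE OR REPLACE PROCEDURE refresh_monthly_revenue()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    CREATE OR REPLACE TABLE monthly_revenue_snapshot AS
    SELECT * FROM monthly_revenue;
    RETURN 'snapshot refreshed';
END;
$$;
```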
SmartSuggestions: in Compose, Alation’s SQL editor, AI-powered suggestions actively show query writers relevant data to use as they query. The Lineage & Dataflow API is a good example, enabling customers to add ETL transformation logic to the lineage graph (for example, for the popular database SQL Server). Open Data Quality Initiative.
Here are steps you can follow to pursue a career as a BI Developer: Acquire a solid foundation in data and analytics: Start by building a strong understanding of data concepts, relational databases, SQL (Structured Query Language), and data modeling.
That said, dbt provides the ability to generate data vault models and also allows you to write your data transformations using SQL and reusable macros powered by Jinja2, keeping your data pipelines clean and efficient. Macros can be called in models and are then expanded into additional SQL snippets or even the entire SQL code.
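A small sketch of that pattern, modeled on the style of the dbt documentation (the macro and model names are hypothetical):

```sql
-- macros/cents_to_dollars.sql: a reusable Jinja2 macro.
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100)::numeric(16, 2)
{% endmacro %}

-- models/orders.sql: calling the macro; dbt expands it into
-- plain SQL at compile time.
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_dollars
from {{ ref('stg_orders') }}
```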
Though it’s worth mentioning that Airflow isn’t used at runtime, as is usual for extract, transform, and load (ETL) tasks. Additional Resources: for those looking to dive deeper, we recommend exploring the official documentation and tutorials for each tool: Airflow, Feast, dbt, MLflow, and Amazon ECS.
References: Links to internal or external documentation with background information or specific information used within the analysis presented in the notebook. You could link this section to any other piece of documentation. In those cases, most of the data exploration and wrangling will be done through SQL.
Understanding Fivetran: Fivetran is a popular Software-as-a-Service platform that enables users to automate the movement of data and ETL processes across diverse sources to a target destination. For a longer overview, along with insights and best practices, please feel free to jump back to the previous blog.
Some of the databases supported by Fivetran are Snowflake Data Cloud (beta), MySQL, PostgreSQL, SAP ERP, SQL Server, and Oracle. In this blog, we will review how to pull data from on-premises systems using Fivetran to a specific target or destination. You can find more information about them in Fivetran's official documentation.
In addition, the generative business intelligence (BI) capabilities of QuickSight allow you to ask questions about customer feedback using natural language, without the need to write SQL queries or learn a BI tool. The following diagram illustrates the architecture and workflow of the proposed solution.
Document Hierarchy Structures: Maintain thorough documentation of hierarchy designs, including definitions, relationships, and data sources. This documentation is invaluable for future reference and modifications. Simplify hierarchies where possible and provide clear documentation to help users understand the structure.
While traditional methods of tracking data lineage often involve manual documentation, reliance on stakeholders’ knowledge, and complex processes, the Snowflake Data Cloud offers a powerful and streamlined solution.
Reverse ETL tools. The modern data stack is also the consequence of a shift in analysis workflow from extract, transform, load (ETL) to extract, load, transform (ELT). A Note on the Shift from ETL to ELT: in the past, data movement was defined by ETL: extract, transform, and load. Today, extract, load, transform (ELT) tools load raw data first and transform it inside the warehouse.
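The T in ELT then happens in-warehouse as plain SQL; a minimal, hypothetical example:

```sql
-- ELT-style transform: raw data was already loaded; SQL inside the
-- warehouse cleans it into an analytics table (names are illustrative).
CREATE OR REPLACE TABLE analytics.orders_clean AS
SELECT order_id,
       LOWER(TRIM(customer_email)) AS customer_email,
       CAST(amount AS DECIMAL(12, 2)) AS amount
FROM raw.orders
WHERE order_id IS NOT NULL;
```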
This data transformation tool enables data analysts and engineers to transform, test and document data in the cloud data warehouse. We document these custom models in Alation Data Catalog and publish common queries that other teams can use for operational use cases or reporting needs. How does this help the end user?
Apache Airflow: Airflow is open-source ETL software that is very useful when paired with Snowflake. dbt offers a SQL-first transformation workflow that lets teams build data transformation pipelines while following software engineering best practices like CI/CD, modularity, and documentation.
Data can be structured (e.g., databases), semi-structured, or unstructured (e.g., documents and images). Using it involves several key processes. Extract, Transform, Load (ETL): the ETL process extracts data from different sources, transforms it into a suitable format by cleaning and enriching it, and then loads it into a data warehouse or data lake.
dbt uses SQL-centric transformations to model data for deployment; it is a compiler and a runner. dbt is also great for data lineage and documentation, empowering business analysts to make informed decisions on their data. The focus is on SQL, which is easier to learn but can still prove to be a barrier.
In my 7 years of Data Science journey, I’ve been exposed to a number of different databases, including but not limited to Oracle Database, MS SQL, MySQL, EDW, and Apache Hadoop. Views: views in GCP BigQuery are virtual tables defined by a SQL query that can display the results of a query or be used as the base for other queries.
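A hedged BigQuery-style sketch of both uses (the project, dataset, and table names are placeholders):

```sql
-- Define a view over a base table...
CREATE OR REPLACE VIEW `my-project.sales.active_customers` AS
SELECT customer_id,
       MAX(order_date) AS last_order_date
FROM `my-project.sales.orders`
GROUP BY customer_id;

-- ...then use the view as the base for another query.
SELECT COUNT(*) AS recent_customers
FROM `my-project.sales.active_customers`
WHERE last_order_date >= DATE '2024-01-01';
```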
Data preprocessing is essential for preparing textual data obtained from sources like Twitter for sentiment classification. Influence of data preprocessing on text classification: text classification is a significant research area that involves assigning natural language text documents to predefined categories.
They sought documentation to help them locate the source of the data in the warehouse. The developers spent time looking for a tool that could scan all the SQL code and Microsoft SSIS packages, since SSIS was the ETL tool being used.
ThoughtSpot is a cloud-based AI-powered analytics platform that uses natural language processing (NLP) or natural language query (NLQ) to quickly query results and generate visualizations without the user needing to know any SQL or table relations. Suppose your business requires more robust capabilities across your technology stack.
With structured data in tabular form, you can use query languages like SQL to extract and interpret information. For instance, if the collected data was a text document in the form of a PDF, the data preprocessing, or preparation, stage can extract tables from this document into that structured form.
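Once the data is in a table, a plain SQL query is enough to extract and interpret it; the sales table below is hypothetical:

```sql
-- Aggregate structured data with ordinary SQL (illustrative schema):
SELECT product,
       SUM(quantity) AS units_sold,
       SUM(quantity * unit_price) AS revenue
FROM sales
GROUP BY product
ORDER BY revenue DESC;
```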
Metadata Management can be performed manually by creating spreadsheets and documents noting information about the various datasets. There are also tools designed specifically to analyze your data lake files, determine the schema, and allow SQL statements to be run directly against this data.
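As a hedged, Snowflake-style sketch of querying lake files directly (the stage, file format, and column positions are all assumptions):

```sql
-- Query staged files in place, without loading them first:
SELECT t.$1 AS customer_id,
       t.$2 AS amount
FROM @data_lake_stage/sales/ (FILE_FORMAT => 'csv_format') t
LIMIT 10;
```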
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. With multiple families planned, the first release is the Slate family of models, which use an encoder-only architecture.
Notebooks like Jupyter have also emerged as essential tools by combining documentation, code execution, and visualization in a single interactive interface. Other notebooks like Apache Zeppelin provide similar document-coding capabilities across multiple languages.
Spark is more focused on data science, ingestion, and ETL, while HPCC Systems focuses on ETL, data delivery, and governance. ECL sounds compelling, but it is a new programming language that is not widely known like Java, Python, or SQL, and it has fewer users than those languages.
Document and Communicate: Maintain thorough documentation of fact table designs, including definitions, calculations, and relationships. ETL Tools: Informatica, Talend, and Apache Airflow enable the extraction of data from source systems, transformation into the desired format, and loading into the dimensional model.
Thankfully there are open-source projects, such as SQLFluff, that don’t make you parse SQL into grammars yourself (ain’t nobody got time for that!). SQL linting saves tons of time and ensures your team is looking for deeper logical issues in the PR instead of basic naming and formatting mistakes. What is a Pattern?
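To show the sort of thing a SQL linter catches, here is a before-and-after sketch (the query itself is made up):

```sql
-- Before: mixed keyword case, inconsistent indentation.
SELECT id, Name,
  amount
from ORDERS;

-- After auto-fixing: consistent casing and layout, so reviewers can
-- focus on logic instead of formatting.
SELECT
    id,
    name,
    amount
FROM orders;
```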
This also means that it comes with a large community and comprehensive documentation. Thanks to its various operators, it is integrated with Python, Spark, Bash, SQL, and more. Flexibility: Its use cases are wider than just machine learning; for example, we can use it to set up ETL pipelines.
This typically results in long-running ETL pipelines that cause decisions to be made on stale or old data. Business-Focused Operation Model: Teams can shed countless hours of managing long-running and complex ETL pipelines that do not scale.
Tips When Considering StreamSets Data Collector: as a Snowflake partner, StreamSets provides very detailed documentation on using Data Collector with Snowflake, including a dedicated book. Data Collector also offers replication and Change Data Capture (CDC) to accurately and efficiently get your data into Snowflake.