By Josep Ferrer, KDnuggets AI Content Specialist on July 15, 2025 in Data Science. Delivering the right data at the right time is a primary need for any organization in today's data-driven society. But let's be honest: creating a reliable, scalable, and maintainable data pipeline is not an easy task.
With just a few lines of authentication code, you can run SQL queries right from a notebook and pull the results into a Python DataFrame for analysis. Get started with the BigQuery Sandbox documentation and the example notebook on using BigQuery in Colab. MemoryError exceptions are all too common, forcing you to downsample your data early on.
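To illustrate the notebook-to-DataFrame flow described above, here is a minimal sketch using the google-cloud-bigquery client; the project ID is a placeholder, and the query runs against a well-known BigQuery public dataset.

from google.cloud import bigquery

# Assumes credentials are already configured (in Colab: google.colab.auth).
client = bigquery.Client(project="my-project")  # "my-project" is a placeholder

sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# Run the query and pull the result straight into a pandas DataFrame.
df = client.query(sql).to_dataframe()
print(df)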
Data pipelines are essential in our increasingly data-driven world, enabling organizations to automate the flow of information from diverse sources to analytical platforms. What are data pipelines? Purpose of a data pipeline: data pipelines serve various essential functions within an organization.
Whether you’re visualizing climate data or plotting sales trends, the goal is clarity. The key is to start simple, iterate often, and don’t fear the documentation. Remember, even experts Google “how to add a second y-axis” sometimes.
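Since the "second y-axis" question comes up so often, here is a minimal matplotlib sketch; the sales and temperature figures are purely illustrative, and twinx() gives a second axis that shares the same x-axis.

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 160]    # hypothetical sales figures
temps = [3.5, 4.1, 8.2, 12.6]   # hypothetical temperatures

fig, ax1 = plt.subplots()
ax1.plot(months, sales, color="tab:blue")
ax1.set_ylabel("Sales (units)", color="tab:blue")

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(months, temps, color="tab:red")
ax2.set_ylabel("Temperature (C)", color="tab:red")

plt.show()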
10 Free Online Courses to Master Python in 2025: How can you master Python for free?
Instead of sweating the syntax, you describe the "vibe" of what you want (a data pipeline, a web app, or an analytics automation script) and tools like Replit, GitHub Copilot, Gemini Code Assist, and others do the heavy lifting. Document your work: maintain clear documentation for future maintenance.
Go vs. Python for Modern Data Workflows: Need Help Deciding?
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.
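As a concrete, minimal sketch of such a batch ETL job: the file name, column names, and connection string below are all hypothetical, and a real pipeline would add validation and incremental loading.

import pandas as pd
from sqlalchemy import create_engine

# Extract: read a raw export from the operational system (hypothetical file).
raw = pd.read_csv("orders_export.csv")

# Transform: fix types and derive an analysis-friendly column.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw["revenue"] = raw["quantity"] * raw["unit_price"]

# Load: append into a warehouse table (hypothetical connection string).
engine = create_engine("postgresql://user:pass@warehouse-host/analytics")
raw.to_sql("fact_orders", engine, if_exists="append", index=False)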
Document Everything: Keep clear and versioned documentation of how each feature is created, transformed, and validated. Use Automation: Use tools like feature stores, pipelines, and automated feature selection to maintain consistency and reduce manual errors.
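One common way to automate this in Python is a scikit-learn Pipeline, which bundles preprocessing, feature selection, and the model into a single versionable object; a minimal sketch with synthetic data:

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),              # consistent transformation
    ("select", SelectKBest(f_classif, k=8)),  # automated feature selection
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.score(X, y))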
Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines.
In Part 1 of this series, we explored how Amazon’s Worldwide Returns & ReCommerce (WWRR) organization built the Returns & ReCommerce Data Assist (RRDA)—a generative AI solution that transforms natural language questions into validated SQL queries using Amazon Bedrock Agents.
They have structured data such as sales transactions and revenue metrics stored in databases, alongside unstructured data such as customer reviews and marketing reports collected from various channels. Use Amazon Athena SQL queries to provide insights.
Feeding data for analytics: Integrated data is essential for populating data warehouses, data lakes, and lakehouses, ensuring that analysts have access to complete datasets for their work. Data integration tools and techniques: The landscape of data integration is constantly evolving, driven by technological advancements.
5 Fun Generative AI Projects for Absolute Beginners: New to generative AI?
Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer's preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.
This fragmented approach consumed valuable time and introduced the risk of human error in data interpretation and analysis. The initial implementation established basic RAG functionality by feeding the Amazon Bedrock knowledge base with tabular data and documentation. The solution architecture evolved through several iterations.
With all this packaged into a well-governed platform, Snowflake continues to set the standard for data warehousing and beyond. Snowflake supports data sharing and collaboration across organizations without the need for complex data pipelines.
The agent can generate SQL queries from natural language questions using the database schema DDL (data definition language) and execute them against a database instance in the database tier. We use Amazon Bedrock Agents with two knowledge bases for this assistant. Create, invoke, test, and deploy the agent.
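The agent internals are not shown in the excerpt, but the core text-to-SQL step can be sketched as a direct model call through the Bedrock Converse API; the model ID, schema, and question below are illustrative, not the source's actual configuration.

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical schema and question; a real system would pass the actual DDL.
ddl = "CREATE TABLE returns (order_id STRING, reason STRING, refund_amount DOUBLE, returned_at DATE)"
question = "What were total refunds by reason last month?"

prompt = (
    f"Given this schema:\n{ddl}\n"
    f"Write one SQL query answering: {question}\n"
    "Return only the SQL."
)

# Illustrative model ID; any Converse-compatible model works the same way.
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])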
This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights. A standout application is the SQL-to-natural language capability, which translates complex SQL queries into plain English and vice versa, bridging the gap between technical and business teams.
Intuitive Workflow Design: Workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code. Matillion is designed as a no/low-code ELT tool, so let's leave the SQL deep dive for another time and focus on making workflows as clean and intuitive as possible!
The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: One of the Analytics team's tasks is to load data from multiple sources and unify it into a data warehouse. Thus, it has only a minimal footprint.
Semi-Structured Data: Data that has some organizational properties but doesn’t fit a rigid database structure (like emails, XML files, or JSON data used by websites). Unstructured Data: Data with no predefined format (like text documents, social media posts, images, audio files, videos).
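A small Python illustration of the semi-structured case: the nested JSON records below (hypothetical data) can be flattened into a tabular, structured form with pandas.

import pandas as pd

# Hypothetical semi-structured records, e.g. parsed from a JSON API.
records = [
    {"user": {"id": 1, "name": "Ana"}, "tags": ["sql", "python"]},
    {"user": {"id": 2, "name": "Ben"}, "tags": ["spark"]},
]

# json_normalize expands the nested "user" object into flat columns.
df = pd.json_normalize(records)
print(df)  # columns: user.id, user.name, tags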
With the recent release of Apache Spark 4.0,
Summary: Data engineering tools streamline data collection, storage, and processing. Tools like Python, SQL, Apache Spark, and Snowflake help engineers automate workflows and improve efficiency. Learning these tools is crucial for building scalable data pipelines.
Good at Go and Kubernetes (understanding how to manage stateful services in a multi-cloud environment). We have a Python service in our Recommendation pipeline, so some ML/data science knowledge would be good. Queries everywhere: SQL lives in Slack snippets, BI folders, dusty Git repos, and copy-pasted Notion pages.
This is exactly the kind of thing I've had in mind as one of the offshoots for PRQL for processing data beyond just generating SQL. Do you know if the FPGA and/or hardware communities use any type of formalism for design or documentation of state machines? Happy to chat if you're into VMs, query engines, or DSLs.
However, they can't generalize well to enterprise-specific questions because, to generate an answer, they rely on the public data they were exposed to during pre-training. And even the popular RAG design pattern with semantic search can't answer all types of questions that are possible on documents.
Using structured data to answer questions requires a way to effectively extract the data that's relevant to a user's query. We formulated a text-to-SQL approach whereby a user's natural language query is converted to a SQL statement using an LLM. The SQL is then run by Amazon Athena to return the relevant data.
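Executing the generated SQL against Athena follows a start/poll/fetch pattern with boto3; in this minimal sketch, the SQL statement, database name, and S3 output location are hypothetical.

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# e.g. the statement produced by the LLM in the step above
sql = "SELECT category, COUNT(*) AS n FROM sales GROUP BY category"

run = athena.start_query_execution(
    QueryString=sql,
    QueryExecutionContext={"Database": "analytics"},             # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-results/"},  # hypothetical bucket
)
query_id = run["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows[:3])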
As today's world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
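A minimal Snowpark sketch of such a pipeline; the connection parameters, table, and column names are all illustrative, and the transformation is pushed down to Snowflake when it executes.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Hypothetical connection parameters.
session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "my_wh", "database": "ANALYTICS", "schema": "PUBLIC",
}).create()

# Transformations are defined lazily and run inside Snowflake on execution.
orders = session.table("RAW_ORDERS")  # hypothetical source table
daily = (
    orders.filter(col("STATUS") == "COMPLETE")
          .group_by("ORDER_DATE")
          .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)
daily.write.save_as_table("DAILY_REVENUE", mode="overwrite")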
The raw data can be fed into a database or data warehouse. An analyst can examine the data using business intelligence tools to derive useful information. To arrange your data and keep it raw, make sure the data pipeline is simple so you can easily move data from point A to point B.
Great Expectations (GitHub | Website): Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing, documentation, and profiling. With Great Expectations, data teams can express what they "expect" from their data using simple assertions.
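Those assertions look roughly like this; the sketch below uses the classic pandas-backed API, and exact entry points vary across Great Expectations versions, so treat it as illustrative.

import great_expectations as gx
import pandas as pd

# Wrap a pandas DataFrame so it supports expectation methods
# (classic API; newer GX versions use a context-based workflow).
df = gx.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.5, 20.0, 7.25],
}))

# Each expectation returns a result object with success=True/False.
print(df.expect_column_values_to_not_be_null("order_id"))
print(df.expect_column_values_to_be_between("amount", min_value=0, max_value=1000))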
User support arrangements: Consider the availability and quality of support from the provider or vendor, including documentation, tutorials, forums, customer service, etc. Kubeflow integrates with popular ML frameworks, supports versioning and collaboration, and simplifies the deployment and management of ML pipelines on Kubernetes clusters.
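For a feel of what a Kubeflow pipeline looks like in code, here is a minimal KFP v2 sketch; the component body and pipeline name are hypothetical, and the compiled YAML would be submitted to a cluster.

from kfp import dsl, compiler

@dsl.component
def train(learning_rate: float) -> str:
    # Placeholder step; a real component would pull data and fit a model.
    return f"trained with lr={learning_rate}"

@dsl.pipeline(name="demo-training-pipeline")  # hypothetical pipeline name
def training_pipeline(learning_rate: float = 0.01):
    train(learning_rate=learning_rate)

# Compile to YAML that can be submitted to a Kubeflow cluster.
compiler.Compiler().compile(training_pipeline, "pipeline.yaml")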
Putting the T for Transformation in ELT (or ETL) is essential to any data pipeline. After extracting and loading your data into the Snowflake AI Data Cloud, you may wonder how best to transform it. Luckily, Snowflake answers this question with many features designed to transform your data for all your analytic use cases.
By using Fivetran, businesses can reduce the time and resources required for data integration, enabling them to focus on extracting insights from the data rather than managing the ELT process. Building data pipelines manually is an expensive and time-consuming process. Why use Fivetran?
This use case highlights how large language models (LLMs) are able to become a translator between human languages (English, Spanish, Arabic, and more) and machine interpretable languages (Python, Java, Scala, SQL, and so on) along with sophisticated internal reasoning.
That said, dbt provides the ability to generate Data Vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way. The most important reason for using dbt in Data Vault 2.0
Our continued investments in connectivity with Google technologies help ensure your data is secure, governed, and scalable. Tableau's lightning-fast Google BigQuery connector allows customers to engineer optimized data pipelines with direct connections that power business-critical reporting. Direct connection to Google BigQuery.