Data Lakes, Data Science and SQL - Data Science Current

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Blog Top Posts About Topics AI Career Advice Computer Vision Data Engineering Data Science Language Models Machine Learning MLOps NLP Programming Python SQL Datasets Events Resources Cheat Sheets Recommendations Tech Briefs Advertise Join Newsletter Go vs. Python for Modern Data Workflows: Need Help Deciding?

Python

Python Natural Language Processing Data Science Machine Learning

Data lakes vs. data warehouses: Decoding the data storage debate

Data Science Dojo

JANUARY 12, 2023

When it comes to data, there are two main types: data lakes and data warehouses. What is a data lake? An enormous amount of raw data is stored in its original format in a data lake until it is required for analytics applications. Which one is right for your business?

Data Lakes

Data Lakes Data Warehouse Hadoop Machine Learning

What Is a Lakebase?

databricks

JUNE 11, 2025

Separation of storage and compute : Lakebases store their data in modern data lakes (object stores) in open formats, which enables scaling compute and storage separately, leading to lower TCO and eliminating lock-in. At zero, the cost of the lakebase is just the cost of storing the data on cheap data lakes.

Database

Database Data Lakes ETL Analytics

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. It offers full BI-Stack Automation, from source to data warehouse through to frontend.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Data lakehouse

Dataconomy

JUNE 18, 2025

Data Lakehouse has emerged as a significant innovation in data management architecture, bridging the advantages of both data lakes and data warehouses. By enabling organizations to efficiently store various data types and perform analytics, it addresses many challenges faced in traditional data ecosystems.

Data Lakes

Data Lakes Data Warehouse Business Intelligence Business Intelligence

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Rockets legacy data science environment challenges Rockets previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment and vendor-provided Data Science Experience development tools.

Data Science

Data Science AWS Hadoop Data Scientist

The IKEA of Data: How to Bring Modular Thinking to Your Data Architecture (and Why It Works)

IBM Data Science in Practice

MAY 19, 2025

The Data Dilemma: From Chaos toClarity In the world of data management, weve all beenthere: A simple request spirals into a maze of thingslike: A CSV labeled final_v2_final_final.csv A Parquet file in a forgotten S3folder A table with brokenlineage An abandoned SQL notebook from yearsago What starts as a data lake often becomes a dataswamp!

Data Lakes

Data Lakes SQL Data Science Data Engineering

Cloud Data Science News Beta #1

Data Science 101

NOVEMBER 11, 2019

Welcome to the first beta edition of Cloud Data Science News. This will cover major announcements and news for doing data science in the cloud. Azure Synapse Analytics This is the future of data warehousing. Azure Synapse Analytics This is the future of data warehousing. Microsoft Azure.

Cloud Data

Cloud Data Data Science Azure Clustering

Data Version Control for Data Lakes: Handling the Changes in Large Scale

ODSC - Open Data Science

SEPTEMBER 27, 2023

In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.

Data Lakes

Data Lakes Data Warehouse Database Big Data

Data Science News from Microsoft Ignite 2019

Data Science 101

NOVEMBER 7, 2019

Microsoft just held one of its largest conferences of the year, and a few major announcements were made which pertain to the cloud data science world. Azure Synapse Analytics can be seen as a merge of Azure SQL Data Warehouse and Azure Data Lake. Those are the big data science announcements of the week.

Data Science

Data Science Azure SQL Machine Learning

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock.

SQL

SQL Data Lakes Data Analyst AWS

Big Data vs. Data Science: Demystifying the Buzzwords

Pickl AI

APRIL 21, 2025

Summary: Big Data refers to the vast volumes of structured and unstructured data generated at high speed, requiring specialized tools for storage and processing. Data Science, on the other hand, uses scientific methods and algorithms to analyses this data, extract insights, and inform decisions.

Big Data

Big Data Big Data Data Science Machine Learning

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Data Science Dojo

SEPTEMBER 11, 2024

With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of One Lake Fabric features a lake-centric architecture, with a central repository known as OneLake. Now, we can save the data as delta tables to use later for sales analytics.

Power BI

Power BI Data Pipeline Data Warehouse Data Engineering

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Today, generative AI can enable people without SQL knowledge. This generative AI task is called text-to-SQL, which generates SQL queries from natural language processing (NLP) and converts text into semantically correct SQL.

SQL

SQL AWS Database ML

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Here’s what we found for both skills and platforms that are in demand for data scientist jobs. Data Science Skills and Competencies Aside from knowing particular frameworks and languages, there are various topics and competencies that any data scientist should know. Joking aside, this does infer particular skills.

Data Science

Data Science Data Scientist Computer Science Computer Science

How to modernize data lakes with a data lakehouse architecture

IBM Journey to AI blog

JULY 5, 2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture Data lakes are, at a high level, single repositories of data at scale.

Data Lakes

Data Lakes Data Warehouse Data Governance Analytics

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The following figure shows a search query that was translated to SQL and run. The challenge is to assure quality.

SQL

SQL Database AWS Machine Learning

Data integration

Dataconomy

JUNE 18, 2025

Common use cases for data integration Data integration demonstrates its value across a variety of applications. Feeding data for analytics Integrated data is essential for populating data warehouses, data lakes, and lakehouses, ensuring that analysts have access to complete datasets for their work.

Data Warehouse

Data Warehouse Data Silos ETL Big Data

Sneak peek at Microsoft Fabric price and its promising features

Dataconomy

JUNE 1, 2023

Unified data storage : Fabric’s centralized data lake, Microsoft OneLake, eliminates data silos and provides a unified storage system, simplifying data access and retrieval. OneLake is designed to store a single copy of data in a unified location, leveraging the open-source Apache Parquet format.

Power BI

Power BI Data Lakes Azure Data Silos

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Analytics Analytics Data Scientist

How Databricks and Tableau customers are fueling innovation with data lakehouse architecture

Tableau

JUNE 8, 2021

In many of the conversations we have with IT and business leaders, there is a sense of frustration about the speed of time-to-value for big data and data science projects. We often hear that organizations have invested in data science capabilities but are struggling to operationalize their machine learning models.

Tableau

Tableau Data Lakes Data Warehouse SQL

Azure Machine Learning – Empowering Your Data Science Journey

How to Learn Machine Learning

MAY 2, 2025

Best Practices for Azure Machine Learning Projects To get the most out of Azure Machine Learning, consider these best practices: Data Management Use Azure Data Stores : Connect to various data sources including Azure Blob Storage, Azure Data Lake, and Azure SQL Database for efficient data access.

Azure

Azure Machine Learning Machine Learning Data Science

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

Pickl AI

NOVEMBER 15, 2023

Discover the nuanced dissimilarities between Data Lakes and Data Warehouses. Data management in the digital age has become a crucial aspect of businesses, and two prominent concepts in this realm are Data Lakes and Data Warehouses. It acts as a repository for storing all the data.

Data Lakes

Data Lakes Data Warehouse Database ETL

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lakes

Data Lakes Data Modeling Data Models Data Warehouse

Drowning in Data? A Data Lake May Be Your Lifesaver

ODSC - Open Data Science

SEPTEMBER 29, 2023

Data management problems can also lead to data silos; disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large and complex database of diverse datasets all stored in their original format.

Data Lakes

Data Lakes Clustering Big Data Big Data

Simplifying Time Series Analysis for Data Scientists

ODSC - Open Data Science

SEPTEMBER 12, 2023

Most data scientists are familiar with the concept of time series data and work with it often. The time series database (TSDB) , however, is still an underutilized tool in the data science community. Typically, time series analysis is performed either on CSV files or data lakes.

Data Scientist

Data Scientist Database Data Lakes Data Science

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.

AWS

AWS Data Governance Data Silos SQL

Essential data engineering tools for 2023: Empowering for management and analysis

Data Science Dojo

JULY 6, 2023

It offers extensibility and integration with various data engineering tools. dbt (Data Build Tool): dbt is an open-source data transformation and modeling tool. It allows data engineers to build, test, and maintain data pipelines in a version-controlled manner.

Data Engineering

Data Engineering Data Engineering Data Engineer Data Engineering

How Q4 Inc. used Amazon Bedrock, RAG, and SQLDatabaseChain to address numerical and structured dataset challenges building their Q&A chatbot

Flipboard

DECEMBER 6, 2023

In this post, we discuss a Q&A bot use case that Q4 has implemented, the challenges that numerical and structured datasets presented, and how Q4 concluded that using SQL may be a viable solution. This would have required a dedicated cross-disciplinary team with expertise in data science, machine learning, and domain knowledge.

SQL

SQL Database AWS Machine Learning

Building an Effective OSS Management Layer for Your Data Lake

ODSC - Open Data Science

OCTOBER 13, 2024

Be sure to check out her talk, “ Don’t Go Over the Deep End: Building an Effective OSS Management Layer for Your Data Lake ,” there! Managing a data lake can often feel like being lost at sea — especially when dealing with both structured and unstructured data.

Data Lakes

Data Lakes Database Data Pipeline SQL

Best 8 Data Version Control Tools for Machine Learning 2024

DagsHub

DECEMBER 11, 2023

The following points illustrates some of the main reasons why data versioning is crucial to the success of any data science and machine learning project: Storage space One of the reasons of versioning data is to be able to keep track of multiple versions of the same data which obviously need to be stored as well.

Machine Learning

Machine Learning Machine Learning Data Lakes Data Science

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

With Great Expectations , data teams can express what they “expect” from their data using simple assertions. Great Expectations provides support for different data backends such as flat file formats, SQL databases, Pandas dataframes and Sparks, and comes with built-in notification and data documentation functionality.

Exploratory Data Analysis

Exploratory Data Analysis Data Visualization Data Analysis Data Analysis

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format. SageMaker JumpStart provided deployable models that could be trained for object detection use cases with minimal data science knowledge and overhead.

AWS

AWS Data Lakes ML ML

Unleashing the power of Presto: The Uber case study

IBM Journey to AI blog

SEPTEMBER 25, 2023

Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success. What is Presto?

Data Lakes

Data Lakes Analytics Analytics Clustering

What is Snowpark — and Why Does it Matter? A phData Perspective

phData

SEPTEMBER 20, 2023

Snowpark is the set of libraries and runtimes in Snowflake that securely deploy and process non-SQL code, including Python , Java, and Scala. As a declarative language, SQL is very powerful in allowing users from all backgrounds to ask questions about data. What is Snowflake’s Snowpark? Why Does Snowpark Matter?

SQL

SQL Python Data Lakes Machine Learning

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

These professionals will work with their colleagues to ensure that data is accessible, with proper access. So let’s go through each step one by one, and help you build a roadmap toward becoming a data engineer. Identify your existing data science strengths. Stay on top of data engineering trends. Get more training!

Data Engineering

Data Engineering Data Engineering Data Engineer Data Engineering

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

AWS Machine Learning Blog

JUNE 5, 2023

Many teams are turning to Athena to enable interactive querying and analyze their data in the respective data stores without creating multiple data copies. Athena allows applications to use standard SQL to query massive amounts of data on an S3 data lake. Create a data lake with Lake Formation.

Machine Learning

Machine Learning Machine Learning AWS Data Lakes

How Databricks and Tableau customers are fueling innovation with data lakehouse architecture

Tableau

JUNE 8, 2021

In many of the conversations we have with IT and business leaders, there is a sense of frustration about the speed of time-to-value for big data and data science projects. We often hear that organizations have invested in data science capabilities but are struggling to operationalize their machine learning models.

Tableau

Tableau Data Lakes Data Warehouse SQL

Will They Blend? Theobald Meets HANA

Dataversity

MARCH 12, 2021

blog series, we experiment with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT […].

Data Lakes

Data Lakes SQL Database Data Science

Everything is Connected, Everything Changes

Alation

OCTOBER 7, 2021

In this essay, Jason reflects on the value of thinking spatially about data, showing how his experience as a graduate student influences his role as a data scientist today. The popularity of location data and GIS-styled analyses has amplified a common cry in GIS- turned-data-science circles: “Spatial isn’t special!”

Data Scientist

Data Scientist Data Lakes Data Science SQL

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

Will They Blend? Google BigQuery Meets Databricks

Dataversity

MAY 7, 2021

blog series, we experiment with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source DevOps on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT […].

Data Lakes

Data Lakes SQL Database Data Science

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

Go vs. Python for Modern Data Workflows: Need Help Deciding?

Data lakes vs. data warehouses: Decoding the data storage debate

Trending Sources

What Is a Lakebase?

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data lakehouse

How Rocket Companies modernized their data science solution on AWS

The IKEA of Data: How to Bring Modular Thinking to Your Data Architecture (and Why It Works)

Cloud Data Science News Beta #1

Data Version Control for Data Lakes: Handling the Changes in Large Scale

Data Science News from Microsoft Ignite 2019

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

Big Data vs. Data Science: Demystifying the Buzzwords

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

40 Must-Know Data Science Skills and Frameworks for 2023

How to modernize data lakes with a data lakehouse architecture

Imperva optimizes SQL generation from natural language using Amazon Bedrock

Data integration

Sneak peek at Microsoft Fabric price and its promising features

Data science vs data analytics: Unpacking the differences

How Databricks and Tableau customers are fueling innovation with data lakehouse architecture

Azure Machine Learning – Empowering Your Data Science Journey

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

Drowning in Data? A Data Lake May Be Your Lifesaver

Simplifying Time Series Analysis for Data Scientists

Shaping the future: OMRON’s data-driven journey with AWS

Essential data engineering tools for 2023: Empowering for management and analysis

How Q4 Inc. used Amazon Bedrock, RAG, and SQLDatabaseChain to address numerical and structured dataset challenges building their Q&A chatbot

Building an Effective OSS Management Layer for Your Data Lake

Best 8 Data Version Control Tools for Machine Learning 2024

11 Open Source Data Exploration Tools You Need to Know in 2023

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Unleashing the power of Presto: The Uber case study

What is Snowpark — and Why Does it Matter? A phData Perspective

How to Shift from Data Science to Data Engineering

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

How Databricks and Tableau customers are fueling innovation with data lakehouse architecture

Will They Blend? Theobald Meets HANA

Everything is Connected, Everything Changes

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Will They Blend? Google BigQuery Meets Databricks

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snowflake Snowpark: cloud SQL and Python ML pipelines

Stay Connected