When it comes to storing data, there are two main approaches: data lakes and data warehouses. What is a data lake? A data lake stores an enormous amount of raw data in its original format until it is required for analytics applications. Which one is right for your business?
It offers full BI-Stack Automation, from source to data warehouse through to frontend. It supports a holistic data model, allowing for rapid prototyping of various models. It also supports a wide range of data warehouses, analytical databases, data lakes, frontends, and ETL pipelines, including Azure Databricks.
Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock.
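As a rough illustration of the RAG pattern described here, the sketch below injects a retrieved schema snippet into a prompt and sends it to a model on Amazon Bedrock via the Converse API; the model ID, region, and table metadata are placeholder assumptions, not details from the Twilio post.

```python
import boto3

# Minimal RAG-style sketch: retrieved context (e.g. table metadata pulled from a
# vector store) is injected into the prompt before calling a model on Amazon Bedrock.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

retrieved_context = "Table sales(order_id, region, amount, order_date)"  # stand-in for real retrieval
question = "What were total sales per region last quarter?"

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # any Bedrock text model
    messages=[{
        "role": "user",
        "content": [{"text": f"Context:\n{retrieved_context}\n\nQuestion: {question}"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```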
Writing data to an AWS data lake and retrieving it to populate an AWS RDS MS SQL database involves several AWS services and a sequence of steps for data transfer and transformation. This process leverages AWS S3 for the data lake storage, AWS Glue for ETL operations, and AWS Lambda for orchestration.
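A minimal orchestration sketch, assuming a Lambda function that triggers a hypothetical Glue job named s3-to-rds-mssql; the bucket, path, and table arguments are illustrative, and the Glue job itself would handle the JDBC write into the MS SQL database.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Kick off a Glue ETL job that reads from the S3 data lake and writes to RDS.

    The job name and arguments are illustrative; the Glue job would use a JDBC
    connection to load the transformed data into the target MS SQL database.
    """
    run = glue.start_job_run(
        JobName="s3-to-rds-mssql",  # hypothetical Glue job
        Arguments={
            "--source_path": "s3://my-data-lake/raw/orders/",
            "--target_table": "dbo.orders",
        },
    )
    return {"JobRunId": run["JobRunId"]}
```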
In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) with in-context sample data, including features and labels, in the prompt.
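The sketch below shows one way such an in-context prompt might be assembled, using a made-up churn dataset with a handful of labeled rows; the feature names and task wording are illustrative rather than taken from the post.

```python
# Build a prompt that embeds a few labeled rows so the model learns the
# domain-specific meaning of each feature before analyzing a new row.
sample_rows = [
    {"tenure_months": 3,  "monthly_spend": 20.0, "support_tickets": 4, "label": "churned"},
    {"tenure_months": 48, "monthly_spend": 95.5, "support_tickets": 0, "label": "retained"},
]

header = " | ".join(sample_rows[0].keys())
lines = [" | ".join(str(v) for v in row.values()) for row in sample_rows]

prompt = (
    "You are a telecom analyst. The table below shows customer features and outcomes.\n"
    f"{header}\n" + "\n".join(lines) + "\n\n"
    "Using the same industry terminology, explain which features most likely drive churn "
    "for this new customer: tenure_months=5, monthly_spend=80, support_tickets=6."
)
print(prompt)
```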
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture: data lakes are, at a high level, single repositories of data at scale.
Automation: Generates SQL code, DACPAC files, SSIS packages, Data Factory ARM templates, and XMLA files. Broad support: Compatible with various database management systems such as MS SQL Server and Azure Synapse Analytics. Data lakes: Supports MS Azure Blob Storage.
Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved with SQL using Amazon Athena. The following figure shows a search query that was translated to SQL and run. The challenge is to ensure quality.
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Today, generative AI can enable people without SQL knowledge to query data in plain language. This generative AI task is called text-to-SQL: it takes a natural language question and converts it into a semantically correct SQL query.
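A minimal text-to-SQL prompt pattern might look like the following, where the schema and question are embedded in the prompt and the model is asked to return only SQL; the call_llm function is a hypothetical stand-in for whatever model endpoint is used.

```python
# Text-to-SQL prompt pattern: give the model the schema and ask for SQL only.
schema = """
CREATE TABLE orders (order_id INT, customer_id INT, amount DECIMAL(10,2), order_date DATE);
CREATE TABLE customers (customer_id INT, region VARCHAR(50));
"""

def build_text_to_sql_prompt(question: str) -> str:
    return (
        "Given the following schema:\n"
        f"{schema}\n"
        "Write a single ANSI SQL query that answers the question. "
        "Return only the SQL, no explanation.\n"
        f"Question: {question}"
    )

prompt = build_text_to_sql_prompt("Total order amount per region in 2023")
# sql = call_llm(prompt)  # hypothetical model call; validate the SQL before executing it
print(prompt)
```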
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data, for example reading raw files in Spark with options such as option("multiLine", "true") and option("header", "true"), as sketched below.
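A hedged PySpark sketch of that read, assuming multi-line CSV files landing in an illustrative S3 prefix:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-example").getOrCreate()

# Read CSV files whose quoted fields may span several physical lines; the path is illustrative.
df = (
    spark.read
    .option("multiLine", "true")   # allow records that span multiple lines
    .option("header", "true")      # treat the first row as column names
    .csv("s3://my-data-lake/raw/events/")
)
df.printSchema()
```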
tl;dr A data lakehouse is a modern data architecture that combines the advantages of a data lake and a data warehouse. The definition of a data lakehouse: a data lakehouse is a modern data storage and processing architecture that unites the advantages of data lakes and data warehouses.
With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a data lake? Consistency of data throughout the data lake.
Discover the nuanced differences between data lakes and data warehouses. Data management in the digital age has become a crucial aspect of businesses, and two prominent concepts in this realm are data lakes and data warehouses. A data lake acts as a repository for storing all the data.
For many enterprises, a hybrid cloud data lake is no longer a trend but a reality. Due to these needs, hybrid cloud data lakes emerged as a logical middle ground between the two consumption models. Without business context, business users are less likely to use the data lake, and insights will be hard to come by.
One such area that is evolving is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing with complex technical code, business users and data analysts can ask questions related to data and insights in plain language.
This enables sales teams to interact with our internal sales enablement collateral, including sales plays and first-call decks, as well as customer references, customer- and field-facing incentive programs, and content on the AWS website, including blog posts and service documentation.
Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.
Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. In this blog, we will discuss: What is the Open Table Format (OTF)? It can also be integrated into major data platforms like Snowflake. Contact phData today!
In this blog series, we experiment with the most interesting blends of data and tools, whether it's mixing traditional sources with modern data lakes, open-source DevOps on the cloud with protected internal legacy tools, SQL with NoSQL, web wisdom-of-the-crowd with in-house handwritten notes, or IoT […].
Best 8 data version control tools for 2023 (source: DagsHub). With business needs changing constantly and the growing size and structure of datasets, it becomes challenging to efficiently keep track of the changes made to the data, which leads to unfortunate scenarios such as inconsistencies and errors in data.
Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success. What is Presto?
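For readers curious what querying Presto from Python can look like, here is a small sketch using the presto-python-client package; the coordinator host, catalog, and rides table are assumptions for illustration, not Uber's actual setup.

```python
import prestodb  # presto-python-client; connection details below are illustrative

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute(
    "SELECT city, count(*) AS trips FROM rides GROUP BY city ORDER BY trips DESC LIMIT 10"
)
for row in cur.fetchall():
    print(row)
```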
A data lakehouse contains an organization's data in unstructured, structured, and semi-structured forms, which can be stored indefinitely for immediate or future use. This data is used by data scientists and engineers who study it to gain business insights.
This blog was originally written by Keith Smith and updated for 2023 by Nick Goble & Dominick Rocco. You've probably heard of the Snowflake Data Cloud, but did you know that Snowflake also offers a revolutionary set of libraries and runtimes called Snowpark? What is Snowflake's Snowpark? Why does Snowpark matter?
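As a taste of the DataFrame-style API, the following sketch builds a Snowpark session and pushes a filter and aggregation down to Snowflake; the connection parameters and ORDERS table are placeholders.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Connection parameters are placeholders; the work below executes inside Snowflake.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "***",
    "warehouse": "ANALYTICS_WH",
    "database": "SALES",
    "schema": "PUBLIC",
}).create()

large_orders_by_region = (
    session.table("ORDERS")            # illustrative table
    .filter(col("AMOUNT") > 1000)
    .group_by("REGION")
    .count()
)
large_orders_by_region.show()
```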
Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry.
Many teams are turning to Athena to enable interactive querying and analyze their data in the respective data stores without creating multiple data copies. Athena allows applications to use standard SQL to query massive amounts of data on an S3 data lake. Create a data lake with Lake Formation.
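A small boto3 sketch of that flow, with an assumed Glue database, results bucket, and events table; real code would add error handling around the polling loop.

```python
import time
import boto3

athena = boto3.client("athena")

# Run standard SQL directly against files in the S3 data lake; names are illustrative.
execution = athena.start_query_execution(
    QueryString="SELECT event_type, count(*) AS n FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "my_lake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Wait for the query to finish, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state not in ("QUEUED", "RUNNING"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
print(rows)
```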
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
Collaborate and build faster from a unified studio using familiar AWS tools for model development, generative AI, data processing, and SQL analytics, with Amazon Q Developer assisting you along the way. Access all your data whether it's stored in data lakes, data warehouses, or third-party and federated data sources.
The natural language capabilities allow non-technical users to query data through conversational English rather than complex SQL. The AI and language models must identify the appropriate data sources, generate effective SQL queries, and produce coherent responses with embedded results at scale.
By viewing data spatially, inferences can be made and the imagination can be sparked. But in a world where so much data has a location, it's essential to think spatially. From an ancient lake to a data lake: a paleo perspective. I've been getting my hands dirty with data for a long time now.
External tables create a shared view of the data lake. We've seen external tables become popular with our customers, who use them to provide a normalized relational schema on top of their data lake. Essentially, external tables create a shared view of the data lake, a single pane of glass everyone can reference.
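Sketched in Snowflake syntax, an external table over a stage that points at the data lake might be created like this; the stage, table, and connection details are illustrative assumptions.

```python
import snowflake.connector  # connection details below are placeholders

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="ANALYTICS_WH", database="LAKE", schema="EXT",
)

# Define an external table over files in the data lake; the stage @lake_stage is
# assumed to already point at the S3/Azure/GCS bucket holding the Parquet files.
conn.cursor().execute("""
    CREATE OR REPLACE EXTERNAL TABLE customer_events
    LOCATION = @lake_stage/events/
    AUTO_REFRESH = FALSE
    FILE_FORMAT = (TYPE = PARQUET)
""")
```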
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
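A brief sketch of that setup, assuming a Hive metastore table named analytics.raw_events; enabling Hive support lets Spark SQL query the HDFS-backed tables directly.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark SQL resolve tables registered in the Hive metastore,
# whose data files live in HDFS; the database and table names are illustrative.
spark = (
    SparkSession.builder
    .appName("hive-exploration")
    .enableHiveSupport()
    .getOrCreate()
)

daily = spark.sql("""
    SELECT event_date, count(*) AS events
    FROM analytics.raw_events
    GROUP BY event_date
    ORDER BY event_date
""")
daily.show(20)
```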
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.
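As a sketch of that unload step, the snippet below issues an UNLOAD through the Redshift Data API; the cluster, IAM role, and sales table are illustrative assumptions.

```python
import boto3

redshift = boto3.client("redshift-data")

# Filter the dataset with SQL and unload the result to S3 as Parquet.
# Cluster, database, table, and IAM role names are illustrative.
redshift.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="""
        UNLOAD ('SELECT * FROM sales WHERE region = ''EMEA''')
        TO 's3://my-data-lake/exports/sales_emea_'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
        FORMAT PARQUET
    """,
)
```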
In our previous blog, Top 5 Fivetran Connectors for Financial Services, we explored Fivetran's capabilities that address the data integration needs of the finance industry. Now, let's cover the healthcare industry, which also has a surging demand for data and analytics, along with the underlying processes to make it happen.
Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide an outstanding customer experience. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data.
By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses, and SQL databases, providing a holistic view into business performance. Then, it applies these insights to automate and orchestrate the data lifecycle.
In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity, which required a different data platform solution. Hence, the data lake emerged, handling unstructured and structured data at huge volume across all phases of the data-information lifecycle.
And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.
For more detail on each of these integrations, check out our Einstein Discovery in Tableau blog post. You can now connect to your data in Azure SQL Database (with Azure Active Directory) and Azure Data Lake Storage Gen2. Stay on top of important updates with our new unified notification experience.
Configure the following scopes on your connected app: Manage user data via APIs (api). Perform ANSI SQL queries on Salesforce Data Cloud data (Data Cloud_query_api). Manage Data Cloud profile data (Data Cloud_profile_api). Drag and drop the file, then choose Edit in SQL.
These tools may have their own versioning system, which can be difficult to integrate with a broader data version control system. For instance, our data lake could contain a variety of relational and non-relational databases, files in different formats, and data stored using different cloud providers. Tools in this space include DVC, Git LFS, and neptune.ai.
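For example, with DVC (one of the tools mentioned above), a specific revision of a tracked dataset can be read straight from the remote without checking out the whole repository; the repository URL, file path, and tag below are illustrative.

```python
import dvc.api

# Open a specific version of a DVC-tracked file directly from the project's remote.
with dvc.api.open(
    "data/training/features.csv",
    repo="https://github.com/example-org/ml-project",  # illustrative repository
    rev="v1.2.0",                                      # any Git tag, branch, or commit
) as f:
    first_lines = [next(f) for _ in range(5)]

print(first_lines)
```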