As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run its daily workloads. Data is the foundational layer for all generative AI and ML applications.
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Today, generative AI can enable people without SQL knowledge to query data. This generative AI task is called text-to-SQL: it takes a natural language question and converts it into a semantically correct SQL query.
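As a minimal text-to-SQL sketch, a foundation model on Amazon Bedrock can be prompted to translate a question into SQL. The model ID, table schema, and prompt wording below are illustrative assumptions, not details from the post.

```python
import boto3

# Hedged sketch: ask a Bedrock foundation model to translate natural
# language into SQL. Schema and question are hypothetical examples.
bedrock = boto3.client("bedrock-runtime")

schema = "etf_holdings(ticker VARCHAR, sector VARCHAR, weight DOUBLE)"
question = "Which sector has the largest total weight?"

prompt = (
    f"Given the table {schema}, write a single ANSI SQL query that answers: "
    f"{question}. Return only the SQL."
)

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model choice
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
sql = response["output"]["message"]["content"][0]["text"]
print(sql)
```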
NOTE: Since we used a SQL query engine to query the dataset for this demonstration, the prompts and generated outputs below mention SQL. The question in the preceding example doesn't require complex analysis of the data returned from the ETF dataset. A user can ask a business- or industry-related question about ETFs.
Amazon Redshift is one of the most popular cloud data warehouses, used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.
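As a hedged illustration of the query_string idea, the Redshift Data API can run an UNLOAD that filters with SQL and writes the result to Amazon S3. The cluster, database, IAM role, and query values below are placeholder assumptions.

```python
import boto3

# Hedged sketch: filter a dataset with SQL and unload it to S3 via the
# Redshift Data API. All identifiers are hypothetical placeholders.
redshift_data = boto3.client("redshift-data")

query_string = "SELECT * FROM sales WHERE region = ''EMEA''"  # quotes escaped for UNLOAD
unload_sql = (
    f"UNLOAD ('{query_string}') "
    "TO 's3://my-bucket/exports/sales_' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole' "
    "FORMAT AS PARQUET"
)

response = redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=unload_sql,
)
print(response["Id"])  # statement ID, can be polled with describe_statement
```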
Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide outstanding customer experience. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data. Run the AWS Glue ML transform job.
Customers use Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning (ML), data sharing and monetization, and more. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
One such area that is evolving is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing with complex technical code, business users and data analysts can ask questions related to data and insights in plain language.
SageMaker endpoints can be registered to the Salesforce Data Cloud to activate predictions in Salesforce. SageMaker Canvas provides a no-code experience to access data from Salesforce Data Cloud and build, test, and deploy models using just a few clicks. Manage Data Cloud profile data (Data Cloud_profile_api).
Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
It combines data warehousing and data lakes into a single query interface for a simple and fast analytics service. SQL Server 2019 went generally available.
In this post, we discuss a Q&A bot use case that Q4 has implemented, the challenges that numerical and structured datasets presented, and how Q4 concluded that using SQL may be a viable solution. RAG with semantic search – Conventional RAG with semantic search was the last step before moving to SQL generation.
Solution overview: Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. Athena allows applications to use standard SQL to query massive amounts of data on an S3 data lake.
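A hedged sketch of querying Parquet data on an S3 data lake through Athena from Python; the database, table, and result bucket names are assumptions.

```python
import time
import boto3

# Hedged sketch: run a standard SQL query against Parquet files on S3
# via Athena. Database, table, and bucket names are hypothetical.
athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT user_id, COUNT(*) AS events FROM clickstream GROUP BY user_id LIMIT 10",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```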
Azure Machine Learning is Microsoft’s enterprise-grade service that provides a comprehensive environment for data scientists and ML engineers to build, train, deploy, and manage machine learning models at scale. You can explore its capabilities through the official Azure ML Studio documentation. Awesome, right?
If you are a returning SageMaker Studio user, upgrade to the latest Jupyter and SageMaker Data Wrangler kernels to ensure Salesforce Data Cloud is enabled. This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Studio to build AI and machine learning (ML) models.
Snowpark is the set of libraries and runtimes in Snowflake that securely deploy and process non-SQL code, including Python, Java, and Scala. As a declarative language, SQL is very powerful in allowing users from all backgrounds to ask questions about data. Why does Snowpark matter? Who should use Snowpark?
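A minimal Snowpark sketch, assuming placeholder connection parameters and a hypothetical ORDERS table; the DataFrame operations are pushed down and executed as SQL inside Snowflake rather than on the client.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

# Hedged sketch: Python DataFrame code that Snowflake executes as SQL
# in the warehouse. Connection values and table are placeholders.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "COMPUTE_WH",
    "database": "DEMO_DB",
    "schema": "PUBLIC",
}).create()

orders = session.table("ORDERS")
result = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(avg("AMOUNT").alias("AVG_AMOUNT"))
)
result.show()  # triggers pushdown execution in Snowflake
```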
DVC: Released in 2017, Data Version Control (DVC for short) is an open-source tool created by Iterative. DVC can be used for versioning data and models, tracking experiments, and comparing any data, code, parameters, models, and graphical plots of performance. DVC can efficiently handle large files and machine learning models.
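A small hedged example of reading a DVC-versioned file from Python; the repo URL, file path, and tag are assumptions for illustration.

```python
import dvc.api

# Hedged sketch: open a file tracked by DVC at a specific Git revision.
# Repo, path, and rev are hypothetical placeholders.
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/project",
    rev="v1.0",  # any Git ref: tag, branch, or commit
) as f:
    first_line = f.readline()
    print(first_line)
```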
Alignment to other tools in the organization's tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, and monitoring systems, as well as Pandas or Apache Spark DataFrames.
By running reports on historical data, a data warehouse can clarify what systems and processes are working and what methods need improvement. A data warehouse is also the base architecture for artificial intelligence and machine learning (AI/ML) solutions.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data is frequently kept in data lakes that can be managed by AWS Lake Formation, giving you the ability to implement fine-grained access control using a straightforward grant or revoke procedure.
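As a hedged sketch of that grant/revoke procedure, Lake Formation permissions can be granted programmatically; the role ARN, database, and table names below are placeholders.

```python
import boto3

# Hedged sketch: grant a principal SELECT on a Lake Formation governed
# table. All identifiers are hypothetical placeholders.
lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/DataWranglerRole"},
    Resource={"Table": {"DatabaseName": "sales_db", "Name": "transactions"}},
    Permissions=["SELECT"],
)
# revoke_permissions takes the same arguments to reverse the grant.
```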
Its goal is to help with a quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Apache Superset GitHub | Website Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.
Just as a writer needs to know core skills like sentence structure, grammar, and so on, data scientists at all levels should know core data science skills like programming, computer science, algorithms, and so on. While knowing Python, R, and SQL is expected, you'll need to go beyond that.
The natural language capabilities allow non-technical users to query data through conversational English rather than complex SQL. The AI and language models must identify the appropriate data sources, generate effective SQL queries, and produce coherent responses with embedded results at scale.
These tools may have their own versioning system, which can be difficult to integrate with a broader data version control system. For instance, our data lake could contain a variety of relational and non-relational databases, files in different formats, and data stored using different cloud providers. Tools in this space include DVC, Git LFS, and neptune.ai.
Using Azure ML to Train a Serengeti Data Model, Fast Option Pricing with DL, and How To Connect a GPU to a Container. Using Azure ML to Train a Serengeti Data Model for Animal Identification: In this article, we will cover how you can train a model using Notebooks in Azure Machine Learning Studio.
In this blog series, we experiment with the most interesting blends of data and tools. Whether it's mixing traditional sources with modern data lakes, open-source DevOps on the cloud with protected internal legacy tools, SQL with NoSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT […].
Amazon Monitron is an end-to-end condition monitoring solution that enables you to start monitoring equipment health with the aid of machine learning (ML) in minutes, so you can implement predictive maintenance and reduce unplanned downtime. For the detailed Amazon Monitron installation guide, refer to Getting started with Amazon Monitron.
MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.
How you can now anonymize data more easily: Google has just announced the public preview of BigQuery differential privacy with SQL building blocks. You can use these functions to anonymize your data. This feature also helps ensure that data is shared securely.
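A hedged sketch of those SQL building blocks called from Python; the epsilon, delta, privacy-unit column, and table below are illustrative assumptions, not values from the announcement.

```python
from google.cloud import bigquery

# Hedged sketch: a differentially private aggregation in BigQuery.
# Privacy parameters and the dataset/table are hypothetical.
client = bigquery.Client()

sql = """
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS(epsilon = 1.0, delta = 1e-5, privacy_unit_column = user_id)
  department,
  AVG(salary) AS avg_salary
FROM `my_project.hr.salaries`
GROUP BY department
"""

for row in client.query(sql).result():
    print(row.department, row.avg_salary)
```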
Powered by machine learning (ML), Einstein Discovery provides predictions and recommendations within Tableau workflows for accelerated and smarter decision-making. Here's how: Einstein dashboard extension: Enjoy on-demand, interpretable ML predictions from Einstein Discovery natively integrated in Tableau dashboards.
By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses, and SQL databases, providing a holistic view into business performance. Then, it applies these insights to automate and orchestrate the data lifecycle.
In addition, the generative business intelligence (BI) capabilities of QuickSight allow you to ask questions about customer feedback using natural language, without the need to write SQL queries or learn a BI tool. For more information, see Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training.
In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume, across all phases of the data-information lifecycle.
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
Start Learning AI With the ODSC West Data Primer Series In this six-part series as part of the ODSC West mini-bootcamp, you’ll learn everything you need to know to get started with AI, including SQL, machine learning, and even LLMs. But it doesn’t stop there.
Data analysts often must go out and find their data, process it, clean it, and get it ready for analysis. This pushes into Big Data as well, as many companies now have significant amounts of data and large data lakes that need analyzing. Cloud Services: Google Cloud Platform, AWS, Azure.
Having a solid understanding of ML principles and practical knowledge of statistics, algorithms, and mathematics. A strong grasp of data warehousing concepts. Experience with at least one end-to-end Azure data lake project. Hands-on experience working with SQL DW and SQL DB. What is Polybase?
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune, and deploy AI models with confidence and at scale across their enterprise.
To cluster the data, we have to calculate distances between IPs. The number of all possible IP pairs is very large, and we had to solve the scale problem. Data Processing and Clustering: Our data is stored in a data lake, and we used PrestoDB as a query engine, generating each unordered IP pair exactly once with a self-join, as sketched below.
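The excerpt's query arrives truncated, so the following is a speculative reconstruction of the kind of pair-generating PrestoDB self-join it hints at; the table and column names are assumptions.

```python
# Hedged sketch: Presto SQL that enumerates each unordered IP pair once
# (l.ip < r.ip avoids duplicates and self-pairs), so pairwise distances
# can be computed downstream. Table and columns are hypothetical.
PAIR_QUERY = """
SELECT
  l.ip AS ip_1,
  r.ip AS ip_2
FROM ip_features AS l
JOIN ip_features AS r
  ON l.ip < r.ip
"""

# This string would be submitted through a Presto client or query runner.
print(PAIR_QUERY)
```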