Artificial Intelligence, Data Lakes and Database

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake.” Combining data streaming and machine learning (ML) lets you build a single scalable, reliable, and simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Apache Kafka
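The excerpt above describes training models directly on a stream instead of first landing the data in a lake. As a rough illustration of that idea, here is a minimal Python sketch of online learning: a tiny linear model updated with one gradient step per event. A plain generator stands in for a Kafka consumer (no Kafka client API is used), and the event schema, `event_stream`, `OnlineLinearModel`, and `train_on_stream` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical event schema: a (feature, label) pair.
Event = Tuple[float, float]


def event_stream(events: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for a consumer loop reading a topic: yields one event at a time."""
    for event in events:
        yield event


class OnlineLinearModel:
    """Predicts y = w * x + b and updates its weights after every single event."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def learn_one(self, x: float, y: float) -> None:
        # One stochastic gradient descent step on the squared error 0.5 * (pred - y) ** 2.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def train_on_stream(stream: Iterator[Event], model: OnlineLinearModel) -> OnlineLinearModel:
    # Each event is used for one update and then discarded; nothing is persisted.
    for x, y in stream:
        model.learn_one(x, y)
    return model


if __name__ == "__main__":
    # Synthetic events drawn from y = 2x + 1, replayed to simulate a long-running topic.
    samples = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)] * 500
    trained = train_on_stream(event_stream(samples), OnlineLinearModel())
    print(round(trained.w, 3), round(trained.b, 3))
```

Because each event is consumed once and then dropped, the model never needs a stored historical dataset. In a real deployment the generator would be replaced by a consumer loop, and concerns such as offsets, ordering, and model checkpointing (not shown here) would need handling.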

Best Practices for Data Lake Security

ODSC - Open Data Science

JUNE 22, 2023

While databases were the traditional way to store large amounts of data, a newer storage method has emerged that can hold even larger and more varied amounts of data: the data lake. What are data lakes? Even digital information has to be stored somewhere.

Data Lakes

Data Lakes Data Warehouse Database Data Science

Data Version Control for Data Lakes: Handling the Changes in Large Scale

ODSC - Open Data Science

SEPTEMBER 27, 2023

In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.

Data Lakes

Data Lakes Data Warehouse Database Big Data

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

Artificial Intelligence (AI) is all the rage, and rightly so. This is, of course, an oversimplification of the data warehousing journey, but as data warehousing has moved to the cloud and business intelligence has evolved into powerful analytics and visualization platforms, the foundational best practices shared here still apply today.

Data Warehouse

Data Warehouse Hadoop Data Governance Data Lakes

Sneak peek at Microsoft Fabric price and its promising features

Dataconomy

JUNE 1, 2023

Unified data storage: Fabric’s centralized data lake, Microsoft OneLake, eliminates data silos and provides a unified storage system, simplifying data access and retrieval. OneLake is designed to store a single copy of data in a unified location, leveraging the open-source Apache Parquet format.

Power BI

Power BI Data Lakes Azure Data Silos

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.

AWS

AWS Database ETL AI

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Flipboard

NOVEMBER 17, 2023

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. Set up the database access and network access.

K-nearest Neighbors

K-nearest Neighbors AWS Clustering Database

Drowning in Data? A Data Lake May Be Your Lifesaver

ODSC - Open Data Science

SEPTEMBER 29, 2023

Data management problems can also lead to data silos: disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. The data lake can then refine, enrich, index, and analyze that data.

Data Lakes

Data Lakes Clustering Big Data

Building an Effective OSS Management Layer for Your Data Lake

ODSC - Open Data Science

OCTOBER 13, 2024

Be sure to check out her talk, “Don’t Go Over the Deep End: Building an Effective OSS Management Layer for Your Data Lake.” Managing a data lake can often feel like being lost at sea, especially when dealing with both structured and unstructured data.

Data Lakes

Data Lakes Database Data Pipeline SQL

Why companies need to accelerate data warehousing solution modernization

IBM Journey to AI blog

APRIL 24, 2023

The data warehouse is also the base architecture for artificial intelligence and machine learning (AI/ML) solutions. Benefits of new data warehousing technology: everything is data, regardless of whether it’s structured, semi-structured, or unstructured.

Data Warehouse

Data Warehouse Data Lakes Database Big Data

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.

SQL

SQL Data Lakes Data Analyst AWS

Simplifying Time Series Analysis for Data Scientists

ODSC - Open Data Science

SEPTEMBER 12, 2023

Be sure to check out his talk, “What is a Time-series Database and Why do I Need One?” at ODSC West 2023. Most data scientists are familiar with the concept of time series data and work with it often. The time series database (TSDB), however, is still an underutilized tool in the data science community.

Data Scientist

Data Scientist Database Data Lakes Data Science

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage and databases: Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (Amazon S3), which is ideal for data lakes, cloud-native applications, and mobile apps. Artificial intelligence (AI). Cost-effectiveness.

AWS

AWS Cloud Computing Data Lakes Database

Exploring Open-Source Innovations: 13 Companies Offering Cutting-Edge Solutions

ODSC - Open Data Science

MARCH 21, 2025

Their contributions to AI and data science communities make it easier to integrate cutting-edge analytics into business strategies. Google Cloud: Open-Source Database Solutions. Google Cloud offers an array of database solutions, from open-source MySQL and PostgreSQL to Spanner.

Data Scientist

Data Scientist Data Visualization Data Science Data Lakes

Open Data Lakes, Safeguarding Images From AI, Free Data Viz Tools, and 50% Off ODSC East

ODSC - Open Data Science

FEBRUARY 15, 2024

The Future of the Single Source of Truth Is an Open Data Lake: Organizations that strive for high-performance data systems are increasingly turning toward the ELT (Extract, Load, Transform) model using an open data lake. To DIY, you need to host an API, build a UI, and run or rent a database.

Data Lakes

Data Lakes Data Visualization Machine Learning
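The ELT model mentioned in the excerpt above loads raw data first and transforms it inside the storage layer afterward. The following minimal Python sketch uses an in-memory SQLite database as a stand-in for an open data lake's query engine, which is an assumption made purely for illustration; the table names, columns, and records are hypothetical.

```python
import sqlite3
from typing import List, Tuple

RawRow = Tuple[str, str, str]


def extract() -> List[RawRow]:
    """E: hypothetical source records, kept exactly as received (all strings, some messy)."""
    return [
        ("1", "alice", "42.50"),
        ("2", "bob", "n/a"),
        ("3", "carol", "17.25"),
    ]


def load(conn: sqlite3.Connection, rows: List[RawRow]) -> None:
    # L: land the raw data untouched so it can be re-transformed later.
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    # T: clean and type the data inside the store, after it has been loaded.
    # Rows whose amount does not start with a digit (e.g. "n/a") are filtered out.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(id AS INTEGER)   AS id,
               UPPER(customer)       AS customer,
               CAST(amount AS REAL)  AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        """
    )


def run_elt() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    load(conn, extract())
    transform(conn)
    return conn


if __name__ == "__main__":
    conn = run_elt()
    print(conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall())
```

Keeping `raw_orders` untouched is the core of the ELT design: if the cleaning rules change, `transform` can be rerun against the original data without going back to the source system.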

5 Fast-Growing Data Management Trends in 2023

ODSC - Open Data Science

MAY 16, 2023

Comprehensive data privacy laws in at least four states are going into effect this year, and more than a dozen states have similar legislation in the works. Database management may become increasingly complex as organizations must account for more of these laws.

Database

Database Data Science Data Lakes Data Observability

How AWS sales uses Amazon Q Business for customer engagement

AWS Machine Learning Blog

DECEMBER 11, 2024

By moving our core infrastructure to Amazon Q, we no longer needed to choose a large language model (LLM) and optimize our use of it, manage Amazon Bedrock agents, a vector database and semantic search implementation, or custom pipelines for data ingestion and management.

AWS

AWS Database AI

Data Engineering for IoT Applications: Unleashing the Power of the Internet of Things

Data Science Connect

JULY 28, 2023

Data Collection and Integration: Data engineers are responsible for designing robust data collection systems that gather information from various IoT devices and sensors. This data is then integrated into centralized databases for further processing and analysis.

Internet of Things

Internet of Things Data Engineer Data Engineering

5 Best Practices for Extracting, Analyzing, and Visualizing Data

Smart Data Collective

DECEMBER 13, 2022

There are several choices to consider, each with its own set of advantages and disadvantages: Data warehouses are used to store data that has been processed for a specific function from one or more sources. Data lakes hold raw data that has not yet been altered to meet a specific purpose.

Data Analysis

Data Analysis Analytics

Data fabric’s value to the enterprise

Tableau

MAY 11, 2022

Data fabrics are gaining momentum as the data management design for today’s challenging data ecosystems. At their most basic level, data fabrics leverage artificial intelligence and machine learning to unify and securely manage disparate data sources without migrating them to a centralized location.

Tableau

Tableau Data Warehouse Database Data Analyst

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The problem: making data accessible to users through applications has always been a challenge.

SQL

SQL Database AWS Machine Learning

How Q4 Inc. used Amazon Bedrock, RAG, and SQLDatabaseChain to address numerical and structured dataset challenges building their Q&A chatbot

Flipboard

DECEMBER 6, 2023

During the embeddings experiment, the dataset was converted into embeddings, stored in a vector database, and then matched with the embeddings of the question to extract context. The generated query is then run against the database to fetch the relevant context. Based on the initial tests, this method showed great results.

SQL

SQL Database AWS Machine Learning
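The embeddings experiment described in the excerpt above follows a common retrieval pattern: embed each record, store the vectors, embed the question, and return the closest matches. Here is a minimal, self-contained Python sketch of that loop. The `embed` function is a toy term-count stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical placeholder for a vector database; neither reflects the implementation described in the post.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy embedding (assumption): sparse term counts instead of a learned model."""
    return {term: float(count) for term, count in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (text, vector) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Vector]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Embed the question, rank stored records by similarity, return the top k texts.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    for doc in ["revenue grew 12 percent in q3", "the board approved a dividend", "headcount was flat in q3"]:
        store.add(doc)
    print(store.search("revenue growth"))
```

Term counts only match on shared words, so this toy misses paraphrases that a learned embedding model would catch; the structure of the loop is the part meant to carry over.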

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. The sample dataset: upload the dataset to Amazon S3 and crawl the data to create an AWS Glue database and tables.

Machine Learning

Machine Learning Data Governance ML

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

Why it’s challenging to process and manage unstructured data: Unstructured data makes up a large proportion of enterprise data and can’t be stored in a traditional relational database management system (RDBMS). These services write the output to a data lake.

AWS

AWS ML Analytics

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. The fundamental objective is to build a manufacturer-agnostic database, leveraging generative AI’s ability to standardize sensor outputs, synchronize data, and facilitate precise corrections.

AWS

AWS AI Python

Reinventing the data experience: Use generative AI and modern data architecture to unlock insights

AWS Machine Learning Blog

JUNE 13, 2023

The combination of large language models (LLMs), including the ease of integration that Amazon Bedrock offers, and a scalable, domain-oriented data infrastructure positions this as an intelligent method of tapping into the abundant information held in various analytics databases and data lakes.

Database

Database SQL AWS AI

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

AWS Machine Learning Blog

MAY 30, 2024

Additionally, VitechIQ includes metadata from the vector database (for example, document URLs) in the model’s output, providing users with source attribution and enhancing trust in the generated answers. These vector embeddings are stored in an Aurora PostgreSQL database. The following diagram shows the solution architecture.

AI

AI AWS Database

Generative AI operating models in enterprise organizations with Amazon Bedrock

AWS Machine Learning Blog

JANUARY 29, 2025

However, even in a decentralized model, lines of business (LOBs) often must align with central governance controls and obtain approvals from the CCoE team for production deployment, adhering to global enterprise standards for areas such as access policies, model risk management, data privacy, and compliance posture, which can introduce governance complexities.

AWS

AWS AI Database

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

How OLAP and AI can enable better business

IBM Journey to AI blog

DECEMBER 7, 2023

Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. Defining OLAP today: OLAP database systems have significantly evolved since their inception in the early 1990s.

Data Preparation

Data Preparation Database Data Analysis

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

There are many well-known libraries and platforms for data analysis, such as Pandas and Tableau, in addition to analytical databases like ClickHouse, MariaDB, Apache Druid, Apache Pinot, Google BigQuery, and Amazon Redshift. With Great Expectations, data teams can express what they “expect” from their data using simple assertions.

Exploratory Data Analysis

Exploratory Data Analysis Data Visualization Data Analysis
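Great Expectations lets teams state what they expect from their data as assertions. The sketch below illustrates that idea in plain Python rather than with the Great Expectations library itself: `check_not_null`, `check_between`, and `run_suite` are hypothetical helpers, not that library's API, and the `age` column is an invented example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Result = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> Result:
    """Hypothetical check: every row must have a non-null value in `column`."""
    failed = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"{column} is not null", "success": not failed, "failed_rows": failed}


def check_between(rows: List[Row], column: str, low: float, high: float) -> Result:
    """Hypothetical check: non-null values in `column` must fall within [low, high]."""
    failed = [
        i
        for i, row in enumerate(rows)
        # Nulls are skipped here and left to check_not_null.
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {"check": f"{low} <= {column} <= {high}", "success": not failed, "failed_rows": failed}


def run_suite(rows: List[Row]) -> List[Result]:
    # The "expectations" for this invented dataset, stated as plain assertions.
    return [
        check_not_null(rows, "age"),
        check_between(rows, "age", 0, 120),
    ]


if __name__ == "__main__":
    data = [{"age": 34}, {"age": None}, {"age": 210}]
    for result in run_suite(data):
        print(result)
```

Each check returns the indices of offending rows rather than raising, so a single run can report every violated expectation at once.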

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

In this post, we will explore the potential of using MongoDB’s time series data and SageMaker Canvas as a comprehensive solution. MongoDB Atlas: MongoDB Atlas is a fully managed developer data platform that simplifies the deployment and scaling of MongoDB databases in the cloud. Set up database access and network access.

Clustering

Clustering AWS Database ML

Data platform trinity: Competitive or complementary?

IBM Journey to AI blog

JANUARY 18, 2023

In another decade, the internet and mobile devices started to generate data of unforeseen volume, variety, and velocity. That required a different data platform solution. Hence, the data lake emerged, which handles structured and unstructured data in huge volumes. A data fabric consists of a network of data nodes (e.g.,

Data Lakes

Data Lakes Data Warehouse Azure Apache Hadoop

5 Recent Data Science and AI Webinars You Need to See

ODSC - Open Data Science

MARCH 23, 2023

Real-time Analytics & Built-in Machine Learning Models with a Single Database: Akmal Chaudhri, Senior Technical Evangelist at SingleStore, explores the importance of delivering real-time experiences in today’s big data industry and how data models and algorithms rely on powerful and versatile data infrastructure.

Data Science

Data Science Data Lakes Machine Learning

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Overview: Data science vs data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models, and develop artificial intelligence (AI) applications.

Data Science

Data Science Analytics Data Scientist

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning Blog

JUNE 21, 2024

To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities. eSentire has over 2 TB of signal data stored in their Amazon Simple Storage Service (Amazon S3) data lake.

AWS

AWS AI Natural Language Processing

A Comprehensive Guide to the main components of Big Data

Pickl AI

DECEMBER 2, 2024

As organisations grapple with this vast amount of information, understanding the main components of Big Data becomes essential for leveraging its potential effectively. Key takeaways: Big Data originates from diverse sources, including IoT and social media. Data lakes and cloud storage provide scalable solutions for large datasets.

Big Data

Big Data Data Lakes Apache Hadoop

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. The solution in this post aims to bring enterprise analytics operations to the next level by shortening the path to your data using natural language. This table is used for finding the correct table, database, and attributes.

SQL

SQL AWS Database ML

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared. For Vector database, choose Quick create a new vector store – Recommended to set up an OpenSearch Serverless vector store on your behalf. Choose Next.

AWS

AWS Machine Learning Database

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases.

ML

ML AWS AI

What is Data Ingestion? Understanding the Basics

Pickl AI

JULY 25, 2024

It’s the critical process of capturing, transforming, and loading data into a centralised repository where it can be processed, analysed, and leveraged. Data ingestion meaning: at its core, it refers to the act of absorbing data from multiple sources and transporting it to a destination, such as a database, data warehouse, or data lake.

Apache Kafka

Apache Kafka Data Lakes Data Warehouse Data Quality
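To make the ingestion definition above concrete, here is a minimal Python sketch that pulls records from two differently shaped sources, maps them onto one common schema, and loads them into a single destination. The sources, field names, and the plain list used as the destination are hypothetical stand-ins for real systems such as a database, data warehouse, or data lake.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]

# Hypothetical sources: a CSV export and a JSON-lines feed describing the same kind of reading.
CSV_SOURCE = "sensor_id,temp_c\ns1,21.5\ns2,19.0\n"
JSONL_SOURCE = '{"sensor": "s3", "temperature": 22.1}\n{"sensor": "s4", "temperature": 18.4}\n'


def read_csv(text: str) -> Iterator[Record]:
    # Capture rows from the CSV source and map them to the common schema.
    for row in csv.DictReader(io.StringIO(text)):
        yield {"sensor_id": row["sensor_id"], "temp_c": float(row["temp_c"]), "source": "csv"}


def read_jsonl(text: str) -> Iterator[Record]:
    # Capture records from the JSON-lines source; field names differ, so rename them.
    for line in text.splitlines():
        if line.strip():
            rec = json.loads(line)
            yield {"sensor_id": rec["sensor"], "temp_c": float(rec["temperature"]), "source": "jsonl"}


def ingest(destination: List[Record], *readers: Iterator[Record]) -> int:
    # Load every normalized record into one centralized destination; return its size.
    for reader in readers:
        destination.extend(reader)
    return len(destination)


if __name__ == "__main__":
    repository: List[Record] = []
    count = ingest(repository, read_csv(CSV_SOURCE), read_jsonl(JSONL_SOURCE))
    print(count, repository[0])
```

Normalizing at the reader level keeps the destination schema uniform, so downstream processing does not need to know which source a record came from (the `source` field is kept only for lineage).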

Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

AWS Machine Learning Blog

APRIL 18, 2023

With the recently launched Amazon Monitron Kinesis data export v2 feature, your OT team can stream incoming measurement data and inference results from Amazon Monitron via Amazon Kinesis to Amazon Simple Storage Service (Amazon S3) to build an Internet of Things (IoT) data lake. Choose Next.

AWS

AWS ML Database

How the Masters uses watsonx to manage its AI lifecycle

IBM Journey to AI blog

APRIL 9, 2024

This allows the Masters to scale analytics and AI wherever their data resides, through open formats and integration with existing databases and tools. “Hole distances and pin positions vary from round to round and year to year; these factors are important as we stage the data.”

AI

AI Machine Learning
