AWS, Data Lakes and Document - Data Science Current

How AWS sales uses Amazon Q Business for customer engagement

AWS Machine Learning Blog

DECEMBER 11, 2024

Earlier this year, we published the first in a series of posts about how AWS is transforming our seller and customer journeys using generative AI. Field Advisor serves four primary use cases: AWS-specific knowledge search With Amazon Q Business, weve made internal data sources as well as public AWS content available in Field Advisors index.

AWS

AWS Database AI AI

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP.

AWS

AWS Data Governance Data Silos SQL

Precise Software Solutions implements ML as a service on AWS to save time and money for federal agency

Flipboard

JANUARY 6, 2025

Precise), an Amazon Web Services (AWS) Partner , participated in the AWS Think Big for Small Business Program (TBSB) to expand their AWS capabilities and to grow their business in the public sector. The platform helped the agency digitize and process forms, pictures, and other documents. Precise Software Solutions, Inc.

AWS

AWS ML ML Machine Learning

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

Text, images, audio, and videos are common examples of unstructured data. Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.

AWS

AWS ML ML Analytics

Integrate foundation models into your code with Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 6, 2024

Prerequisites Before you dive into the integration process, make sure you have the following prerequisites in place: AWS account – You’ll need an AWS account to access and use Amazon Bedrock. You can interact with Amazon Bedrock using AWS SDKs available in Python, Java, Node.js, and more.

AWS

AWS Python Machine Learning Machine Learning

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

Lets assume that the question What date will AWS re:invent 2024 occur? The corresponding answer is also input as AWS re:Invent 2024 takes place on December 26, 2024. If the question was Whats the schedule for AWS events in December?, This setup uses the AWS SDK for Python (Boto3) to interact with AWS services.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

This archive, along with 765,933 varied-quality inspection photographs, some over 15 years old, presented a significant data processing challenge. Processing these images and scanned documents is not a cost- or time-efficient task for humans, and requires highly performant infrastructure that can reduce the time to value.

AWS

AWS Data Lakes ML ML

AWS empowers sales teams using generative AI solution built on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 26, 2024

At AWS, we are transforming our seller and customer journeys by using generative artificial intelligence (AI) across the sales lifecycle. It will be able to answer questions, generate content, and facilitate bidirectional interactions, all while continuously using internal AWS and external data to deliver timely, personalized insights.

AWS

AWS AI AI K-nearest Neighbors

AI/ML-driven actionable insights and themes for Amazon third-party sellers using AWS

Flipboard

MARCH 7, 2023

This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker , and utilize a combined architecture. im', 0.08224299065420558), ('jun 23.

ML

ML ML AWS AI

Generative AI operating models in enterprise organizations with Amazon Bedrock

AWS Machine Learning Blog

JANUARY 29, 2025

Intelligent document processing , translation and summarization, flexible and insightful responses for customer support agents, personalized marketing content, and image and code generation are a few use cases using generative AI that organizations are rolling out in production.

AWS

AWS AI AI Database

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Flipboard

NOVEMBER 17, 2023

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. As always, AWS welcomes feedback. Before testing, choose the gear icon.

K-nearest Neighbors

K-nearest Neighbors AWS Clustering Database

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster for seamless data movement from the on-premise hardware solution to the data lake using various cloud services like Amazon’s S3 and others. It will enable you to quickly transform and load the data results into Amazon S3 data lakes or JDBC data stores.

Apache Kafka

Apache Kafka ETL Data Lakes AWS

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lakes

Data Lakes Data Modeling Data Models Data Warehouse

Simplify continuous learning of Amazon Comprehend custom models using Comprehend flywheel

AWS Machine Learning Blog

MARCH 1, 2023

Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document.

Data Lakes

Data Lakes AWS ML ML

Introducing the Amazon Comprehend flywheel for MLOps

AWS Machine Learning Blog

MARCH 1, 2023

Solution overview Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. This feature also allows you to automate model retraining after new datasets are ingested and available in the flywheel´s data lake.

Data Lakes

Data Lakes AWS ML ML

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

AWS Machine Learning Blog

MAY 30, 2024

To serve their customers, Vitech maintains a repository of information that includes product documentation (user guides, standard operating procedures, runbooks), which is currently scattered across multiple internal platforms (for example, Confluence sites and SharePoint folders).

AI

AI AI AWS Database

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

AWS Machine Learning Blog

NOVEMBER 22, 2023

The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. This post focuses on the Operational Excellence pillar of the IDP solution.

AWS

AWS ML ML Machine Learning

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services, to build a robust and cost-effective Power Bid Optimization solution.

AWS

AWS Machine Learning Machine Learning Analytics

Cepsa Química improves the efficiency and accuracy of product stewardship using Amazon Bedrock

AWS Machine Learning Blog

AUGUST 2, 2024

The Product Stewardship department is responsible for managing a large collection of regulatory compliance documents. Example questions might be “What are the restrictions for CMR substances?”, “How long do I need to keep the documents related to a toluene sale?”, or “What is the reach characterization ratio and how do I calculate it?”

AWS

AWS Machine Learning Machine Learning Database

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure. Each embedding aims to capture the semantic or contextual meaning of the data.

AWS

AWS Machine Learning Machine Learning Database

Cloud Data Science News – Beta 6

Data Science 101

DECEMBER 16, 2019

It now also supports PDF documents. Azure Data Factory Preserves Metadata during File Copy When performing a File copy between Amazon S3, Azure Blob, and Azure Data Lake Gen 2, the metadata will be copied as well. Azure Tips and Tricks: Make your data Searchable A quick video to demonstrate Azure Search.

Cloud Data

Cloud Data Data Science Azure Natural Language Processing

Beyond data: Cloud analytics mastery for business brilliance

Dataconomy

SEPTEMBER 4, 2023

Text analytics: Text analytics, also known as text mining, deals with unstructured text data, such as customer reviews, social media comments, or documents. It uses natural language processing (NLP) techniques to extract valuable insights from textual data. Poor data integration can lead to inaccurate insights.

Analytics

Analytics Analytics Big Data Analytics Big Data Analytics

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Third, despite the larger adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with different table names and other metadata that is required to create the SQL for the desired sources. Our solution aims to address those challenges using Amazon Bedrock and AWS Analytics Services.

SQL

SQL AWS Database ML

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

AWS Machine Learning Blog

JUNE 25, 2024

When the automated content processing steps are complete, you can use the output for downstream tasks, such as to invoke different components in a customer service backend application, or to insert the generated tags into metadata of each document for product recommendation. The Step Functions workflow starts.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Build generative AI–powered Salesforce applications with Amazon Bedrock

AWS Machine Learning Blog

JULY 29, 2024

In Part 3 , we demonstrate how business analysts and citizen data scientists can create machine learning (ML) models, without code, in Amazon SageMaker Canvas and deploy trained models for integration with Salesforce Einstein Studio to create powerful business applications. For this post, we use the Anthropic Claude 3 Sonnet model.

AWS

AWS AI AI ML

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

In this post, we will explore the potential of using MongoDB’s time series data and SageMaker Canvas as a comprehensive solution. MongoDB Atlas MongoDB Atlas is a fully managed developer data platform that simplifies the deployment and scaling of MongoDB databases in the cloud. Note we have two folders.

Clustering

Clustering AWS Database ML

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The question is sent through a retrieval-augmented generation (RAG) process, which finds similar documents.

SQL

SQL Database AWS Machine Learning

Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra

AWS Machine Learning Blog

AUGUST 2, 2023

Amazon Kendra supports a variety of document formats , such as Microsoft Word, PDF, and text from various data sources. In this post, we focus on extending the document support in Amazon Kendra to make images searchable by their displayed content. Images can often be searched using supplemented metadata such as keywords.

AWS

AWS AI AI Machine Learning

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and prepare the necessary historical data for the ML use cases.

AI

AI AI ML ML

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

AWS Machine Learning Blog

FEBRUARY 7, 2025

Data scientists Perform data analysis, model development, model evaluation, and registering the models in a model registry. Governance officer Review the models performance including documentation, accuracy, bias and access, and provide final approval for models to be deployed.

ML

ML ML Data Scientist AWS

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases.

AI

AI AI Machine Learning Machine Learning

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

To combine the collected data, you can integrate different data producers into a data lake as a repository. A central repository for unstructured data is beneficial for tasks like analytics and data virtualization. Data Cleaning The next step is to clean the data after ingesting it into the data lake.

Machine Learning

Machine Learning Machine Learning Data Lakes AI

Popular Data Transformation Tools: Importance and Best Practices

Pickl AI

OCTOBER 10, 2024

Informatica’s AI-powered automation helps streamline data pipelines and improve operational efficiency. Common use cases include integrating data across hybrid cloud environments, managing data lakes, and enabling real-time analytics for Business Intelligence platforms.

Data Quality

Data Quality AWS Machine Learning Machine Learning

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. User support arrangements Consider the availability and quality of support from the provider or vendor, including documentation, tutorials, forums, customer service, etc.

Machine Learning

Machine Learning Machine Learning ML ML

List of ETL Tools: Explore the Top ETL Tools for 2025

Pickl AI

APRIL 9, 2025

This includes operations like data validation, data cleansing, data aggregation, and data normalization. The goal is to ensure that the data is consistent and ready for analysis. Loading : Storing the transformed data in a target system like a data warehouse, data lake, or even a database.

ETL

ETL Data Warehouse AWS Business Intelligence

Top 5 Fivetran Connectors for Healthcare

phData

APRIL 29, 2024

Oracle – The Oracle connector, a database-type connector, enables real-time data transfer of large volumes of data from on-premises or cloud sources to the destination of choice, such as a cloud data lake or data warehouse. File – Fivetran offers several options to sync files to your destination.

SQL

SQL Data Warehouse Azure Cloud Data

How to Build a Full MLOps Solution For Computer Vision Using OSS

DagsHub

MARCH 21, 2024

It is suitable for a wide range of use cases, such as data lake storage, backup and recovery, and content delivery. Object Storage: In MinIO, data is organized as objects, where each object typically represents a file or a piece of data. Objects can be of any type, such as images, videos, documents, etc.

Machine Learning

Machine Learning Machine Learning AWS Data Visualization

How to Effectively Version Control Your Machine Learning Pipeline

phData

AUGUST 20, 2024

Implementing proper version control in ML pipelines is essential for efficient management of code, data, and models by ensuring reproducibility and collaboration. Reproducibility ensures that experiments can be reliably reproduced by tracking changes in code, data, and model hyperparameters. Pachyderm : Data driven pipelines.

Machine Learning

Machine Learning Machine Learning ML ML

Alation 2022.1: Customize Your Data Catalog

Alation

MARCH 1, 2022

Airline Reporting Corporation (ARC) sells data products to travel agencies and airlines. Lineage helps them identify the source of bad data to fix the problem fast. Manual lineage will give ARC a fuller picture of how data was created between AWS S3 data lake, Snowflake cloud data warehouse and Tableau (and how it can be fixed).

Data Warehouse

Data Warehouse Data Lakes Cloud Data Database

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

phData

FEBRUARY 14, 2023

Data Collector also offers replication and Change Data Capture (CDC) to be able to accurately and efficiently get your data into Snowflake. Data Collector can use Snowflake’s native Snowpipe in its pipelines. For those looking to migrate to Snowflake who prefer using AWS services, DMS is a great solution.

Data Warehouse

Data Warehouse Azure AWS Database

How to Effectively Handle Unstructured Data Using AI

DagsHub

NOVEMBER 11, 2024

So, we must understand the different unstructured data types and effectively process them to uncover hidden patterns. Textual Data Textual data is one of the most common forms of unstructured data and can be in the format of documents, social media posts, emails, web pages, customer reviews, or conversation logs.

AI

AI AI Data Lakes Database

External & Directory Tables in Snowflake 101

phData

JULY 10, 2023

Why External Tables are Important Data Ingestion: External tables allow you to easily load data into Snowflake from various external data sources without the need to first stage the data within Snowflake. Data Integration: Snowflake supports seamless integration with other data processing systems and data lakes.

Data Lakes

Data Lakes Azure Database AWS

Definite Guide to Building a Machine Learning Platform

The MLOps Blog

MARCH 21, 2023

If your organization runs its workloads on AWS , it might be worth it to leverage AWS SageMaker. Solution Data lakes and warehouses are the two key components of any data pipeline. The data lake is a platform where any kind or amount of data can be stored, processed, and analyzed.

Machine Learning

Machine Learning Machine Learning Data Scientist ML

The Ultimate Guide to Data Preparation for Machine Learning

DagsHub

FEBRUARY 29, 2024

Only once you form a clear definition and understanding of the business problem , goals, and the necessity of machine learning should you move forward to the next stage of data preparation. In large ML organizations, there is typically a dedicated team for all the above aspects of data preparation.

Data Preparation

Data Preparation Machine Learning Machine Learning Data Governance

How AWS sales uses Amazon Q Business for customer engagement

Shaping the future: OMRON’s data-driven journey with AWS

Webinars

Trending Sources

Precise Software Solutions implements ML as a service on AWS to save time and money for federal agency

Webinars

Unstructured data management and governance using AWS AI/ML and analytics services

Integrate foundation models into your code with Amazon Bedrock

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS empowers sales teams using generative AI solution built on Amazon Bedrock

AI/ML-driven actionable insights and themes for Amazon third-party sellers using AWS

Generative AI operating models in enterprise organizations with Amazon Bedrock

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

Simplify continuous learning of Amazon Comprehend custom models using Comprehend flywheel

Introducing the Amazon Comprehend flywheel for MLOps

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

How Marubeni is optimizing market decisions using AWS machine learning and analytics

Cepsa Química improves the efficiency and accuracy of product stewardship using Amazon Bedrock

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

Cloud Data Science News – Beta 6

Beyond data: Cloud analytics mastery for business brilliance

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight

Build generative AI–powered Salesforce applications with Amazon Bedrock

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

Imperva optimizes SQL generation from natural language using Amazon Bedrock

Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

Exploring the AI and data capabilities of watsonx

How to Manage Unstructured Data in AI and Machine Learning Projects

Popular Data Transformation Tools: Importance and Best Practices

MLOps Landscape in 2023: Top Tools and Platforms

List of ETL Tools: Explore the Top ETL Tools for 2025

Top 5 Fivetran Connectors for Healthcare

How to Build a Full MLOps Solution For Computer Vision Using OSS

How to Effectively Version Control Your Machine Learning Pipeline

Alation 2022.1: Customize Your Data Catalog

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

How to Effectively Handle Unstructured Data Using AI

External & Directory Tables in Snowflake 101

Definite Guide to Building a Machine Learning Platform

The Ultimate Guide to Data Preparation for Machine Learning

Stay Connected