At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP.
The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers, managed in-house by Rocket's technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.
Prerequisites: Before you dive into the integration process, make sure you have the following in place: an AWS account, which you'll need to access and use Amazon Bedrock. You can interact with Amazon Bedrock using the AWS SDKs available in Python, Java, Node.js, and more.
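As a quick check that the SDK setup works, here is a minimal sketch with Boto3 that lists the foundation models available to the account; the Region is an illustrative choice, not a requirement from the post:

```python
# A quick check that the SDK can reach Amazon Bedrock: list the
# foundation models available in this account and Region.
import boto3

# The "bedrock" client exposes control-plane calls such as listing models.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.list_foundation_models()
for model in response["modelSummaries"]:
    print(model["modelId"])
```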
Let's assume the question is "What date will AWS re:Invent 2024 occur?" The corresponding answer is input as "AWS re:Invent 2024 takes place on December 2–6, 2024." If the question were "What's the schedule for AWS events in December?" … This setup uses the AWS SDK for Python (Boto3) to interact with AWS services.
Yes, the AWS re:Invent season is upon us and, as always, the place to be is Las Vegas! And although generative AI has appeared in previous events, this year we're taking it to the next level.
Recent events, including Tropical Cyclone Gabrielle, have highlighted the susceptibility of the grid to extreme weather and emphasized the need for climate adaptation through resilient infrastructure.
Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its global casino customer base once LnW Connect reaches its full potential.
At AWS, we are transforming our seller and customer journeys by using generative artificial intelligence (AI) across the sales lifecycle. The solution will be able to answer questions, generate content, and facilitate bidirectional interactions, all while continuously using internal AWS and external data to deliver timely, personalized insights.
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. However, implementing security, data privacy, and governance controls remains a key challenge for customers implementing ML workloads at scale.
For many enterprises, a hybrid cloud data lake is no longer a trend but a growing reality. Due to these needs, hybrid cloud data lakes emerged as a logical middle ground between the two consumption models. An example is event data (earthquake, flood, or fire), where the data collected does not need to be as tightly controlled.
This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services to build a robust and cost-effective Power Bid Optimization solution.
You can safely use an Apache Kafka cluster for seamless data movement from an on-premises hardware solution to the data lake using various cloud services such as Amazon S3. It will enable you to quickly transform and load the data results into Amazon S3 data lakes or JDBC data stores.
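A minimal sketch of that movement using the kafka-python and boto3 packages; the topic name, broker address, bucket, and batch size are hypothetical placeholders:

```python
# Consume records from a Kafka topic and land them in Amazon S3 in batches.
import json
import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "machine-telemetry",                      # hypothetical topic name
    bootstrap_servers=["broker1:9092"],       # placeholder broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 1000:                    # flush in batches to limit PUT calls
        key = f"raw/offset-{message.offset}.json"
        s3.put_object(
            Bucket="my-data-lake-bucket",     # placeholder bucket
            Key=key,
            Body=json.dumps(batch).encode("utf-8"),
        )
        batch = []
```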
Whether logs are coming from Amazon Web Services (AWS), other cloud providers, on-premises, or edge devices, customers need to centralize and standardize security data. Solution overview (Figure 1 shows the solution architecture): enable Amazon Security Lake with AWS Organizations for AWS accounts, AWS Regions, and external IT environments.
The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. Set up regular game days to test workload and team responses to simulated events.
SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). This provides an audit trail required for governance and compliance.
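A hedged sketch of what granting that access with AWS RAM could look like in Boto3; the feature group ARN and account IDs are placeholders, and the exact resource type that Feature Store exposes to RAM may differ from this illustration:

```python
# Share a feature group with another AWS account through AWS RAM.
import boto3

ram = boto3.client("ram")

response = ram.create_resource_share(
    name="shared-feature-groups",
    resourceArns=[
        # Placeholder ARN for the feature group to share.
        "arn:aws:sagemaker:us-east-1:111122223333:feature-group/customers"
    ],
    principals=["444455556666"],   # the consumer AWS account ID
)
print(response["resourceShare"]["resourceShareArn"])
```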
Alation recently attended AWS re:Invent 2021 … in person! In his opening keynote, AWS CEO Adam Selipsky welcomed a packed room and looked back on the progress of AWS, calling it "still early days" for the cloud.
With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure.
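A minimal sketch of calling a model through the Bedrock runtime with Boto3; the model ID and the Anthropic-style request body are assumptions, so substitute a model your account can access:

```python
# Invoke a foundation model through the Bedrock runtime.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Request body in Anthropic's Claude messages format (an assumption here).
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Summarize what Amazon Bedrock does."}],
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    body=body,
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```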
In this post, we demonstrate how to build a robust real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. It offers an AWS CloudFormation template for straightforward deployment in an AWS account.
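The post's solution runs on Managed Service for Apache Flink; purely to illustrate the underlying idea, here is a toy rolling z-score detector in plain Python (the window size and threshold are arbitrary choices, not the post's parameters):

```python
# Flag points that deviate sharply from a rolling window of recent values.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window_size=50, threshold=3.0):
    """Yield (value, is_anomaly) pairs using a rolling z-score."""
    window = deque(maxlen=window_size)
    for value in stream:
        if len(window) >= 2:
            mu, sigma = mean(window), stdev(window)
            is_anomaly = sigma > 0 and abs(value - mu) / sigma > threshold
        else:
            is_anomaly = False       # not enough history yet
        yield value, is_anomaly
        window.append(value)

# Toy usage: a spike at the end of otherwise normal data gets flagged.
import random
data = [random.gauss(0, 1) for _ in range(200)] + [8.0]
for value, flag in detect_anomalies(data):
    if flag:
        print("anomaly:", value)
```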
Diagnostic analytics goes a step further by analyzing historical data to determine why certain events occurred. By understanding the "why" behind past events, organizations can make informed decisions to prevent or replicate them. Ensure that data is clean, consistent, and up to date.
Traditional maintenance activities rely on a sizable workforce distributed across key locations along the BHS, dispatched by operators in the event of an operational fault. With this service, industrial sensors, smart meters, and OPC UA servers can be connected to an AWS data lake with just a few clicks.
Even Forbes Tech Council has written about the benefits of data lakes in Fortnite. The game's parent company, Epic Games, processes millions of events each minute, and its mountain of data grows steadily. Processing and analyzing this data, petabytes' worth, must happen somewhere.
It includes sensor devices to capture vibration and temperature data, a gateway device to securely transfer data to the AWS Cloud, the Amazon Monitron service that analyzes the data for anomalies with ML, and a companion mobile app to track potential failures in your machinery.
Central model registry – Amazon SageMaker Model Registry is set up in a separate AWS account to track model versions generated across the dev and prod environments. Prerequisites include an AWS user with administrative privileges and Terraform version 1.5.5 installed. After the key is provisioned, it should be visible on the AWS KMS console.
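A minimal sketch of setting up a model package group in the central registry account with Boto3; the group name and description are hypothetical placeholders:

```python
# Create a model package group to track model versions centrally.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_model_package_group(
    ModelPackageGroupName="churn-prediction",   # placeholder group name
    ModelPackageGroupDescription="Model versions from dev and prod environments",
)
```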
Imperva Cloud WAF protects hundreds of thousands of websites against cyber threats and blocks billions of security events every day. Counters and insights based on security events are calculated daily and used by teams across multiple departments. The data is stored in a data lake and retrieved with SQL using Amazon Athena.
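A minimal sketch of that retrieval path, submitting a SQL query to Amazon Athena with Boto3; the database, table, and results bucket are hypothetical placeholders:

```python
# Run a SQL query against the data lake through Amazon Athena.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) FROM security_events "
                "GROUP BY event_type",
    QueryExecutionContext={"Database": "security_lake"},        # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```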
sales-train-data is used to store data extracted from MongoDB Atlas, while sales-forecast-output contains predictions from Canvas. The triggers need to be scheduled to write the data to S3 at a periodic frequency based on the business need for training the models. Note that we have two folders.
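One way such a schedule could be wired up, sketched here with Amazon EventBridge via Boto3; the rule name, cadence, and target Lambda ARN are hypothetical placeholders, not the post's actual setup:

```python
# Schedule a periodic trigger that invokes an export job.
import boto3

events = boto3.client("events")

events.put_rule(
    Name="daily-sales-export",
    ScheduleExpression="rate(1 day)",   # match the cadence training requires
)
events.put_targets(
    Rule="daily-sales-export",
    Targets=[{
        "Id": "export-to-s3",
        # Placeholder ARN of a Lambda function that writes data to S3.
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:export-sales",
    }],
)
```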
To fulfill these requirements, TR built the Enterprise AI platform around the following five pillars: a data service, experimentation workspace, central model registry, model deployment service, and model monitoring. Amazon Simple Storage Service (Amazon S3) object storage acts as a content data lake.
Examples include seasonality, marketing promotions, pricing, and in-stock availability for retail sales, or temperature, length of daylight, or special events for utility demand. Local, regional, and world factors such as commodity prices, financial markets, and events such as COVID-19 can also change demand trajectory.
This account manages templates for setting up new ML Dev Accounts, as well as SageMaker Projects templates for model development and deployment, in AWS Service Catalog. It also hosts a model registry to store ML models developed by data science teams, and provides a single location to approve models for deployment.
Configure OAuth settings for the Salesforce Data Cloud connector. SageMaker Canvas uses AWS Secrets Manager to securely store connection information from the Salesforce connected app. For Data Source, choose Salesforce Data Cloud and Add Connection to import the data lake object.
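Behind the scenes, connection details of this kind live in Secrets Manager; a minimal sketch of storing similar credentials with Boto3, where the secret name and values are placeholders:

```python
# Store OAuth connection details as a JSON secret in Secrets Manager.
import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="salesforce-data-cloud-connection",   # placeholder secret name
    SecretString=json.dumps({
        "client_id": "YOUR_CONSUMER_KEY",
        "client_secret": "YOUR_CONSUMER_SECRET",
        "token_url": "https://login.salesforce.com/services/oauth2/token",
    }),
)
```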
Recognizing these specific needs, Fivetran has developed a range of connectors, including dedicated connectors for applications, databases, files, and events, which can accommodate the diverse formats used by healthcare systems. Addressing these needs may pose challenges that lead to the implementation of custom solutions rather than a uniform approach.
These datasets are often a mix of numerical and text data, at times structured, unstructured, or semi-structured. The company needed to address some of these challenges in one of its many AI use cases built on AWS. Our structured dataset can reside in a SQL database, data lake, or data warehouse, as long as we have support for SQL.
Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the three-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. One takeaway: when data becomes information, many (incremental) use cases surface.
What Are the Best Third-Party Data Ingestion Tools for Snowflake? Fivetran is a tool dedicated to replicating applications, databases, events, and files into a high-performance data warehouse such as Snowflake. For those looking to migrate to Snowflake who prefer using AWS services, AWS DMS is a great solution.
There are three potential approaches to mainframe modernization. Data replication creates a duplicate copy of mainframe data in a cloud data warehouse or data lake, enabling high-performance analytics virtually in real time, without negatively impacting mainframe performance.
Netflix's data infrastructure is one of the most sophisticated globally, built primarily on cloud technology and processing petabytes of data. What technologies does Netflix use for its big data infrastructure?
Airline Reporting Corporation (ARC) sells data products to travel agencies and airlines. Lineage helps them identify the source of bad data to fix the problem fast. Manual lineage will give ARC a fuller picture of how data was created between its Amazon S3 data lake, Snowflake cloud data warehouse, and Tableau (and how bad data can be fixed).
Data ingress and egress: Snorkel enables multiple paths to bring data into and out of Snorkel Flow, including but not limited to uploads from and downloads to your local computer, and data connectors for common third-party data lakes such as Databricks, Snowflake, and Google BigQuery, as well as S3, GCS, and Azure buckets.
They'll also work with software engineers to ensure that the data infrastructure is scalable and reliable. These professionals will work with their colleagues to ensure that data is accessible, with proper access controls. This is an important skill because ETL is a critical process for data warehousing and business intelligence.
Data engineers play a crucial role in the data ecosystem by bridging the gap between raw data and actionable insights. They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.
To combine the collected data, you can integrate different data producers into a data lake as a repository. A central repository for unstructured data is beneficial for tasks like analytics and data virtualization. The next step is to clean the data after ingesting it into the data lake.
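As an illustration of that cleaning step, a small sketch assuming pandas (with s3fs and pyarrow installed for the S3 paths and Parquet output); the bucket path and column names are hypothetical:

```python
# Clean a raw extract pulled from the data lake and write it back.
import pandas as pd

# Read a raw extract (s3fs must be installed for s3:// paths).
df = pd.read_csv("s3://my-data-lake-bucket/raw/events.csv")

df = df.drop_duplicates()                         # remove duplicate records
df = df.dropna(subset=["event_id", "timestamp"])  # drop rows missing key fields
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")

# Write the cleaned data back in a columnar format (requires pyarrow).
df.to_parquet("s3://my-data-lake-bucket/clean/events.parquet")
```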
But how do the unfolding events impact your business? ARC worked to make data more accessible across domains while capturing tribal knowledge in the data catalog; this reduced subject-matter-expertise bottlenecks during product development and accelerated higher-quality analysis.
Why external tables are important: For data ingestion, external tables allow you to easily load data into Snowflake from various external data sources without the need to first stage the data within Snowflake. For data integration, Snowflake supports seamless integration with other data processing systems and data lakes.
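A hedged sketch of creating such an external table with the snowflake-connector-python package; the account, credentials, stage, and table names are placeholders, and the DDL assumes an external stage over S3 already exists:

```python
# Define an external table over files in an existing external stage.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",   # placeholders
    warehouse="my_wh", database="my_db", schema="public",
)
conn.cursor().execute("""
    CREATE OR REPLACE EXTERNAL TABLE sales_events
    LOCATION = @my_s3_stage/sales/
    FILE_FORMAT = (TYPE = PARQUET)
    AUTO_REFRESH = TRUE
""")
```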
A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process. Data ingestion: involves raw data collection from the origin and storage, using architectures such as batch, streaming, or event-driven.
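A minimal sketch of those stages with in-memory stand-ins; the functions and record shapes are hypothetical placeholders for real ingestion, transformation, and load logic:

```python
# Toy pipeline mirroring the ingest -> transform -> load stages above.
def ingest(source):
    """Batch ingestion: collect raw records from the origin."""
    return list(source)

def transform(raw):
    """Drop malformed records and normalize field names."""
    return [{"value": r["val"]} for r in raw if "val" in r]

def load(records, store):
    """Persist prepared records where the training process reads them."""
    store.extend(records)
    return store

feature_store = []
load(transform(ingest([{"val": 1}, {"bad": 2}])), feature_store)
print(feature_store)   # [{'value': 1}]
```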