AWS, Data Lakes and Data Science - Data Science Current

A Guide to Build your Data Lake in AWS

Analytics Vidhya

APRIL 25, 2021

This article was published as a part of the Data Science Blogathon. Introduction: Data Lake architecture for different use cases – Elegant.

Data Lakes

Data Lakes AWS Data Science Analytics

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Flipboard

NOVEMBER 22, 2024

For example, in the bank marketing use case, the management account would be responsible for setting up the organizational structure for the bank’s data and analytics teams, provisioning separate accounts for data governance, data lakes, and data science teams, and maintaining compliance with relevant financial regulations.

Data Governance

Data Governance ML Data Lakes

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Rocket’s legacy data science environment challenges: Rocket’s previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.

Data Science

Data Science AWS Hadoop Data Scientist


Interview – Business Intelligence and Process Mining without Vendor Lock-in!

Data Science Blog

FEBRUARY 7, 2023

For data science, that goes without saying. What is currently becoming a trend is building a data lakehouse. A lakehouse cleverly includes a data lake as well. At best, the contents of the data lake are somewhat pre-sorted, but really, one hopes never to have to find anything in there again.

Business Intelligence

Business Intelligence Data Warehouse Data Lakes

Cloud Data Science News Beta #1

Data Science 101

NOVEMBER 11, 2019

Welcome to the first beta edition of Cloud Data Science News. This will cover major announcements and news for doing data science in the cloud. Azure Synapse Analytics: This is the future of data warehousing. Microsoft Azure. Google Cloud.

Cloud Data

Cloud Data Data Science Azure Clustering

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data. In this post, we discuss how AWS can help you successfully address the challenges of extracting insights from unstructured data. The solution integrates data in three tiers.

AWS

AWS ML Analytics

Data Science News from Microsoft Ignite 2019

Data Science 101

NOVEMBER 7, 2019

Microsoft just held one of its largest conferences of the year, and a few major announcements were made which pertain to the cloud data science world. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Those are the big data science announcements of the week.

Data Science

Data Science Azure SQL Machine Learning

What is a Data Lakehouse?

Data Science Blog

MAY 15, 2023

tl;dr: A data lakehouse is a modern data architecture that combines the advantages of a data lake and a data warehouse. The definition of a data lakehouse: A data lakehouse is a modern data storage and processing architecture that unites the advantages of data lakes and data warehouses.

Data Warehouse

Data Warehouse Data Lakes Azure AWS

Cloud Data Science News – Beta 6

Data Science 101

DECEMBER 16, 2019

Even though Amazon is taking a break from announcements (probably focusing on Christmas shoppers), there are still some updates in the cloud data science world. Here they are. Azure Tips and Tricks: Make your data Searchable. A quick video to demonstrate Azure Search. It now also supports PDF documents. Courses and Learning.

Cloud Data

Cloud Data Data Science Azure Natural Language Processing

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

AWS Machine Learning Blog

DECEMBER 7, 2023

In this post, we describe the end-to-end workforce management system that begins with location-specific demand forecasting, followed by courier workforce planning and shift assignment using Amazon Forecast and AWS Step Functions. AWS Step Functions automatically initiates and monitors these workflows and simplifies error handling.

AWS

AWS Algorithm Data Science Machine Learning
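The Getir excerpt credits AWS Step Functions with simplifying error handling for its forecast, planning, and shift-assignment workflow. As a rough illustration only, the following Python sketch builds an Amazon States Language definition in which each step retries failed invocations with exponential backoff and routes anything that still fails to a single Fail state. The three-step chain mirrors the excerpt; the state names, Lambda function names, region, and account ID are hypothetical placeholders, not details from the Getir post.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical placeholders: region, account ID, and function names are not from the Getir post.
REGION = "eu-west-1"
ACCOUNT_ID = "123456789012"


def lambda_arn(function_name: str) -> str:
    """Placeholder Lambda function ARN used as a Task state's Resource."""
    return f"arn:aws:lambda:{REGION}:{ACCOUNT_ID}:function:{function_name}"


def task_state(function_name: str, next_state: Optional[str]) -> Dict[str, Any]:
    """Task state that retries failed invocations, then routes any remaining error to a Fail state."""
    state: Dict[str, Any] = {
        "Type": "Task",
        "Resource": lambda_arn(function_name),
        # Up to 3 retries, waiting 5 s, then 10 s, then 20 s (BackoffRate doubles the interval).
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
        # Any error still raised after the retries ends the execution in WorkflowFailed.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "WorkflowFailed"}],
    }
    if next_state is None:
        state["End"] = True  # last step in the chain
    else:
        state["Next"] = next_state
    return state


definition = {
    "Comment": "Sketch: demand forecast -> workforce planning -> shift assignment",
    "StartAt": "ForecastDemand",
    "States": {
        "ForecastDemand": task_state("forecast-demand", "PlanWorkforce"),
        "PlanWorkforce": task_state("plan-workforce", "AssignShifts"),
        "AssignShifts": task_state("assign-shifts", None),
        "WorkflowFailed": {
            "Type": "Fail",
            "Error": "WorkforcePipelineError",
            "Cause": "A step still failed after its retries",
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The printed JSON could be supplied as the `definition` argument when creating a state machine (for example through boto3's `stepfunctions` client and `create_state_machine`, together with an IAM role ARN); that call is not shown here.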

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

AWS Machine Learning Blog

DECEMBER 4, 2023

In this post, we explain how we built an end-to-end product category prediction pipeline to help commercial teams by using Amazon SageMaker and AWS Batch, reducing model training duration by 90%. An important aspect of our strategy has been the use of SageMaker and AWS Batch to refine pre-trained BERT models for seven different languages.

AWS

AWS Predictive Analytics ML

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Here’s what we found for both skills and platforms that are in demand for data scientist jobs. Data Science Skills and Competencies: Aside from knowing particular frameworks and languages, there are various topics and competencies that any data scientist should know. Joking aside, this does imply particular skills.

Data Science

Data Science Data Scientist Computer Science

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. However, implementing security, data privacy, and governance controls remains a key challenge for customers running ML workloads at scale.

ML

ML AWS Data Lakes

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format. SageMaker JumpStart provided deployable models that could be trained for object detection use cases with minimal data science knowledge and overhead.

AWS

AWS Data Lakes ML
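The Northpower excerpt describes an Amazon S3 data lake that hosts inference output and other datasets in CSV format. A common convention for such lakes (an assumption here, not something the excerpt states) is Hive-style `key=value` partition prefixes, which engines such as Amazon Athena and AWS Glue can use to prune partitions by date. The stdlib-only sketch below builds such an object key and serializes prediction rows to CSV; the prefix, dataset name, column names, and sample rows are all hypothetical.

```python
import csv
import io
from datetime import date
from typing import Dict, List


def partitioned_key(prefix: str, dataset: str, run_date: date, filename: str) -> str:
    """Build a Hive-style S3 object key: <prefix>/<dataset>/year=YYYY/month=MM/day=DD/<filename>."""
    return (
        f"{prefix}/{dataset}/"
        f"year={run_date.year:04d}/month={run_date.month:02d}/day={run_date.day:02d}/"
        f"{filename}"
    )


def rows_to_csv(rows: List[Dict[str, object]], columns: List[str]) -> str:
    """Serialize rows to CSV text with a header row, in a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical inference output for one run.
predictions = [
    {"image_id": "pole-0001", "label": "crossarm", "confidence": 0.93},
    {"image_id": "pole-0002", "label": "insulator", "confidence": 0.88},
]
key = partitioned_key("lake", "inference-output", date(2024, 9, 27), "part-0000.csv")
body = rows_to_csv(predictions, ["image_id", "label", "confidence"])
print(key)
print(body)
```

With credentials configured, the CSV text could then be uploaded through boto3's S3 client, for example `put_object(Bucket=..., Key=key, Body=body.encode("utf-8"))`; the bucket name is left open here.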

How Carrier predicts HVAC faults using AWS Glue and Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 5, 2023

In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. We first highlight how we use AWS Glue for highly parallel data processing. This dramatically reduces the size of data while capturing features that characterize the equipment’s behavior.

AWS

AWS ML Machine Learning

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

AI/ML-driven actionable insights and themes for Amazon third-party sellers using AWS

Flipboard

MARCH 7, 2023

This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, and utilize a combined architecture.

ML

ML AWS AI

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

AWS Machine Learning Blog

JUNE 5, 2023

To simplify access to Parquet files, Amazon SageMaker Canvas has added data import capabilities from over 40 data sources, including Amazon Athena, which supports Apache Parquet. Canvas provides connectors to AWS data sources such as Amazon Simple Storage Service (Amazon S3), Athena, and Amazon Redshift.

Machine Learning

Machine Learning AWS Data Lakes

8 Data Lake Vendors to Make Your Data Life Easier in 2023

ODSC - Open Data Science

JUNE 7, 2023

To make your data management processes easier, here’s a primer on data lakes, and our picks for a few data lake vendors worth considering. What is a data lake? First, a data lake is a centralized repository that allows users or an organization to store and analyze large volumes of data.

Data Lakes

Data Lakes Azure Data Warehouse Hadoop

Essential data engineering tools for 2023: Empowering for management and analysis

Data Science Dojo

JULY 6, 2023

Amazon Redshift: Amazon Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). Amazon Redshift allows data engineers to analyze large datasets quickly using a massively parallel processing (MPP) architecture. Airflow: An open-source platform for building and scheduling data pipelines.

Data Engineering

Data Engineering Data Engineer

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.

SQL

SQL Data Lakes Data Analyst AWS

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lakes

Data Lakes Data Modeling Data Models Data Warehouse

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Manager Data Science at Marubeni Power International. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data collection and ingestion: The data collection and ingestion layer connects to all upstream data sources and loads the data into the data lake.

AWS

AWS Machine Learning Analytics

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML Machine Learning

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms lacked an easy path for migrating their AI/ML prototypes to industrialized solutions running on AWS.

ML

ML AWS AI

Fivetran Supports the Automation of the Modern Data Lake on Amazon S3

phData

APRIL 4, 2023

Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with Apache Iceberg data lake format. Amazon S3 is an object storage service from Amazon Web Services (AWS) that offers industry-leading scalability, data availability, security, and performance.

Data Lakes

Data Lakes Data Warehouse Cloud Data AWS

Introducing the Amazon Comprehend flywheel for MLOps

AWS Machine Learning Blog

MARCH 1, 2023

MLOps focuses on the intersection of data science and data engineering in combination with existing DevOps practices to streamline model delivery across the ML development lifecycle. MLOps requires the integration of software development, operations, data engineering, and data science.

Data Lakes

Data Lakes AWS ML

Building an Effective OSS Management Layer for Your Data Lake

ODSC - Open Data Science

OCTOBER 13, 2024

Be sure to check out her talk, “Don’t Go Over the Deep End: Building an Effective OSS Management Layer for Your Data Lake,” there! Managing a data lake can often feel like being lost at sea, especially when dealing with both structured and unstructured data.

Data Lakes

Data Lakes Database Data Pipeline SQL

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Therefore, it’s no surprise that determining the proficiency of goalkeepers in preventing the ball from entering the net is considered one of the most difficult tasks in football data analysis. Bundesliga and AWS have collaborated to perform an in-depth examination to study the quantification of achievements of Bundesliga’s keepers.

Machine Learning

Machine Learning AWS ML

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Third, despite the larger adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with different table names and other metadata that is required to create the SQL for the desired sources. Our solution aims to address those challenges using Amazon Bedrock and AWS Analytics Services.

SQL

SQL AWS Database ML

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). Review the access policy to understand permissions granted.

AWS

AWS ML Machine Learning
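The Feature Store excerpt says account owners can grant other accounts access to selected feature groups through AWS Resource Access Manager (AWS RAM). A minimal sketch, assuming boto3 and credentials for the account that owns the feature groups, might assemble the keyword arguments for RAM's `create_resource_share` call as below. The share name, feature group name, region, and account IDs are hypothetical, and the sketch omits the optional `permissionArns` parameter, so RAM's default permission for the resource type applies and should be reviewed, as the excerpt advises.

```python
from typing import Any, Dict, List


def feature_group_arn(region: str, account_id: str, name: str) -> str:
    """ARN of a SageMaker Feature Store feature group."""
    return f"arn:aws:sagemaker:{region}:{account_id}:feature-group/{name}"


def build_share_request(
    share_name: str, resource_arns: List[str], consumer_accounts: List[str]
) -> Dict[str, Any]:
    """Keyword arguments for the AWS RAM create_resource_share call."""
    return {
        "name": share_name,
        "resourceArns": list(resource_arns),
        # Principals here are plain AWS account IDs of the consuming accounts.
        "principals": list(consumer_accounts),
        # False: only principals inside the same AWS Organization may be associated.
        "allowExternalPrincipals": False,
    }


def create_share(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS RAM; requires boto3 and credentials for the owning account."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    ram = boto3.client("ram")
    return ram.create_resource_share(**request)


# Hypothetical names and account IDs, for illustration only.
request = build_share_request(
    "ml-features-share",
    [feature_group_arn("us-east-1", "111111111111", "customer-features")],
    ["222222222222"],
)
print(request)
```

Calling `create_share(request)` from the owning account would create the share; depending on how AWS RAM is configured for the organization, the consumer account may need to accept a resource share invitation before the feature group becomes visible.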

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.

ML

ML AWS Data Warehouse

Promote pipelines in a multi-environment setup using Amazon SageMaker Model Registry, HashiCorp Terraform, GitHub, and Jenkins CI/CD

AWS Machine Learning Blog

NOVEMBER 9, 2023

Building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) for organizations is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting the requirements around model performance, security, and compliance.

AWS

AWS ML Machine Learning

Identify cybersecurity anomalies in your Amazon Security Lake data using Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 20, 2023

Whether logs are coming from Amazon Web Services (AWS), other cloud providers, on-premises, or edge devices, customers need to centralize and standardize security data. Solution overview (Figure 1 – Solution Architecture): Enable Amazon Security Lake with AWS Organizations for AWS accounts, AWS Regions, and external IT environments.

AWS

AWS ML Algorithm

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

These professionals will work with their colleagues to ensure that data is accessible, with proper access. So let’s go through each step one by one, and help you build a roadmap toward becoming a data engineer. Identify your existing data science strengths. Stay on top of data engineering trends. Get more training!

Data Engineering

Data Engineering Data Engineer

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

AWS Machine Learning Blog

NOVEMBER 22, 2023

The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. AWS might periodically update the service limits based on various factors.

AWS

AWS ML Machine Learning

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The solution: a data science approach. In data science, it’s common to develop a model and fine-tune it using experimentation.

SQL

SQL Database AWS Machine Learning
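The Imperva excerpt notes that the data sits in a data lake and is retrieved with SQL through Amazon Athena. A minimal sketch of that retrieval step, assuming a boto3 Athena client (or any object exposing the same three methods), starts the query, polls until it reaches a terminal state, and returns the first page of results. The function name and its parameters are illustrative, not Imperva's code; real code would also page through `get_query_results` with `NextToken` when results exceed one page.

```python
import time
from typing import Any, List, Optional

# Athena reports one of these states once a query has stopped running.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[List[Optional[str]]]:
    """Run a query and return the first page of rows (header row first for SELECTs)."""
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query is no longer QUEUED or RUNNING.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {query_id} ended as {status['State']}: {reason}")

    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # NULL cells carry no VarCharValue key, so they come back as None.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT ...", "example_db", "s3://example-bucket/athena-results/")`, where the database and bucket names are placeholders. Accepting the client as a parameter keeps the polling logic testable with a stub.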

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.

AI

AI ML

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

AWS Machine Learning Blog

FEBRUARY 7, 2025

Data science teams often face challenges when transitioning models from the development environment to production. Usually, there is one lead data scientist for a data science group in a business unit, such as marketing. ML Dev Account: This is where data scientists perform their work.

ML

ML Data Scientist AWS

MLOps and DevOps: Why Data Makes It Different

O'Reilly Media

OCTOBER 19, 2021

Similarly, it would be pointless to pretend that a data-intensive application resembles a run-of-the-mill microservice that can be built with the usual software toolchain consisting of, say, GitHub, Docker, and Kubernetes. Adapted from the book Effective Data Science Infrastructure. Data Science Layers.

ML

ML Data Scientist AWS

Deploy a predictive maintenance solution for airport baggage handling systems with Amazon Lookout for Equipment

AWS Machine Learning Blog

APRIL 12, 2023

In this post, we describe how AWS Partner Airis Solutions used Amazon Lookout for Equipment, AWS Internet of Things (IoT) services, and CloudRail sensor technologies to provide a state-of-the-art solution to address these challenges. It’s an easy way to run analytics on IoT data to gain accurate insights.

AWS

AWS ML Machine Learning

Beyond data: Cloud analytics mastery for business brilliance

Dataconomy

SEPTEMBER 4, 2023

Downtime, like the AWS outage in 2017 that affected several high-profile websites, can disrupt business operations. Data integration: Integrate data from various sources into a centralized cloud data warehouse or data lake. Ensure that data is clean, consistent, and up-to-date.

Analytics

Analytics Big Data Analytics

The 5 Best Cloud Platforms for Hosting AI Applications

Towards AI

MARCH 3, 2025

AWS is like that overachieving student who excels at everything. Complex pricing structure (seriously, who can predict AWS bills?). Steeper … and even facing performance bottlenecks, I finally cracked the code. Let’s dive in!

Data Lakes

Data Lakes AI AWS

Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

AWS Machine Learning Blog

APRIL 18, 2023

It includes sensor devices to capture vibration and temperature data, a gateway device to securely transfer data to the AWS Cloud, the Amazon Monitron service that analyzes the data for anomalies with ML, and a companion mobile app to track potential failures in your machinery.

AWS

AWS ML Database
