AWS, Data Lakes and Data Scientist - Data Science Current

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Flipboard

NOVEMBER 22, 2024

For example, in the bank marketing use case, the management account would be responsible for setting up the organizational structure for the bank’s data and analytics teams, provisioning separate accounts for data governance, data lakes, and data science teams, and maintaining compliance with relevant financial regulations.

Data Governance

Data Governance ML ML Data Lakes

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers, managed in-house by Rockets technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.

Data Science

Data Science AWS Hadoop Data Scientist

Precise Software Solutions implements ML as a service on AWS to save time and money for federal agency

Flipboard

JANUARY 6, 2025

Precise), an Amazon Web Services (AWS) Partner , participated in the AWS Think Big for Small Business Program (TBSB) to expand their AWS capabilities and to grow their business in the public sector. Precise Software Solutions, Inc. The platform helped the agency digitize and process forms, pictures, and other documents.

AWS

AWS ML ML Machine Learning

Webinars

How to Achieve High-Accuracy Results When Using LLMs

Maximizing Profit and Productivity: The New Era of AI-Powered Accounting

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML workloads at scale.

ML

ML ML AWS Data Lakes

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

AWS Machine Learning Blog

DECEMBER 7, 2023

In this post, we describe the end-to-end workforce management system that begins with location-specific demand forecast, followed by courier workforce planning and shift assignment using Amazon Forecast and AWS Step Functions. AWS Step Functions automatically initiate and monitor these workflows by simplifying error handling.

AWS

AWS Algorithm Data Science Machine Learning

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

AWS Machine Learning Blog

DECEMBER 4, 2023

In this post, we explain how we built an end-to-end product category prediction pipeline to help commercial teams by using Amazon SageMaker and AWS Batch , reducing model training duration by 90%. An important aspect of our strategy has been the use of SageMaker and AWS Batch to refine pre-trained BERT models for seven different languages.

AWS

AWS Predictive Analytics ML ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.

SQL

SQL AWS Data Lakes AI

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Solution overview Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. About the authors Scott Patterson is a Senior Solutions Architect at AWS. The sunburst graph below is a visualization of this classification.

AWS

AWS Data Lakes ML ML

Data Warehouse vs. Data Lake

Precisely

MARCH 9, 2023

Data warehouse vs. data lake, each has their own unique advantages and disadvantages; it’s helpful to understand their similarities and differences. In this article, we’ll focus on a data lake vs. data warehouse. It is often used as a foundation for enterprise data lakes.

Data Lakes

Data Lakes Data Warehouse Hadoop Big Data

AWS empowers sales teams using generative AI solution built on Amazon Bedrock

AWS Machine Learning Blog

AUGUST 26, 2024

At AWS, we are transforming our seller and customer journeys by using generative artificial intelligence (AI) across the sales lifecycle. It will be able to answer questions, generate content, and facilitate bidirectional interactions, all while continuously using internal AWS and external data to deliver timely, personalized insights.

AWS

AWS AI AI K-nearest Neighbors

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

Principal Financial Group uses AWS Post Call Analytics solution to extract omnichannel customer insights

AWS Machine Learning Blog

NOVEMBER 15, 2023

They are processing data across channels, including recorded contact center interactions, emails, chat and other digital channels. Solution requirements Principal provides investment services through Genesys Cloud CX, a cloud-based contact center that provides powerful, native integrations with AWS.

AWS

AWS Analytics Analytics ML

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly.

ML

ML ML AWS AI

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Solution overview In this section, we provide an overview of three personas: the data admin, data publisher, and data scientist.

Machine Learning

Machine Learning Machine Learning Data Governance ML

How Carrier predicts HVAC faults using AWS Glue and Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 5, 2023

In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. We first highlight how we use AWS Glue for highly parallel data processing. This dramatically reduces the size of data while capturing features that characterize the equipment’s behavior.

AWS

AWS ML ML Machine Learning

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

The solution: IBM databases on AWS To solve for these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS), enables enterprises to scale applications, analytics and AI across the hybrid cloud landscape. Let’s delve into the database portfolio from IBM available on AWS. 

AWS

AWS Database ETL AI

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lakes

Data Lakes Data Models Data Modeling Data Warehouse

8 Data Lake Vendors to Make Your Data Life Easier in 2023

ODSC - Open Data Science

JUNE 7, 2023

To make your data management processes easier, here’s a primer on data lakes, and our picks for a few data lake vendors worth considering. What is a data lake? First, a data lake is a centralized repository that allows users or an organization to store and analyze large volumes of data.

Data Lakes

Data Lakes Azure Data Warehouse Hadoop

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services, to build a robust and cost-effective Power Bid Optimization solution.

AWS

AWS Machine Learning Machine Learning Analytics

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML ML Machine Learning

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Therefore, it’s no surprise that determining the proficiency of goalkeepers in preventing the ball from entering the net is considered one of the most difficult tasks in football data analysis. Bundesliga and AWS have collaborated to perform an in-depth examination to study the quantification of achievements of Bundesliga’s keepers.

Machine Learning

Machine Learning Machine Learning AWS ML

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). Their task is to construct and oversee efficient data pipelines.

AWS

AWS ML ML Machine Learning

Why Open Table Format Architecture is Essential for Modern Data Systems

phData

NOVEMBER 8, 2024

Versioning also ensures a safer experimentation environment, where data scientists can test new models or hypotheses on historical data snapshots without impacting live data. Note : Cloud Data warehouses like Snowflake and Big Query already have a default time travel feature. FAQs What is a Data Lakehouse?

Data Lakes

Data Lakes Data Warehouse Database Azure

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.

ML

ML ML AWS Data Warehouse

MLOps and DevOps: Why Data Makes It Different

O'Reilly Media

OCTOBER 19, 2021

ML use cases rarely dictate the master data management solution, so the ML stack needs to integrate with existing data warehouses. Today, a number of cloud-based, auto-scaling systems are easily available, such as AWS Batch. Data Science Layers. Software Development Layers. Software Architecture.

ML

ML ML Data Scientist AWS

Data Science News from Microsoft Ignite 2019

Data Science 101

NOVEMBER 7, 2019

Azure Synapse Analytics can be seen as a merge of Azure SQL Data Warehouse and Azure Data Lake. Synapse allows one to use SQL to query petabytes of data, both relational and non-relational, with amazing speed. I have not gotten a chance to try it out yet, so I am not sure its usecase for data science yet.

Data Science

Data Science Azure SQL Machine Learning

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

The role of a data scientist is in demand and 2023 will be no exception. To get a better grip on those changes we reviewed over 25,000 data scientist job descriptions from that past year to find out what employers are looking for in 2023. Data Science Of course, a data scientist should know data science!

Data Science

Data Science Data Scientist Computer Science Computer Science

Build generative AI–powered Salesforce applications with Amazon Bedrock

AWS Machine Learning Blog

JULY 29, 2024

In Part 3 , we demonstrate how business analysts and citizen data scientists can create machine learning (ML) models, without code, in Amazon SageMaker Canvas and deploy trained models for integration with Salesforce Einstein Studio to create powerful business applications. For this post, we use the Anthropic Claude 3 Sonnet model.

AWS

AWS AI AI ML

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

AWS Machine Learning Blog

FEBRUARY 7, 2025

In this post, we assign the functions in terms of the ML lifecycle to each role as follows: Lead data scientist Provision accounts for ML development teams, govern access to the accounts and resources, and promote standardized model development and approval process to eliminate repeated engineering effort.

ML

ML ML Data Scientist AWS

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and prepare the necessary historical data for the ML use cases.

AI

AI AI ML ML

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

By using these capabilities, businesses can efficiently store, manage, and analyze time-series data, enabling data-driven decisions and gaining a competitive edge. sales-train-data is used to store data extracted from MongoDB Atlas, while sales-forecast-output contains predictions from Canvas. Note we have two folders.

Clustering

Clustering AWS Database ML

Promote pipelines in a multi-environment setup using Amazon SageMaker Model Registry, HashiCorp Terraform, GitHub, and Jenkins CI/CD

AWS Machine Learning Blog

NOVEMBER 9, 2023

Central model registry – Amazon SageMaker Model Registry is set up in a separate AWS account to track model versions generated across the dev and prod environments. with administrative privileges installed on AWS Terraform version 1.5.5 After the key is provisioned, it should be visible on the AWS KMS console.

AWS

AWS ML ML Machine Learning

Beyond data: Cloud analytics mastery for business brilliance

Dataconomy

SEPTEMBER 4, 2023

Downtime, like the AWS outage in 2017 that affected several high-profile websites, can disrupt business operations. Data integration: Integrate data from various sources into a centralized cloud data warehouse or data lake. Ensure that data is clean, consistent, and up-to-date.

Analytics

Analytics Analytics Big Data Analytics Big Data Analytics

Deploy a predictive maintenance solution for airport baggage handling systems with Amazon Lookout for Equipment

AWS Machine Learning Blog

APRIL 12, 2023

In this post, we describe how AWS Partner Airis Solutions used Amazon Lookout for Equipment , AWS Internet of Things (IoT) services, and CloudRail sensor technologies to provide a state-of-the-art solution to address these challenges. It’s an easy way to run analytics on IoT data to gain accurate insights.

AWS

AWS ML ML Machine Learning

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect at AWS.

SQL

SQL Database AWS Machine Learning

Demand forecasting at Getir built with Amazon Forecast

AWS Machine Learning Blog

MAY 15, 2023

We outline how we built an automated demand forecasting pipeline using Forecast and orchestrated by AWS Step Functions to predict daily demand for SKUs. On an ongoing basis, we calculate mean absolute percentage error (MAPE) ratios with product-based data, and optimize model and feature ingestion processes.

Algorithm

Algorithm Data Scientist Machine Learning Machine Learning

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.

Machine Learning

Machine Learning Machine Learning ML ML

Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra

AWS Machine Learning Blog

AUGUST 2, 2023

We demonstrate CDE using simple examples and provide a step-by-step guide for you to experience CDE in an Amazon Kendra index in your own AWS account. Marketing firms store vast amounts of digital data that needs to be centralized, easily searchable, and scalable enabled by data catalogs.

AWS

AWS AI AI Machine Learning

Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

AWS Machine Learning Blog

APRIL 18, 2023

It includes sensor devices to capture vibration and temperature data, a gateway device to securely transfer data to the AWS Cloud, the Amazon Monitron service that analyzes the data for anomalies with ML, and a companion mobile app to track potential failures in your machinery.

AWS

AWS ML ML Database

Adopting & Scaling AI, a Beginner’s Guide to Prompt Engineering, and Pretraining Large Language…

ODSC - Open Data Science

JULY 27, 2023

Choosing a Data Lake Format: What to Actually Look For The differences between many data lake products today might not matter as much as you think. When choosing a data lake, here’s something else to consider. When choosing a data lake, here’s something else to consider.

Data Lakes

Data Lakes SQL AI AI

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.

AI

AI AI Machine Learning Machine Learning

How HR Tech Company Sense Scaled their ML Operations using Iguazio

Iguazio

JANUARY 16, 2024

Since AI is a central pillar of their value offering, Sense has invested heavily in a robust engineering organization including a large number of data and AI professionals. This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. First, the data lake is fed from a number of data sources.

ML

ML ML DataOps Data Scientist

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

When it comes to data complexity, it is for sure that in machine learning, we are dealing with much more complex data. First of all, machine learning engineers and data scientists often use data from different data vendors. Some data sets are being corrected by data entry specialists and manual inspectors.

ML

ML ML Data Lakes Machine Learning

How Sense Uses Iguazio as a Key Component of Their ML Stack

Iguazio

JANUARY 16, 2024

Since AI is a central pillar of their value offering, Sense has invested heavily in a robust engineering organization, including a large number of data and data science professionals. This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. Gennaro Frazzingaro, Head of AI/ML at Sense.

ML

ML ML DataOps Data Scientist

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

How Rocket Companies modernized their data science solution on AWS

Webinars

Trending Sources

Precise Software Solutions implements ML as a service on AWS to save time and money for federal agency

Webinars

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

How Northpower used computer vision with AWS to automate safety inspection risk assessments

Data Warehouse vs. Data Lake

AWS empowers sales teams using generative AI solution built on Amazon Bedrock

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Principal Financial Group uses AWS Post Call Analytics solution to extract omnichannel customer insights

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

How Carrier predicts HVAC faults using AWS Glue and Amazon SageMaker

Tackling AI’s data challenges with IBM databases on AWS

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

8 Data Lake Vendors to Make Your Data Life Easier in 2023

How Marubeni is optimizing market decisions using AWS machine learning and analytics

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Why Open Table Format Architecture is Essential for Modern Data Systems

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

MLOps and DevOps: Why Data Makes It Different

Data Science News from Microsoft Ignite 2019

40 Must-Know Data Science Skills and Frameworks for 2023

Build generative AI–powered Salesforce applications with Amazon Bedrock

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

Promote pipelines in a multi-environment setup using Amazon SageMaker Model Registry, HashiCorp Terraform, GitHub, and Jenkins CI/CD

Beyond data: Cloud analytics mastery for business brilliance

Deploy a predictive maintenance solution for airport baggage handling systems with Amazon Lookout for Equipment

Imperva optimizes SQL generation from natural language using Amazon Bedrock

Demand forecasting at Getir built with Amazon Forecast

MLOps Landscape in 2023: Top Tools and Platforms

Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra

Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

Adopting & Scaling AI, a Beginner’s Guide to Prompt Engineering, and Pretraining Large Language…

Exploring the AI and data capabilities of watsonx

How HR Tech Company Sense Scaled their ML Operations using Iguazio

How to Version Control Data in ML for Various Data Sources

How Sense Uses Iguazio as a Key Component of Their ML Stack

Stay Connected