AWS, Big Data and ML - Data Science Current

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Flipboard

NOVEMBER 22, 2024

This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product.

Data Governance

Data Governance ML ML Data Lakes

AWS Announces Generative AI Innovation Center with $100 million Investment

insideBIGDATA

JUNE 22, 2023

AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), today announced the AWS Generative AI Innovation Center, a new program to help customers successfully build and deploy generative artificial intelligence (AI) solutions. Amazon Web Services, Inc.

AWS

AWS Artificial Intelligence Artificial Intelligence Machine Learning

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data. The solution integrates data in three tiers.

AWS

AWS ML ML Analytics

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

Flipboard

JUNE 26, 2023

These techniques utilize various machine learning (ML) based approaches. In this post, we look at how we can use AWS Glue and the AWS Lake Formation ML transform FindMatches to harmonize (deduplicate) customer data coming from different sources to get a complete customer profile to be able to provide better customer experience.

AWS

AWS ML ML ETL

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. Using SageMaker, you can build, train and deploy ML models.

ML

ML ML AWS AI

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work. Data scientists and ML engineers require capable tooling and sufficient compute for their work.

ML

ML ML AWS AI

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM) , making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

AWS

AWS ML ML Machine Learning

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Flipboard

NOVEMBER 19, 2024

The solution workflow consists of the following steps: The user accesses a smart search portal and lands on a web interface deployed on AWS Amplify. The API is integrated with AWS Lambda , which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG).

AWS

AWS AI AI Big Data

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries.

ML

ML ML AWS Data Lakes

Remote Data Science Jobs: 5 High-Demand Roles for Career Growth

Data Science Dojo

OCTOBER 31, 2024

Growth Outlook: Companies like Google DeepMind, NASA’s Jet Propulsion Lab, and IBM Research actively seek research data scientists for their teams, with salaries typically ranging from $120,000 to $180,000. With the continuous growth in AI, demand for remote data science jobs is set to rise.

Data Science

Data Science Data Scientist Machine Learning Machine Learning

Use AWS PrivateLink to set up private access to Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 30, 2023

Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. Customers are building innovative generative AI applications using Amazon Bedrock APIs using their own proprietary data.

AWS

AWS ML ML AI

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Driven by significant advancements in computing technology, everything from mobile phones to smart appliances to mass transit systems generate and digest data, creating a big data landscape that forward-thinking enterprises can leverage to drive innovation. However, the big data landscape is just that.

Big Data

Big Data Big Data ML ML

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 12, 2024

Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale.

ML

ML ML AWS AI

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

AWS (Amazon Web Services), the comprehensive and evolving cloud computing platform provided by Amazon, is comprised of infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS). With its wide array of tools and convenience, AWS has already become a popular choice for many SaaS companies.

AWS

AWS Cloud Computing Data Lakes Database

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

AWS Machine Learning Blog

OCTOBER 16, 2024

DPG Media chose Amazon Transcribe for its ease of transcription and low maintenance, with the added benefit of incremental improvements by AWS over the years. The flexibility to experiment with multiple models was appreciated, and there are plans to try out Anthropic Claude Opus when it becomes available in their desired AWS Region.

AWS

AWS AI AI Big Data

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

AWS Machine Learning Blog

MARCH 15, 2024

Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities.

AWS

AWS ML ML Machine Learning

Techniques and approaches for monitoring large language models on AWS

AWS Machine Learning Blog

FEBRUARY 26, 2024

By using AWS services, our architecture provides real-time visibility into LLM behavior and enables teams to quickly identify and address any issues or anomalies. In this post, we demonstrate a few metrics for online LLM monitoring and their respective architecture for scale using AWS services such as Amazon CloudWatch and AWS Lambda.

AWS

AWS Machine Learning Machine Learning Big Data

Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 29, 2023

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler.

Data Preparation

Data Preparation ML ML Data Quality

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

AWS Machine Learning Blog

DECEMBER 4, 2023

In this post, we explain how we built an end-to-end product category prediction pipeline to help commercial teams by using Amazon SageMaker and AWS Batch , reducing model training duration by 90%. An important aspect of our strategy has been the use of SageMaker and AWS Batch to refine pre-trained BERT models for seven different languages.

AWS

AWS Predictive Analytics ML ML

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

Lets assume that the question What date will AWS re:invent 2024 occur? The corresponding answer is also input as AWS re:Invent 2024 takes place on December 26, 2024. If the question was Whats the schedule for AWS events in December?, This setup uses the AWS SDK for Python (Boto3) to interact with AWS services.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.

AWS

AWS Machine Learning Machine Learning ML

Navigating tomorrow: Role of AI and ML in information technology

Dataconomy

FEBRUARY 6, 2024

With the ability to analyze a vast amount of data in real-time, identify patterns, and detect anomalies, AI/ML-powered tools are enhancing the operational efficiency of businesses in the IT sector. Why does AI/ML deserve to be the future of the modern world? Let’s understand the crucial role of AI/ML in the tech industry.

ML

ML ML Machine Learning Machine Learning

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. SageMaker Unied Studio is an integrated development environment (IDE) for data, analytics, and AI.

SQL

SQL AWS Data Lakes AI

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

AWS Machine Learning Blog

DECEMBER 7, 2023

In this post, we describe the end-to-end workforce management system that begins with location-specific demand forecast, followed by courier workforce planning and shift assignment using Amazon Forecast and AWS Step Functions. AWS Step Functions automatically initiate and monitor these workflows by simplifying error handling.

AWS

AWS Algorithm Data Science Machine Learning

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

AWS Machine Learning Blog

DECEMBER 12, 2023

In this post, we’ll summarize training procedure of GPT NeoX on AWS Trainium , a purpose-built machine learning (ML) accelerator optimized for deep learning training. M tokens/$) trained such models with AWS Trainium without losing any model quality. We’ll outline how we cost-effectively (3.2 billion in Pythia. 2048 256 10.4

AWS

AWS Machine Learning Machine Learning Deep Learning

Governing the ML lifecycle at scale: Centralized observability with Amazon SageMaker and Amazon CloudWatch

AWS Machine Learning Blog

OCTOBER 29, 2024

This post is part of an ongoing series on governing the machine learning (ML) lifecycle at scale. To start from the beginning, refer to Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.

ML

ML ML AWS Machine Learning

How AWS Prototyping enabled ICL-Group to build computer vision models on Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 14, 2023

This is a customer post jointly authored by ICL and AWS employees. To overcome this business challenge, ICL decided to develop in-house capabilities to use machine learning (ML) for computer vision (CV) to automatically monitor their mining machines.

AWS

AWS ML ML Machine Learning

Llama 4 family of models from Meta are now available in SageMaker JumpStart

AWS Machine Learning Blog

APRIL 7, 2025

Virginia) AWS Region. Prerequisites To try the Llama 4 models in SageMaker JumpStart, you need the following prerequisites: An AWS account that will contain all your AWS resources. An AWS Identity and Access Management (IAM) role to access SageMaker AI. The example extracts and contextualizes the buildspec-1-10-2.yml

AWS

AWS Machine Learning Machine Learning AI

NeMo Retriever Llama 3.2 text embedding and reranking NVIDIA NIM microservices now available in Amazon SageMaker JumpStart

AWS Machine Learning Blog

MARCH 18, 2025

With this launch, you can now deploy NVIDIAs optimized reranking and embedding models to build, experiment, and responsibly scale your generative AI ideas on AWS. As part of NVIDIA AI Enterprise available in AWS Marketplace , NIM is a set of user-friendly microservices designed to streamline and accelerate the deployment of generative AI.

AWS

AWS AI AI Computer Science

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

AWS Machine Learning Blog

SEPTEMBER 3, 2024

Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. However, managing the complex infrastructure required for big data workloads has traditionally been a significant challenge, often requiring specialized expertise.

AWS

AWS Clustering Big Data Big Data

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.

ML

ML ML Data Preparation AWS

Generate financial industry-specific insights using generative AI and in-context fine-tuning

AWS Machine Learning Blog

NOVEMBER 12, 2024

You may check out additional reference notebooks on aws-samples for how to use Meta’s Llama models hosted on Amazon Bedrock. You can implement these steps either from the AWS Management Console or using the latest version of the AWS Command Line Interface (AWS CLI). Solutions Architect at AWS. Varun Mehta is a Sr.

SQL

SQL AWS AI AI

Achieve rapid time-to-value business outcomes with faster ML model training using Amazon SageMaker Canvas

AWS Machine Learning Blog

MARCH 3, 2023

Machine learning (ML) can help companies make better business decisions through advanced analytics. Companies across industries apply ML to use cases such as predicting customer churn, demand forecasting, credit scoring, predicting late shipments, and improving manufacturing quality. He is very passionate about data-driven AI.

ML

ML ML Machine Learning Machine Learning

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Flipboard

NOVEMBER 17, 2023

Amazon SageMaker enables enterprises to build, train, and deploy machine learning (ML) models. Amazon SageMaker JumpStart provides pre-trained models and data to help you get started with ML. MongoDB vector data store MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB.

K-nearest Neighbors

K-nearest Neighbors AWS Clustering Database

Define customized permissions in minutes with Amazon SageMaker Role Manager via the AWS CDK

AWS Machine Learning Blog

JUNE 26, 2023

Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. To address this challenge, AWS introduced Amazon SageMaker Role Manager in December 2022. Their primary focus is to ensure that users operate with the utmost security, adhering to the principle of least privilege.

AWS

AWS ML ML Data Scientist

ML for Big Data with PySpark on AWS, Asynchronous Programming in Python, and the Top Industries for…

ODSC - Open Data Science

AUGUST 3, 2023

ML for Big Data with PySpark on AWS, Asynchronous Programming in Python, and the Top Industries for AI Harnessing Machine Learning on Big Data with PySpark on AWS In this brief tutorial, you’ll learn some basics on how to use Spark on AWS for machine learning, MLlib, and more.

Big Data

Big Data Big Data AWS ML

Discovering the Role of Data Science in a Cloud World

Pickl AI

DECEMBER 26, 2024

Summary: “Data Science in a Cloud World” highlights how cloud computing transforms Data Science by providing scalable, cost-effective solutions for big data, Machine Learning, and real-time analytics. This accessibility democratises Data Science, making it available to businesses of all sizes.

Data Science

Data Science Cloud Computing Machine Learning Machine Learning

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts.

AWS

AWS ML ML Machine Learning

Use Amazon SageMaker Model Card sharing to improve model governance

AWS Machine Learning Blog

AUGUST 31, 2023

As Artificial Intelligence (AI) and Machine Learning (ML) technologies have become mainstream, many enterprises have been successful in building critical business applications powered by ML models at scale in production.

AWS

AWS ML ML Data Scientist

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

AWS Machine Learning Blog

JANUARY 6, 2023

The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. The tools are typically used by data scientists and ML developers from experimentation to production deployment of AI and ML solutions.

AWS

AWS ML ML Data Preparation

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data is frequently kept in data lakes that can be managed by AWS Lake Formation , giving you the ability to implement fine-grained access control using a straightforward grant or revoke procedure.

AWS

AWS Data Lakes Clustering Data Preparation

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Enterprises can use no-code ML solutions to streamline their operations and optimize their decision-making without extensive administrative overhead.

Machine Learning

Machine Learning Machine Learning Data Governance ML

Securing MLflow in AWS: Fine-grained access control with AWS native services

AWS Machine Learning Blog

MAY 8, 2023

With Amazon SageMaker , you can manage the whole end-to-end machine learning (ML) lifecycle. It offers many native capabilities to help manage ML workflows aspects, such as experiment tracking, and model governance via the model registry. To automate the infrastructure deployment, we use the AWS Cloud Development Kit (AWS CDK).

AWS

AWS Data Science Machine Learning Machine Learning

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

AWS Announces Generative AI Innovation Center with $100 million Investment

Webinars

Trending Sources

Unstructured data management and governance using AWS AI/ML and analytics services

Webinars

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

Real value, real time: Production AI with Amazon SageMaker and Tecton

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

Remote Data Science Jobs: 5 High-Demand Roles for Career Growth

Use AWS PrivateLink to set up private access to Amazon Bedrock

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

10 Things AWS Can Do for Your SaaS Company

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

Use Snowflake as a data source to train ML models with Amazon SageMaker

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

Techniques and approaches for monitoring large language models on AWS

Accelerate data preparation for ML in Amazon SageMaker Canvas

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Navigating tomorrow: Role of AI and ML in information technology

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

Governing the ML lifecycle at scale: Centralized observability with Amazon SageMaker and Amazon CloudWatch

How AWS Prototyping enabled ICL-Group to build computer vision models on Amazon SageMaker

Llama 4 family of models from Meta are now available in SageMaker JumpStart

NeMo Retriever Llama 3.2 text embedding and reranking NVIDIA NIM microservices now available in Amazon SageMaker JumpStart

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Generate financial industry-specific insights using generative AI and in-context fine-tuning

Achieve rapid time-to-value business outcomes with faster ML model training using Amazon SageMaker Canvas

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Define customized permissions in minutes with Amazon SageMaker Role Manager via the AWS CDK

ML for Big Data with PySpark on AWS, Asynchronous Programming in Python, and the Top Industries for…

Discovering the Role of Data Science in a Cloud World

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Use Amazon SageMaker Model Card sharing to improve model governance

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

Securing MLflow in AWS: Fine-grained access control with AWS native services

Stay Connected