It simplifies the often complex and time-consuming tasks involved in setting up and managing an MLflow environment, allowing ML administrators to quickly establish secure and scalable MLflow environments on AWS. The solution uses AWS CodeArtifact, which provides a private PyPI repository from which SageMaker can download the necessary packages.
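As a sketch of how that wiring can look, the following uses boto3 to fetch a CodeArtifact authorization token and assemble a pip index URL; the domain, account ID, and repository names are hypothetical placeholders:

```python
import boto3

# Hypothetical CodeArtifact domain, account, and repository names.
codeartifact = boto3.client("codeartifact", region_name="us-east-1")
token = codeartifact.get_authorization_token(
    domain="ml-platform", domainOwner="111122223333"
)["authorizationToken"]

# pip-compatible index URL for the private PyPI repository; pass it to
# pip via --index-url or the PIP_INDEX_URL environment variable.
index_url = (
    f"https://aws:{token}@ml-platform-111122223333.d.codeartifact."
    f"us-east-1.amazonaws.com/pypi/ml-packages/simple/"
)
print(index_url)
```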
The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers, managed in-house by Rocket's technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.
Unstructured data is information that doesn't conform to a predefined schema or isn't organized according to a preset data model. Text, images, audio, and video are common examples of unstructured data. Additionally, we show how to use AWS AI/ML services to analyze unstructured data.
As a customer, you rely on Amazon Web Services (AWS) expertise to be available and to understand your specific environment and operations. Amazon Q Business is a fully managed, secure, generative AI-powered enterprise chat assistant that enables natural language interactions with your organization's data.
Prerequisites: Before you dive into the integration process, make sure you have the following in place: an AWS account, which you'll need to access and use Amazon Bedrock. You can interact with Amazon Bedrock using the AWS SDKs available in Python, Java, Node.js, and more.
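A minimal sketch of SDK access is shown below; the model ID is one of the publicly documented Bedrock identifiers, and you'd swap in whichever model is enabled in your account:

```python
import boto3

# Bedrock runtime client -- requires model access enabled in your account.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Summarize what Amazon Bedrock does."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```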
In 2024, however, organizations are using large language models (LLMs), which require relatively little focus on NLP itself, shifting research and development from modeling to the infrastructure needed to support LLM workflows. This often means that simply calling a third-party LLM API won't do, for security, control, and scale reasons.
This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro’s AWS AI/ML Practice. Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models.
In this post, we delve into the essential security best practices that organizations should consider when fine-tuning generative AI models. Security in Amazon Bedrock: Cloud security at AWS is the highest priority. Amazon Bedrock prioritizes security through a comprehensive approach to protect customer data and AI workloads.
In this post, we'll summarize the training procedure of GPT NeoX on AWS Trainium, a purpose-built machine learning (ML) accelerator optimized for deep learning training. We'll outline how we cost-effectively (3.2M tokens/$) trained such models, including multi-billion-parameter models such as Pythia, with AWS Trainium without losing any model quality.
New big data architectures and, above all, data-sharing concepts such as Data Mesh are ideal for creating a common database for many data products and applications. The Event Log Data Model for Process Mining: process mining, as an analytical system, can very well be imagined as an iceberg.
In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. Additionally, Feast promotes feature reuse, greatly reducing the time spent on data preparation; a minimal sketch follows.
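The snippet below is a generic Feast usage sketch, not Zeta Global's code; it assumes a feature repository in the current directory with a driver_stats feature view already registered:

```python
from feast import FeatureStore

# Assumes a Feast repo in the current directory with a
# "driver_stats" feature view already registered.
store = FeatureStore(repo_path=".")

# Reusing registered features at serving time instead of
# re-deriving them per application.
features = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(features)
```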
The solution framework is scalable as more equipment is installed and can be reused for a variety of downstream modeling tasks. In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. The effective precision of the trained model is 91.6%.
By combining the capabilities of LLM function calling and Pydantic data models, you can dynamically extract metadata from user queries. Prerequisites: Before proceeding with this tutorial, make sure you have an AWS account with access to Amazon Bedrock.
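As a minimal sketch of the pattern (field names are hypothetical): the Pydantic model's JSON schema is what you hand to the LLM as a tool/function definition, and the model's structured output validates back into a typed object:

```python
from typing import Optional
from pydantic import BaseModel, Field

# Hypothetical metadata model for a document-search use case.
class QueryMetadata(BaseModel):
    topic: str = Field(description="Main subject of the user query")
    year: Optional[int] = Field(default=None, description="Year filter, if any")

# This JSON schema is what you pass to the LLM as a tool definition;
# the model then returns arguments matching this schema.
print(QueryMetadata.model_json_schema())

# Validate the LLM's structured output back into a typed object.
meta = QueryMetadata.model_validate({"topic": "re:Invent keynotes", "year": 2023})
print(meta.topic, meta.year)
```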
We guide you through deploying the necessary infrastructure using AWS CloudFormation, creating an internal labeling workforce, and setting up your first labeling job. This precision helps models learn the fine details that separate natural-sounding from artificial-sounding speech. We also demonstrate how to use Wavesurfer.js.
Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally once LnW Connect reaches its full potential.
Data Mesh on Azure Cloud with Databricks and Delta Lake for Applications of Business Intelligence, Data Science and Process Mining. However, the Azure Cloud implementation is just an example; the concept can just as easily be implemented on the Google Cloud (GCP), the Amazon Cloud (AWS), and now even the SAP Cloud (Datasphere) using Databricks.
Key Skills: Proficiency in SQL is essential, along with experience in data visualization tools such as Tableau or Power BI. Strong analytical skills and the ability to work with large datasets are critical, as is familiarity with data modeling and ETL processes.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and complete each stage of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) within a single visual interface.
Here are a few of the things that you might do as an AI Engineer at TigerEye:
- Design, develop, and validate statistical models to explain past behavior and to predict future behavior of our customers' sales teams
- Own training, integration, deployment, versioning, and monitoring of ML components
- Improve TigerEye's existing metrics collection and (..)
This ensures that the data models and queries developed by data professionals are consistent with the underlying infrastructure. Enhanced Security and Compliance: Data warehouses often store sensitive information, making security a paramount concern. IaC allows these teams to collaborate more effectively.
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries. The architecture maps the different capabilities of the ML platform to AWS accounts.
Understanding how data warehousing works and how to design and implement a data warehouse is an important skill for a data engineer. Learn about data modeling: data modeling is the process of creating a conceptual representation of data.
Secure model access – Secure, private model access using AWS PrivateLink enables controlled data transfer for inference without traversing the public internet, maintaining data privacy and helping you adhere to compliance requirements.
With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using Amazon Web Services (AWS) tools, all without having to manage infrastructure. However, that is beyond the scope of this post.
You can only deploy DynamoDB on Amazon Web Services (AWS); it does not support on-premises deployments, so with DynamoDB you are essentially locked into AWS as your cloud provider. MongoDB is deployable anywhere, and the MongoDB Atlas database-as-a-service can be deployed on AWS, Azure, and Google Cloud Platform (GCP).
The AWS Well-Architected Framework provides a systematic way for organizations to learn operational and architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable workloads in the cloud. These resources introduce common AWS services for IDP workloads and suggested workflows.
Forecast uses ML to learn not only the best algorithm for each item, but also the best ensemble of algorithms for each item, automatically creating the best model for your data. The console and AWS CLI methods are best suited for quick experimentation to check the feasibility of time series forecasting using your data.
However, to fully harness the potential of a data lake, effective data modeling methodologies and processes are crucial. Data modeling plays a pivotal role in defining the structure, relationships, and semantics of data within a data lake, and it ensures consistency of data throughout the data lake.
In this post, we share how Axfood, a large Swedish food retailer, improved operations and scalability of their existing artificial intelligence (AI) and machine learning (ML) operations by prototyping in close collaboration with AWS experts and using Amazon SageMaker.
We provide a comprehensive guide on how to deploy speaker segmentation and clustering solutions using SageMaker on the AWS Cloud. Solution overview: Amazon Transcribe is the go-to service for speaker diarization in AWS. Hugging Face is a popular open source hub for machine learning (ML) models.
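As a minimal sketch of the Transcribe route, speaker diarization is switched on through the ShowSpeakerLabels setting; the bucket, file, and job names below are placeholders:

```python
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Speaker diarization is enabled via ShowSpeakerLabels;
# MaxSpeakerLabels bounds how many distinct speakers to detect.
transcribe.start_transcription_job(
    TranscriptionJobName="meeting-diarization-demo",
    Media={"MediaFileUri": "s3://my-bucket/meeting.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
    Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 4},
)
```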
Amazon Redshift: Amazon Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). Amazon Redshift allows data engineers to analyze large datasets quickly using massively parallel processing (MPP) architecture. It is known for its high performance and cost-effectiveness.
AWS Inferentia accelerators are custom-built machine learning inference chips designed by Amazon Web Services (AWS) to optimize inference workloads on the AWS platform, with a focus on delivering high performance, low latency, and cost efficiency.
You can skip the AWS authentication steps if you're already working inside an AWS environment. Tip: only include the most recent versions of libraries in the requirements.txt file, especially the transformers library, since older versions may be missing newly released models. You can find more about it here.
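For illustration, a minimal requirements.txt in that spirit might look like the following; the version pins are hypothetical placeholders, so check PyPI for the current releases:

```text
# requirements.txt -- pin to the newest releases (versions are placeholders)
transformers==4.44.2
accelerate==0.33.0
torch==2.4.0
```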
In this post, AWS collaborates with Meta's PyTorch team to showcase how you can use Meta's torchtune library to fine-tune Meta Llama-like architectures while using a fully managed environment provided by Amazon SageMaker Training. The recipe's configuration file begins as follows:

```
$ cat config_l3.1_8b_lora.yaml
# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
```
With the rapid growth of generative artificial intelligence (AI), many AWS customers are looking to take advantage of publicly available foundation models (FMs) and technologies. This includes Meta Llama 3, Meta’s publicly available large language model (LLM).
In the first post of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. We then add rules to interpret the model's scores; it's recommended to use at least 3–6 months of data.
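Below is a sketch of what such score-interpretation rules can look like; the function name and thresholds are illustrative, not values from the post, and would need calibration on your own historical data:

```python
def route_document(tamper_score: float) -> str:
    """Map a model's tampering score to a review decision.

    Thresholds are illustrative placeholders; calibrate them on
    at least 3-6 months of historical data.
    """
    if tamper_score >= 0.90:
        return "reject"          # high confidence of tampering
    if tamper_score >= 0.50:
        return "manual_review"   # ambiguous, send to an underwriter
    return "approve"             # low risk

print(route_document(0.72))  # -> "manual_review"
```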
This can enable the company to leverage the data generated by its IoT edge devices to drive business decisions and gain a competitive advantage. AWS offers a three-layered machine learning stack to choose from, based on your team's skill set and requirements, when implementing ML workloads.
In this post, we explore the journey that Thomson Reuters took to enable cutting-edge research in training domain-adapted large language models (LLMs) using Amazon SageMaker HyperPod , an Amazon Web Services (AWS) feature focused on providing purpose-built infrastructure for distributed training at scale.
Key features of cloud analytics solutions include data models, processing applications, and analytics models. Data models help visualize and organize data, processing applications handle large datasets efficiently, and analytics models aid in understanding complex datasets, laying the foundation for business intelligence.
Launched in August 2019, Forecast predates Amazon SageMaker Canvas , a popular low-code no-code AWS tool for building, customizing, and deploying ML models, including time series forecasting models. For more information about AWS Region availability, see AWS Services by Region.
By enabling effective management of the ML lifecycle, MLOps can help account for the various changes in data, models, and concepts that the development of real-time image recognition applications involves. At scale, real-time image recognition is a complex technical problem that also requires the implementation of MLOps.
Prerequisites: To complete the walkthrough in this post, you need an AWS account; familiarity with SageMaker concepts such as an Estimator, training job, and HPO job; familiarity with the Amazon SageMaker Python SDK; and Python programming knowledge. Implement the solution: The full code is available in the GitHub repo; a compressed sketch of the Estimator-plus-tuner pattern follows.
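The sketch below shows the generic SageMaker Python SDK pattern the prerequisites refer to, not the repo's exact code; the image URI, role ARN, metric regex, and hyperparameter names are placeholders:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# Placeholder container image and execution role ARN.
estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# HPO job: tune learning_rate against a metric the training
# script prints (parsed by the regex below).
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "accuracy=([0-9\\.]+)"}],
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-5, 1e-2)},
    max_jobs=10,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://<bucket>/train/"})
```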
We use PEFT to optimize this model for the specific task of summarizing messenger-like conversations. The single-GPU instance that we use is a low-cost example of the many instance types AWS provides. Training this model on a single GPU highlights AWS’s commitment to being the most cost-effective provider of AI/ML services.
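As a minimal sketch of the PEFT approach described here (the base checkpoint and LoRA settings are illustrative, not the post's exact configuration):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

# Illustrative base checkpoint; the post's exact model may differ.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# LoRA adapts only small low-rank matrices on the attention
# projections, so a single GPU can handle fine-tuning.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q", "v"],   # T5 attention projection names
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```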
We need robust versioning for data, models, code, and preferably even the internal state of applications—think Git on steroids to answer inevitable questions: What changed? ML use cases rarely dictate the master data management solution, so the ML stack needs to integrate with existing data warehouses.