Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. Within the data flow, add an Amazon S3 destination node.
The solution: IBM databases on AWS To address these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics, and AI across the hybrid cloud landscape. Let’s delve into the database portfolio from IBM available on AWS.
The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. AWS met the criteria and was evaluated by IDC along with eight other vendors. AWS is positioned in the Leaders category based on current capabilities.
Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services.
This is where the AWS suite of low-code and no-code ML services becomes an essential tool. As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets.
In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms lacked an easy path to migrate their AI/ML prototypes into industrialized solutions running on AWS.
Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for a continued ML adoption. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.
On December 6-8, 2023, the non-profit organization Tech to the Rescue, in collaboration with AWS, organized the world’s largest Air Quality Hackathon, aimed at tackling one of the world’s most pressing health and environmental challenges: air pollution. Having a human-in-the-loop to validate each data transformation step is optional.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
With the introduction of EMR Serverless support for Apache Livy endpoints, SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. This same interface is also used for provisioning EMR clusters.
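A minimal sketch of what the sparkmagic side of that integration can look like: sparkmagic reads its Livy endpoint from `~/.sparkmagic/config.json`, so pointing it at a Livy URL is a matter of writing that file. The endpoint URL and session settings below are placeholders, and note that EMR Serverless Livy endpoints typically require SigV4 authentication via a custom authenticator rather than the plain `"None"` shown here.

```python
# Sketch: generate a sparkmagic config that points at a Livy endpoint.
# The URL is a placeholder; EMR Serverless normally needs SigV4 auth
# via a custom authenticator, simplified here to "None".
import json
from pathlib import Path

def sparkmagic_config(livy_url: str) -> dict:
    return {
        "kernel_python_credentials": {"url": livy_url, "auth": "None"},
        "session_configs": {"driverMemory": "2g", "executorCores": 2},
    }

cfg = sparkmagic_config("https://<emr-serverless-livy-endpoint>")
# To apply (commented out so this sketch has no side effects):
# Path("~/.sparkmagic/config.json").expanduser().write_text(json.dumps(cfg))
```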
It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit, GitHub, and GitLab.
Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of the data engineering and data science teams’ bandwidth and data preparation activities.
Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers.
Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML. Aggregating and preparing large amounts of data is a critical part of the ML workflow. Solution overview With SageMaker Studio setups, data professionals can quickly identify and connect to existing EMR clusters.
This required custom integration efforts, along with complex AWS Identity and Access Management (IAM) policy management, further complicating the model governance process. The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases.
Welcome to our AWS Redshift to the Snowflake Data Cloud migration blog! In this blog, we’ll walk you through the process of migrating your data from AWS Redshift to the Snowflake Data Cloud. As an experienced data engineering consulting company, phData has helped with numerous migrations to Snowflake.
Boomi funded this solution using the AWS PE ML FastStart program, a customer enablement program meant to take ML-enabled solutions from idea to production in a matter of weeks. Boomi’s ML and data engineering teams needed the solution to be deployed quickly, in a repeatable and consistent way, at scale.
For example, you might have acquired a company that was already running on a different cloud provider, or you may have a workload that generates value from unique capabilities provided by AWS. We show how you can build and train an ML model in AWS and deploy the model in another platform.
Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows: data preparation, building and experimentation, training, hosting, and monitoring.
A cordial greeting to all data science enthusiasts! In the unceasingly dynamic arena of data science, discerning and applying the right instruments can significantly shape the outcomes of your machine learning initiatives. Be sure to check out his talk, “Build Classification and Regression Models with Spark on AWS,” there!
This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process. One aspect of this data preparation is feature engineering.
Thus, MLOps is the intersection of Machine Learning, DevOps, and Data Engineering (Figure 1). Any competent software engineer can learn how to use a particular MLOps platform since it does not require an advanced degree. The ideal MLOps engineer would have some experience with several MLOps and/or DevOps platforms.
We use a test data preparation notebook as part of this step, which is a dependency for the fine-tuning and batch inference step. When fine-tuning is complete, this notebook is run using run magic and prepares a test dataset for sample inference with the fine-tuned model.
Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. You can manage app images via the SageMaker console, the AWS SDK for Python (Boto3), and the AWS Command Line Interface (AWS CLI). Environments without internet access.
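As a hedged illustration of the Boto3 route mentioned above, this sketch builds the request payloads for registering a custom Studio app image. The image name, ECR URI, and role ARN are hypothetical placeholders, and the actual API calls are left commented out.

```python
# Sketch: request shapes for registering a custom SageMaker Studio image
# with boto3. All names and ARNs below are hypothetical placeholders.
def build_image_requests(image_name: str, ecr_uri: str, role_arn: str):
    """Return (create_image, create_image_version) request dicts."""
    create_image = {"ImageName": image_name, "RoleArn": role_arn}
    create_version = {"ImageName": image_name, "BaseImage": ecr_uri}
    return create_image, create_version

create_image, create_version = build_image_requests(
    "my-custom-image",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest",
    "arn:aws:iam::123456789012:role/SageMakerImageRole",
)
# To apply against a real account (not run here):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_image(**create_image)
# sm.create_image_version(**create_version)
```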
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.
Consequently, AIOps is designed to harness data and insight generation capabilities to help organizations manage increasingly complex IT stacks. MLOps prioritizes end-to-end management of machine learning models, encompassing data preparation, model training, hyperparameter tuning and validation.
Below, we explore five popular data transformation tools, providing an overview of their features, use cases, strengths, and limitations. Apache NiFi Apache NiFi is an open-source data integration tool that automates the flow of data between systems. AWS Glue AWS Glue is a fully managed ETL service provided by Amazon Web Services.
Trigger Tweets Batch Inference Job: Define and trigger a batch inference job with S3 input and output paths, data type, and inference job resources such as instance type and instance count. Prerequisites Create an AWS EC2 instance with an Ubuntu AMI, for example an m5.xlarge instance.
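One way to express such a job is as a SageMaker batch transform request. The sketch below builds the request dictionary; the job name, model name, and S3 paths are assumed placeholders, and the actual `create_transform_job` call is left commented out.

```python
# Sketch: build a SageMaker batch transform (batch inference) request.
# Bucket, model, and job names are hypothetical placeholders.
def build_transform_request(job_name, model_name, input_s3, output_s3,
                            instance_type="ml.m5.xlarge", instance_count=1):
    """Return a request dict for sagemaker.create_transform_job."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "application/json",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }

request = build_transform_request(
    "tweets-batch-inference", "tweet-sentiment-model",
    "s3://my-bucket/tweets/in/", "s3://my-bucket/tweets/out/",
)
# To launch against a real account (not run here):
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```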
Tools like Apache NiFi, Talend, and Informatica provide user-friendly interfaces for designing workflows, integrating diverse data sources, and executing ETL processes efficiently. Choosing the right tool based on the organisation’s specific needs, such as data volume and complexity, is vital for optimising ETL efficiency.
Major cloud infrastructure providers such as IBM, Amazon AWS, Microsoft Azure and Google Cloud have expanded the market by adding AI platforms to their offerings. Automated development: With AutoAI , beginners can quickly get started and more advanced data scientists can accelerate experimentation in AI development.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.
Data, Engineering, and Programming Skills Programming Despite the rise of no-code platforms and AI code assistance, programming skills are still essential for training and fine-tuning LLM models, scripting for data processing, and integrating models into applications. Kubernetes: A long-established tool for containerized apps.
If you answer “yes” to any of these questions, you will need cloud storage, such as Amazon S3, Azure Data Lake Storage, or GCP’s Google Cloud Storage. Knowing this, you want to have data prepared in a way that optimizes your load. It might be tempting to have massive files and let the system sort it out.
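A small sketch of the opposite approach: rather than one massive file, plan roughly equal chunks near a target size before uploading. The 128 MiB target below is an assumption for illustration; the sweet spot depends on your warehouse and loader.

```python
# Sketch: plan evenly sized chunks for a large file before uploading to
# object storage. The 128 MiB target is an assumed rule of thumb; many
# loaders work best with files in the low hundreds of megabytes.
def plan_chunks(total_bytes: int, target_chunk_bytes: int = 128 * 1024 * 1024):
    """Return (num_chunks, chunk_size) so chunks sit near the target size."""
    # Ceiling division without math.ceil: -(-a // b)
    num_chunks = max(1, -(-total_bytes // target_chunk_bytes))
    chunk_size = -(-total_bytes // num_chunks)
    return num_chunks, chunk_size

# A 1 GiB file is planned as 8 chunks of 128 MiB each.
print(plan_chunks(1024 ** 3))
```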
This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.
We also discuss common security concerns that can undermine trust in AI, as identified by the Open Worldwide Application Security Project (OWASP) Top 10 for LLM Applications , and show ways you can use AWS to increase your security posture and confidence while innovating with generative AI.
These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.
We finish with a case study highlighting the benefits realized by a large AWS and PwC customer who implemented this solution. Solution overview AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. The following diagram illustrates the workflow.
In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines.
With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.
This minimizes the complexity and overhead associated with moving data between cloud environments, enabling organizations to access and utilize their disparate data assets for ML projects. You can use SageMaker Canvas to build the initial data preparation routine and generate accurate predictions without writing code.
SageMaker Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. Amazon SageMaker Canvas is a powerful no-code ML tool designed for business and data teams to generate accurate predictions without writing code or having extensive ML experience.
Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the 3-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. 2) When data becomes information, many (incremental) use cases surface.
RAG applications on AWS RAG models have proven useful for grounding language generation in external knowledge sources. This configuration might need to change depending on the RAG solution you are working with and the amount of data you will have on the file system itself. For IAM role, choose Create a new role.