Accelerate NLP inference with ONNX Runtime on AWS Graviton processors
AWS Machine Learning Blog
MAY 15, 2024
AWS Graviton3 processors are optimized for ML workloads, with support for bfloat16, the Scalable Vector Extension (SVE), and Matrix Multiply Accumulate (MMLA) instructions. In this post, we show how to run ONNX Runtime inference on AWS Graviton3-based Amazon EC2 instances and how to configure them to use the optimized GEMM kernels.
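As a minimal sketch of what this looks like in practice, the snippet below runs CPU inference with ONNX Runtime and opts in to the bfloat16 fast-math GEMM kernels via a session config entry. The config key shown ships in recent ONNX Runtime releases for arm64; the model path and input names are placeholders assumed for illustration, so adjust them to your exported model.

```python
import numpy as np
import onnxruntime as ort

# Opt in to the bfloat16 fast-math GEMM kernels on Graviton3.
# This config key is available in recent ONNX Runtime releases;
# verify it against the release notes for your version.
sess_options = ort.SessionOptions()
sess_options.add_session_config_entry(
    "mlas.enable_gemm_fastmath_arm64_bfloat16", "1"
)

# "model.onnx" is a placeholder for an exported NLP model (e.g., BERT).
session = ort.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["CPUExecutionProvider"],  # Graviton inference runs on CPU
)

# Dummy inputs shaped for a typical BERT-style model; the input names
# and shapes are assumptions, so match them to your model's signature.
inputs = {
    "input_ids": np.ones((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
outputs = session.run(None, inputs)
print(outputs[0].shape)
```

Because the bfloat16 kernels trade a small amount of numerical precision for throughput, it's worth validating model accuracy with the flag enabled before adopting it in production.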