Scaling LLM/GenAI deployment with NVIDIA Triton on Amazon EKS

NVIDIA Triton Inference Server is open-source inference serving software that simplifies model deployment and delivers high inference performance. This session explores how NVIDIA Triton and Amazon EKS work together to deploy machine learning models efficiently at scale. We discuss how Triton Inference Server standardizes inference across deep learning frameworks and model formats such as PyTorch, ONNX, and TensorRT, streamlining GenAI/LLM model deployment.

Level: L400
Speaker: Keita Watanabe, Senior Solutions Architect, Frameworks, AWS
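Triton standardizes deployment across frameworks by loading every model from a model repository with the same layout: `<repository>/<model_name>/config.pbtxt` plus numbered version directories holding the model file (for example `1/model.onnx`). The sketch below builds such a layout for an ONNX model. The model name `text_encoder`, the tensor names `input_ids` and `logits`, their shapes, and the batch size of 8 are hypothetical placeholders, not values from this session. The script creates only the config and an empty version directory. Before pointing `tritonserver --model-repository=<path>` at it, you would need to copy your exported model file into `1/`.

```python
import tempfile
from pathlib import Path

# Hypothetical model name; replace with your own.
MODEL_NAME = "text_encoder"


def render_config(name: str, max_batch_size: int = 8) -> str:
    """Render a minimal Triton config.pbtxt for an ONNX model.

    Tensor names, data types, and dims are illustrative assumptions. Because
    max_batch_size > 0, the dims exclude the batch dimension, and -1 marks a
    variable-length axis.
    """
    return f'''name: "{name}"
backend: "onnxruntime"
max_batch_size: {max_batch_size}
input [
  {{
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }}
]
output [
  {{
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }}
]
dynamic_batching {{ }}
'''


def create_model_repository(root: Path, name: str) -> Path:
    """Create <root>/<name>/config.pbtxt and an empty version directory <root>/<name>/1/."""
    model_dir = root / name
    (model_dir / "1").mkdir(parents=True, exist_ok=True)  # model.onnx would go here
    (model_dir / "config.pbtxt").write_text(render_config(name))
    return model_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = create_model_repository(Path(tmp), MODEL_NAME)
        print((model_dir / "config.pbtxt").read_text())
```

The same directory structure applies to PyTorch or TensorRT models. Only the `backend` (or `platform`) field and the model file in the version directory change. On Amazon EKS, such a repository is typically placed on shared storage or in Amazon S3 and mounted into or fetched by the Triton pods.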

Next generation Authz with Cedar

Leveraging OpenSearch for Security Analytics

DevSecOps: Uplift your security in CI/CD pipelines

Observability for Developers

Connecting Applications in a Cloud-Native Way

Developer Webinar Series: Harnessing the power of Generative AI

Developer Webinar Series: Build a world-class connected reality environment with AWS Panorama

Developer Webinar Series: Create adversarial attack resistance in your machine learning workflows

Developer Webinar Series: AIoT-powered drones for critical air logistics

Developer Webinar Series: Avoid unwanted surprises with AWS Cost Anomaly Detection

Developer Webinar Series: Build your skills with AWS Training and Certification

Developer Webinar Series: Web to Web3

Responsible AI - How to Develop a Repeatable Framework That Drives Business Value and Culture Change

Customer Contact Center Analytics and Insights: Scale the QA Process

Optimising ML Inference on AWS Using Amazon SageMaker

Demystifying Data: The business benefits of improving data maturity in Australia and New Zealand

Enabling business analysts and the line of business to scale the adoption of ML

AWS Smart Business - Jim's Mowing

Intelligent Document Processing—Understanding and extracting data to accelerate the application process

Reducing Operational Risk and Platform Scalability with MLOps