Scaling LLM/GenAI deployment with NVIDIA Triton on Amazon EKS

Triton is open-source inference serving software that simplifies the inference serving process and delivers high inference performance. This session explores how NVIDIA Triton and Amazon EKS combine for efficient, large-scale machine learning model deployment. We discuss how Triton Inference Server standardizes machine learning inference across deep learning frameworks such as PyTorch, ONNX, and TensorRT to streamline GenAI/LLM model deployment.
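To illustrate how Triton standardizes serving across frameworks, the sketch below shows a minimal `config.pbtxt` for one model in a Triton model repository. The model name, tensor names, and shapes are illustrative assumptions, not part of this session; the overall structure (platform/backend, batching, typed inputs and outputs, instance placement) is what stays uniform whether the model is PyTorch, ONNX, or TensorRT.

```
# Hypothetical config.pbtxt for an ONNX classifier served by Triton.
# Names, dims, and batch size are assumptions for illustration.
name: "resnet50_onnx"
platform: "onnxruntime_onnx"   # backend selection; e.g. "tensorrt_plan" for TensorRT
max_batch_size: 8              # enables dynamic batching up to 8 requests
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  { kind: KIND_GPU, count: 1 } # one model instance per GPU
]
```

On EKS, a model repository like this typically lives on shared storage (for example Amazon S3 or FSx) and is mounted into Triton pods, so the same configuration scales across nodes.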

Level: L400
Speaker: Keita Watanabe, Senior Solutions Architect, Frameworks, AWS