Scaling LLM/GenAI deployment with NVIDIA Triton on Amazon EKS
Webinar Series Australia and New Zealand
Triton is open-source inference serving software that simplifies model deployment and delivers high inference performance. This session explores the synergy of NVIDIA Triton and Amazon EKS for efficient, large-scale machine learning model deployment. We discuss how Triton Inference Server standardizes machine learning inference across deep learning frameworks such as PyTorch, ONNX, and TensorRT to streamline GenAI/LLM model deployment.
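To give a flavor of how Triton standardizes serving across frameworks, each model in a Triton model repository carries a small configuration file; the sketch below shows a minimal `config.pbtxt` (the model name, backend, and tensor shapes are illustrative assumptions, not taken from the session):

```
# Illustrative config.pbtxt for one entry in a Triton model repository.
# Model name, backend, and tensor shapes here are hypothetical examples.
name: "resnet50_onnx"
platform: "onnxruntime_onnx"   # e.g. "pytorch_libtorch" or "tensorrt_plan" for other backends
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Because every backend is described through this same configuration schema, clients call one uniform HTTP/gRPC API regardless of the framework that trained the model.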
Level: L400
Speaker: Keita Watanabe, Senior Solutions Architect, Frameworks, AWS