Deploy Models in Minutes: SageMaker HyperPod Inference Operator One-Click Install

Why This Matters

Deploying inference workloads on Kubernetes-native infrastructure has traditionally been a pain. AI teams spend hours wrestling with Helm charts, IAM role configurations, dependency management, and manual upgrades before a single model can serve predictions. The new Amazon SageMaker HyperPod Inference Operator changes that by offering a native EKS add-on with one-click installation and managed upgrades. This means faster experimentation, reduced complexity, and consistent security configurations.

In this guide, we'll walk through:

The three installation methods (SageMaker console, EKS CLI, Terraform)
Key new features (multi-instance type deployment, node affinity)
A real deployment example using a DeepSeek model
Migration from Helm to the add-on (zero downtime)

Source: AWS Architecture Blog

Installation Methods

Method 1: SageMaker Console (Recommended)

The simplest path. Navigate to HyperPod Clusters → Cluster Management, select your cluster, go to the Inference tab, and choose Quick Install or Custom Install. The console automatically creates IAM roles, S3 buckets, VPC endpoints, and dependency add-ons (cert-manager, S3 CSI driver, FSx CSI driver, metrics-server).

Verify installation:

kubectl get pods -n hyperpod-inference-system
aws eks describe-addon --cluster-name CLUSTER-NAME --addon-name amazon-sagemaker-hyperpod-inference --region REGION
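
If you want to script the second check, one option is to parse the JSON that aws eks describe-addon prints and inspect the add-on status. The following is a minimal Python sketch, assuming the standard EKS response shape (an "addon" object with a "status" field). The helper name addon_is_active and the sample payload are hypothetical and not part of any AWS tooling.

```python
import json


def addon_is_active(describe_output: str) -> bool:
    """Return True when the describe-addon JSON reports status ACTIVE."""
    payload = json.loads(describe_output)
    # EKS nests the add-on details under the "addon" key.
    return payload.get("addon", {}).get("status") == "ACTIVE"


# Hypothetical, trimmed-down sample of the CLI output, for illustration only.
SAMPLE_OUTPUT = json.dumps({
    "addon": {
        "addonName": "amazon-sagemaker-hyperpod-inference",
        "status": "ACTIVE",
    }
})

if __name__ == "__main__":
    print(addon_is_active(SAMPLE_OUTPUT))
```

In practice you would feed the function the real CLI output (requested with --output json) instead of the sample string.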

Method 2: EKS CLI (For Automation)

If you prefer command-line workflows, install directly via the AWS CLI. Note: All prerequisites (IAM roles, S3 buckets, VPC endpoints, dependency add-ons) must be created manually before running this command.

aws eks create-addon \
  --cluster-name my-hyperpod-cluster \
  --addon-name amazon-sagemaker-hyperpod-inference \
  --addon-version v1.0.0-eksbuild.1 \
  --configuration-values '{
    "executionRoleArn": "arn:aws:iam::ACCOUNT-ID:role/SageMakerHyperPodInference-inference-role",
    "tlsCertificateS3Bucket": "hyperpod-tls-certificate-bucket",
    "hyperpodClusterArn": "arn:aws:sagemaker:REGION:ACCOUNT-ID:cluster/CLUSTER-ID",
    "alb": {
      "serviceAccount": {
        "create": true,
        "roleArn": "arn:aws:iam::ACCOUNT-ID:role/alb-controller-role"
      }
    },
    "keda": {
      "auth": {
        "aws": {
          "irsa": {
            "roleArn": "arn:aws:iam::ACCOUNT-ID:role/keda-operator-role"
          }
        }
      }
    }
  }' \
  --region us-west-2
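
Hand-editing the nested JSON for --configuration-values is error-prone, so you may prefer to generate it. The sketch below is one possible approach in Python: it fills in the same keys and role names shown in the command above from a few inputs. The function name build_addon_config and the example account, cluster ID, and bucket values are hypothetical placeholders.

```python
import json


def build_addon_config(account_id: str, region: str, cluster_id: str, tls_bucket: str) -> str:
    """Render the JSON string passed to --configuration-values."""
    # AWS account IDs are 12-digit numbers; fail early on obvious typos.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")

    iam_prefix = f"arn:aws:iam::{account_id}:role"
    config = {
        "executionRoleArn": f"{iam_prefix}/SageMakerHyperPodInference-inference-role",
        "tlsCertificateS3Bucket": tls_bucket,
        "hyperpodClusterArn": f"arn:aws:sagemaker:{region}:{account_id}:cluster/{cluster_id}",
        "alb": {"serviceAccount": {"create": True, "roleArn": f"{iam_prefix}/alb-controller-role"}},
        "keda": {"auth": {"aws": {"irsa": {"roleArn": f"{iam_prefix}/keda-operator-role"}}}},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_addon_config("123456789012", "us-west-2", "examplecluster", "hyperpod-tls-certificate-bucket"))
```

The printed string can be passed to the CLI in single quotes, which keeps the role ARNs consistent across the three nested sections.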

Method 3: Terraform (Infrastructure as Code)

For organizations using Terraform, the awesome-distributed-training GitHub repo provides modules. Set create_hyperpod_inference_operator_module = true in your custom.tfvars:

kubernetes_version = "1.33"
eks_cluster_name = "tf-eks-cluster"
hyperpod_cluster_name = "tf-hp-cluster"
resource_name_prefix = "tf-eks-test"
aws_region = "us-east-1"
instance_groups = [
  {
    name = "accelerated-instance-group-1"
    instance_type = "ml.g5.8xlarge"
    instance_count = 2
    availability_zone_id = "use1-az2"
    ebs_volume_size_in_gb = 100
    threads_per_core = 1
    enable_stress_check = false
    enable_connectivity_check = false
    lifecycle_script = "on_create.sh"
  }
]
create_hyperpod_inference_operator_module = true

Deploying Your First Model

Once the add-on is installed, deploy a model using a JumpStartModel custom resource. Here's an example for DeepSeek R1 Distill Qwen 1.5B:

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: JumpStartModel
metadata:
  name: deepseek-test-endpoint
spec:
  model:
    modelId: "deepseek-llm-r1-distill-qwen-1-5b"
  sageMakerEndpoint:
    name: deepseek-test-endpoint
  server:
    instanceType: "ml.g5.8xlarge"

Save the manifest as deepseek-endpoint.yaml and apply it:

kubectl apply -f deepseek-endpoint.yaml
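
If you deploy several JumpStart models, templating the manifest can save some copy-pasting. Because kubectl apply also accepts JSON manifests, a short Python sketch can generate one per model. The function name jumpstart_model_manifest is hypothetical, and placing server directly under spec is an assumption about the CRD layout; check it against the operator's documentation before relying on it.

```python
import json


def jumpstart_model_manifest(name: str, model_id: str, instance_type: str) -> dict:
    """Build a JumpStartModel resource as a plain dict (assumed field layout)."""
    return {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1",
        "kind": "JumpStartModel",
        "metadata": {"name": name},
        "spec": {
            "model": {"modelId": model_id},
            # The endpoint reuses the resource name, as in the example above.
            "sageMakerEndpoint": {"name": name},
            "server": {"instanceType": instance_type},
        },
    }


if __name__ == "__main__":
    manifest = jumpstart_model_manifest(
        "deepseek-test-endpoint",
        "deepseek-llm-r1-distill-qwen-1-5b",
        "ml.g5.8xlarge",
    )
    # Write this output to a .json file and pass it to kubectl apply -f.
    print(json.dumps(manifest, indent=2))
```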

Advanced Features

Multi-Instance Type Deployment

Specify a prioritized list of instance types. The system automatically falls back to the next available type if the preferred one lacks capacity:

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: InferenceEndpointConfig
metadata:
  name: lmcache-test-1
  namespace: default
spec:
  replicas: 13
  modelName: Llama-3.1-8B-Instruct
  instanceTypes: ["ml.p4d.24xlarge","ml.g5.24xlarge","ml.g5.8xlarge"]
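
To make the fallback behavior concrete, here is a small Python illustration of priority-ordered selection. It is only a conceptual sketch, not the operator's actual scheduling logic: the function pick_instance_type and the free-capacity numbers are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def pick_instance_type(priorities: Sequence[str], free_capacity: Mapping[str, int]) -> Optional[str]:
    """Return the first instance type, in priority order, that has free capacity."""
    for instance_type in priorities:
        if free_capacity.get(instance_type, 0) > 0:
            return instance_type
    # No listed type has capacity right now.
    return None


if __name__ == "__main__":
    priorities = ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]
    # Hypothetical snapshot: the preferred type is fully used.
    free = {"ml.p4d.24xlarge": 0, "ml.g5.24xlarge": 2, "ml.g5.8xlarge": 4}
    print(pick_instance_type(priorities, free))
```

With the preferred ml.p4d.24xlarge exhausted, the sketch selects ml.g5.24xlarge, the next entry in the list.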

Node Affinity for Granular Scheduling

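Use Kubernetes-native nodeAffinity to exclude spot instances, target specific AZs, or pin to custom labels. The example below expresses a soft preference (preferredDuringSchedulingIgnoredDuringExecution) for ml.g5.4xlarge nodes: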

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: InferenceEndpointConfig
metadata:
  name: lmcache-test-1
  namespace: default
spec:
  replicas: 15
  modelName: Llama-3.1-8B-Instruct
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["ml.g5.4xlarge"]
  worker:
    resources:
      limits:
        nvidia.com/gpu: "1"
      requests:
        cpu: "6"
        memory: 30Gi
        nvidia.com/gpu: "1"
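
Preferred affinity is a soft constraint: nodes that match a term earn its weight, and non-matching nodes remain eligible. The Python sketch below illustrates that idea for the In and NotIn operators using the well-known topology.kubernetes.io/zone label. It is a simplified model, since the real scheduler normalizes this score and combines it with other scoring plugins; the function names and sample nodes are hypothetical.

```python
from typing import Dict, List


def _matches(node_labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpression against a node's labels."""
    value = node_labels.get(expr["key"])
    if expr["operator"] == "In":
        return value in expr["values"]
    if expr["operator"] == "NotIn":
        # A missing label also satisfies NotIn.
        return value not in expr["values"]
    raise ValueError(f"operator not covered by this sketch: {expr['operator']}")


def preferred_affinity_score(node_labels: Dict[str, str], preferences: List[dict]) -> int:
    """Sum the weights of preferred terms whose expressions all match the node."""
    score = 0
    for term in preferences:
        expressions = term["preference"]["matchExpressions"]
        if all(_matches(node_labels, expr) for expr in expressions):
            score += term["weight"]
    return score


if __name__ == "__main__":
    prefs = [{
        "weight": 100,
        "preference": {"matchExpressions": [
            {"key": "topology.kubernetes.io/zone", "operator": "In", "values": ["us-west-2a"]},
        ]},
    }]
    print(preferred_affinity_score({"topology.kubernetes.io/zone": "us-west-2a"}, prefs))
    print(preferred_affinity_score({"topology.kubernetes.io/zone": "us-west-2b"}, prefs))
```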

Limitations & Caveats

Dependency conflicts: If you already have cert-manager or KEDA installed on your cluster, the add-on may conflict. Use the --skip-dependencies flag during migration to avoid duplication.
IAM role limits: The automated creation of IAM roles may exceed your account's role limit if you have many clusters. Plan accordingly.
TLS certificate bucket: The S3 bucket for TLS certificates must be in the same region as your cluster. Cross-region setups are not supported.

Next Steps

Explore managed tiered KV cache to reduce inference latency by up to 40% for long-context workloads.
Set up HyperPod Observability with Amazon Managed Grafana for real-time metrics.
If you already run the Helm-based operator, check the migration script for moving from Helm to the add-on.

Conclusion

The SageMaker HyperPod Inference Operator as an EKS add-on eliminates the infrastructure tax that slows down ML teams. With one-click installation, automated resource creation, and managed upgrades, you can go from cluster creation to serving predictions in minutes instead of hours. The integration with advanced features like multi-instance type deployment and node affinity gives you fine-grained control over inference scheduling.

Get started today: Create a new HyperPod cluster with the Inference Operator pre-installed, or add it to an existing cluster with a single click through the SageMaker console. For detailed configuration options, refer to the official guide.
