Why This Matters

Deploying inference workloads on Kubernetes-native infrastructure has traditionally required a lot of manual setup. AI teams can spend hours on Helm charts, IAM role configuration, dependency management, and manual upgrades before a single model serves predictions. The Amazon SageMaker HyperPod Inference Operator is now available as a native EKS add-on with one-click installation and managed upgrades, which means faster experimentation, less operational complexity, and consistent security configurations.

In this guide, we'll walk through:

  • The three installation methods (SageMaker UI, EKS CLI, Terraform)
  • Key new features (multi-instance type deployment, node affinity)
  • A real deployment example using a DeepSeek model
  • Migration from Helm to the add-on (zero downtime)

Source: AWS Architecture Blog

[Figure: AWS SageMaker HyperPod cluster dashboard showing inference operator installation status]

Installation Methods

Method 1: SageMaker Console (Recommended)

The simplest path. Navigate to HyperPod Clusters → Cluster Management, select your cluster, go to the Inference tab, and choose Quick Install or Custom Install. The console automatically creates IAM roles, S3 buckets, VPC endpoints, and dependency add-ons (cert-manager, S3 CSI driver, FSx CSI driver, metrics-server).

Verify installation:

kubectl get pods -n hyperpod-inference-system
aws eks describe-addon --cluster-name CLUSTER-NAME --addon-name amazon-sagemaker-hyperpod-inference --region REGION
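If you script the verification step, the JSON printed by aws eks describe-addon can be checked programmatically. The following is a minimal Python sketch, assuming the command's usual output shape (an "addon" object carrying a "status" field). The helper name addon_is_active and the sample payload are illustrative and not part of any AWS tooling.

```python
import json


def addon_is_active(describe_addon_output: str) -> bool:
    """Return True when the add-on reports the ACTIVE status.

    `describe_addon_output` is the raw JSON text printed by
    `aws eks describe-addon ... --output json`.
    """
    payload = json.loads(describe_addon_output)
    # Missing keys are treated as "not active" rather than raising.
    return payload.get("addon", {}).get("status") == "ACTIVE"


# Hypothetical sample payload, trimmed to the fields used here.
sample = json.dumps(
    {"addon": {"addonName": "amazon-sagemaker-hyperpod-inference", "status": "ACTIVE"}}
)
print(addon_is_active(sample))
```

Any other status, such as CREATING or DEGRADED, makes the helper return False, so a CI job could use it to keep polling or to fail fast.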

Method 2: EKS CLI (For Automation)

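If you prefer command-line workflows, install directly via the AWS CLI. Note: all prerequisites (IAM roles, S3 buckets, VPC endpoints, dependency add-ons) must be created manually before running this command. Replace the ACCOUNT-ID, REGION, and CLUSTER-ID placeholders with your own values, and confirm which add-on versions are available with aws eks describe-addon-versions --addon-name amazon-sagemaker-hyperpod-inference before pinning --addon-version.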

aws eks create-addon \
  --cluster-name my-hyperpod-cluster \
  --addon-name amazon-sagemaker-hyperpod-inference \
  --addon-version v1.0.0-eksbuild.1 \
  --configuration-values '{
    "executionRoleArn": "arn:aws:iam::ACCOUNT-ID:role/SageMakerHyperPodInference-inference-role",
    "tlsCertificateS3Bucket": "hyperpod-tls-certificate-bucket",
    "hyperpodClusterArn": "arn:aws:sagemaker:REGION:ACCOUNT-ID:cluster/CLUSTER-ID",
    "alb": {
      "serviceAccount": {
        "create": true,
        "roleArn": "arn:aws:iam::ACCOUNT-ID:role/alb-controller-role"
      }
    },
    "keda": {
      "auth": {
        "aws": {
          "irsa": {
            "roleArn": "arn:aws:iam::ACCOUNT-ID:role/keda-operator-role"
          }
        }
      }
    }
  }' \
  --region us-west-2
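Hand-editing the placeholder ARNs in the JSON above is error-prone, so one option is to generate the --configuration-values payload from a few inputs. The sketch below is a hedged illustration in Python: the function name build_addon_config and its parameters are assumptions made for this example, and the role names are copied from the command above rather than from any official schema.

```python
import json


def build_addon_config(account_id: str, region: str, cluster_id: str, tls_bucket: str) -> str:
    """Render the JSON string passed to --configuration-values."""

    def role_arn(role_name: str) -> str:
        # IAM role ARNs carry no region component.
        return f"arn:aws:iam::{account_id}:role/{role_name}"

    config = {
        "executionRoleArn": role_arn("SageMakerHyperPodInference-inference-role"),
        "tlsCertificateS3Bucket": tls_bucket,
        "hyperpodClusterArn": f"arn:aws:sagemaker:{region}:{account_id}:cluster/{cluster_id}",
        "alb": {"serviceAccount": {"create": True, "roleArn": role_arn("alb-controller-role")}},
        "keda": {"auth": {"aws": {"irsa": {"roleArn": role_arn("keda-operator-role")}}}},
    }
    return json.dumps(config, indent=2)


# Hypothetical account and cluster identifiers, for illustration only.
print(build_addon_config("123456789012", "us-west-2", "abc123", "hyperpod-tls-certificate-bucket"))
```

The printed string can be saved to a file and passed to the CLI, which keeps the account ID and region consistent across all four ARNs.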

Method 3: Terraform (Infrastructure as Code)

For organizations using Terraform, the aws-samples/awsome-distributed-training GitHub repository provides modules. Set create_hyperpod_inference_operator_module = true in your custom.tfvars:

kubernetes_version = "1.33"
eks_cluster_name = "tf-eks-cluster"
hyperpod_cluster_name = "tf-hp-cluster"
resource_name_prefix = "tf-eks-test"
aws_region = "us-east-1"
instance_groups = [
  {
    name = "accelerated-instance-group-1"
    instance_type = "ml.g5.8xlarge"
    instance_count = 2
    availability_zone_id = "use1-az2"
    ebs_volume_size_in_gb = 100
    threads_per_core = 1
    enable_stress_check = false
    enable_connectivity_check = false
    lifecycle_script = "on_create.sh"
  }
]
create_hyperpod_inference_operator_module = true

Deploying Your First Model

Once the add-on is installed, deploy a model using a JumpStartModel custom resource. Here's an example for DeepSeek R1 Distill Qwen 1.5B:

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: JumpStartModel
metadata:
  name: deepseek-test-endpoint
spec:
  model:
    modelId: "deepseek-llm-r1-distill-qwen-1-5b"
  sageMakerEndpoint:
    name: deepseek-test-endpoint
    server:
      instanceType: "ml.g5.8xlarge"

Save the manifest as deepseek-endpoint.yaml and apply it:

kubectl apply -f deepseek-endpoint.yaml

[Figure: Kubernetes pods running inference workloads on SageMaker HyperPod with GPU utilization metrics]

Advanced Features

Multi-Instance Type Deployment

Specify a prioritized list of instance types. The system automatically falls back to the next available type if the preferred one lacks capacity:

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: InferenceEndpointConfig
metadata:
  name: lmcache-test-1
  namespace: default
spec:
  replicas: 13
  modelName: Llama-3.1-8B-Instruct
  instanceTypes: ["ml.p4d.24xlarge","ml.g5.24xlarge","ml.g5.8xlarge"]
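The fallback behavior described above amounts to walking the priority list in order and taking the first instance type with capacity. The Python sketch below illustrates only that ordering rule. It is not the operator's actual scheduling logic, and the function name pick_instance_type and the free_nodes mapping are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def pick_instance_type(
    priorities: Sequence[str], free_nodes: Mapping[str, int]
) -> Optional[str]:
    """Return the first instance type in priority order that has free capacity."""
    for instance_type in priorities:
        if free_nodes.get(instance_type, 0) > 0:
            return instance_type
    # No listed type has capacity.
    return None


priorities = ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]
# Hypothetical capacity snapshot: the preferred type is exhausted.
free_nodes = {"ml.p4d.24xlarge": 0, "ml.g5.24xlarge": 2, "ml.g5.8xlarge": 5}
print(pick_instance_type(priorities, free_nodes))  # ml.g5.24xlarge
```

Listing the most capable type first and the most widely available type last gives the deployment the best chance of being placed.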

Node Affinity for Granular Scheduling

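Use Kubernetes-native nodeAffinity to exclude spot instances, target specific AZs, or pin to custom labels. The example below uses preferredDuringSchedulingIgnoredDuringExecution, which is a soft preference: the scheduler favors matching nodes but may still place pods elsewhere. Kubernetes also defines requiredDuringSchedulingIgnoredDuringExecution for hard constraints: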

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: InferenceEndpointConfig
metadata:
  name: lmcache-test-1
  namespace: default
spec:
  replicas: 15
  modelName: Llama-3.1-8B-Instruct
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["ml.g5.4xlarge"]
  worker:
    resources:
      limits:
        nvidia.com/gpu: "1"
      requests:
        cpu: "6"
        memory: 30Gi
        nvidia.com/gpu: "1"
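To see how a preferred affinity rule influences placement, recall that for each candidate node the Kubernetes scheduler adds up the weights of the preferred terms that the node's labels satisfy. The Python sketch below models only that summation, and only the In operator. The function name affinity_score and the node label sets are hypothetical, and the real scheduler combines this score with other factors.

```python
from typing import Dict, List


def affinity_score(node_labels: Dict[str, str], preferred_terms: List[dict]) -> int:
    """Sum the weights of preferred terms whose expressions all match the node."""
    score = 0
    for term in preferred_terms:
        expressions = term["preference"]["matchExpressions"]
        # Only the "In" operator is modelled in this sketch.
        if all(
            expr["operator"] == "In" and node_labels.get(expr["key"]) in expr["values"]
            for expr in expressions
        ):
            score += term["weight"]
    return score


# One preferred term, using the well-known instance-type node label.
terms = [
    {
        "weight": 100,
        "preference": {
            "matchExpressions": [
                {
                    "key": "node.kubernetes.io/instance-type",
                    "operator": "In",
                    "values": ["ml.g5.4xlarge"],
                }
            ]
        },
    }
]

# Hypothetical nodes with their labels.
node_a = {"node.kubernetes.io/instance-type": "ml.g5.4xlarge"}
node_b = {"node.kubernetes.io/instance-type": "ml.g5.8xlarge"}
print(affinity_score(node_a, terms), affinity_score(node_b, terms))  # 100 0
```

A node that scores 0 is still eligible under a preferred rule. It simply ranks lower than nodes that match.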

Limitations & Caveats

  • Dependency conflicts: If you already have cert-manager or KEDA installed on your cluster, the add-on may conflict. Use the --skip-dependencies flag during migration to avoid duplication.
  • IAM role limits: Automated IAM role creation can push an account with many clusters past its IAM role quota. Check your current role count, and request a quota increase before installing if you are close to the limit.
  • TLS certificate bucket: The S3 bucket for TLS certificates must be in the same region as your cluster. Cross-region setups are not supported.

Next Steps

[Figure: Developer configuring Terraform deployment for HyperPod inference operator add-on]

Conclusion

The SageMaker HyperPod Inference Operator as an EKS add-on removes much of the infrastructure overhead that slows down ML teams. With one-click installation, automated resource creation, and managed upgrades, you can go from cluster creation to serving predictions in minutes instead of hours. Advanced features such as multi-instance type deployment and node affinity give you fine-grained control over inference scheduling.

Get started today: Create a new HyperPod cluster with the Inference Operator pre-installed, or add it to an existing cluster with a single click through the SageMaker console. For detailed configuration options, refer to the official guide.
