Clicking through the AWS console to deploy a SageMaker endpoint works once. The second time, you forget a setting. The third time, someone else on the team does it differently. Terraform fixes this. You define your SageMaker model, endpoint configuration, and endpoint as code, check it into git, and run terraform apply. Every deployment is identical.
The workflow looks like this: package your model artifact into a tar.gz, upload it to S3, then point Terraform at that artifact. Terraform creates the IAM role, SageMaker model, endpoint config, and endpoint. Tear it all down with terraform destroy when you’re done.
You’ll need Terraform installed (v1.5+), an AWS account, and the AWS CLI configured. Install Terraform if you haven’t:
```shell
# macOS
brew install terraform

# Linux (amd64)
wget https://releases.hashicorp.com/terraform/1.7.4/terraform_1.7.4_linux_amd64.zip
unzip terraform_1.7.4_linux_amd64.zip
sudo mv terraform /usr/local/bin/
terraform --version
```
Package and Upload the Model Artifact#
SageMaker expects a model.tar.gz file in S3. This archive contains your serialized model and any inference code. Here’s a Python script that packages a scikit-learn model and uploads it:
```python
# package_and_upload.py
import os
import tarfile
import tempfile

import boto3
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a quick model (replace with your actual trained model)
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Save the model
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "model.joblib")
joblib.dump(model, model_path)

# Write the inference script SageMaker needs
inference_code = '''
import json

import joblib
import numpy as np


def model_fn(model_dir):
    return joblib.load(f"{model_dir}/model.joblib")


def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        data = json.loads(request_body)
        return np.array(data["instances"])
    raise ValueError(f"Unsupported content type: {request_content_type}")


def predict_fn(input_data, model):
    return model.predict(input_data).tolist()


def output_fn(prediction, accept):
    return json.dumps({"predictions": prediction}), "application/json"
'''

inference_path = os.path.join(model_dir, "inference.py")
with open(inference_path, "w") as f:
    f.write(inference_code)

# Create tar.gz
tarball_path = os.path.join(model_dir, "model.tar.gz")
with tarfile.open(tarball_path, "w:gz") as tar:
    tar.add(model_path, arcname="model.joblib")
    tar.add(inference_path, arcname="inference.py")

# Upload to S3
s3 = boto3.client("s3")
bucket = "my-sagemaker-models-bucket"
s3_key = "models/iris-rf/v1/model.tar.gz"
s3.upload_file(tarball_path, bucket, s3_key)
print(f"Uploaded to s3://{bucket}/{s3_key}")
```
Run it:
```shell
pip install boto3 scikit-learn joblib
python package_and_upload.py
```
Create a directory for your Terraform project. You need four resources: an IAM role, a SageMaker model, an endpoint configuration, and the endpoint itself.
Provider and Variables#
```hcl
# main.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  default = "us-east-1"
}

variable "model_name" {
  default = "iris-rf"
}

variable "model_version" {
  default = "v1"
}

variable "s3_bucket" {
  default = "my-sagemaker-models-bucket"
}

variable "instance_type" {
  default = "ml.m5.large"
}
```
IAM Role for SageMaker#
SageMaker needs an execution role with permissions to pull model artifacts from S3 and write logs to CloudWatch:
```hcl
# iam.tf
data "aws_iam_policy_document" "sagemaker_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name               = "${var.model_name}-sagemaker-role"
  assume_role_policy = data.aws_iam_policy_document.sagemaker_assume_role.json
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

resource "aws_iam_role_policy_attachment" "s3_read_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}
```
SageMaker Model, Endpoint Config, and Endpoint#
This is the core. The model resource points at your S3 artifact and the container image. The endpoint config sets instance type and count. The endpoint ties it all together.
```hcl
# sagemaker.tf
data "aws_sagemaker_prebuilt_ecr_image" "sklearn" {
  repository_name = "sagemaker-scikit-learn"
  image_tag       = "1.2-1"
}

resource "aws_sagemaker_model" "model" {
  name               = "${var.model_name}-${var.model_version}"
  execution_role_arn = aws_iam_role.sagemaker_execution.arn

  primary_container {
    image          = data.aws_sagemaker_prebuilt_ecr_image.sklearn.registry_path
    model_data_url = "s3://${var.s3_bucket}/models/${var.model_name}/${var.model_version}/model.tar.gz"

    environment = {
      SAGEMAKER_PROGRAM = "inference.py"
    }
  }

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

resource "aws_sagemaker_endpoint_configuration" "config" {
  name = "${var.model_name}-${var.model_version}-config"

  production_variants {
    variant_name           = "primary"
    model_name             = aws_sagemaker_model.model.name
    initial_instance_count = 1
    instance_type          = var.instance_type
    initial_variant_weight = 1.0
  }

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = "${var.model_name}-endpoint"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.config.name

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

output "endpoint_name" {
  value = aws_sagemaker_endpoint.endpoint.name
}

output "endpoint_arn" {
  value = aws_sagemaker_endpoint.endpoint.arn
}
```
Autoscaling#
You probably want autoscaling on the endpoint so it handles traffic spikes without you manually adjusting instance counts:
```hcl
# autoscaling.tf
resource "aws_appautoscaling_target" "sagemaker" {
  max_capacity       = 4
  min_capacity       = 1
  resource_id        = "endpoint/${aws_sagemaker_endpoint.endpoint.name}/variant/primary"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "sagemaker_scaling" {
  name               = "${var.model_name}-scaling-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.sagemaker.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker.scalable_dimension
  service_namespace  = aws_appautoscaling_target.sagemaker.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }

    target_value       = 750.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```
The target_value of 750 means the policy tries to keep each instance at around 750 invocations per minute. Adjust based on your model’s latency. A model that takes 200ms per request can handle fewer concurrent invocations than one that takes 10ms.
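To put a rough number on that, a single-threaded container that takes t seconds per request can sustain at most 60/t invocations per minute per worker. A quick back-of-envelope check (the latencies here are illustrative, and real containers often run multiple workers):

```python
def max_invocations_per_minute(latency_s: float, workers: int = 1) -> float:
    """Upper bound on sustained invocations/min for one instance,
    assuming each worker handles one request at a time."""
    return workers * 60.0 / latency_s

# A 10 ms model absorbs far more traffic per instance than a 200 ms one.
print(max_invocations_per_minute(0.010))  # 6000.0 invocations/min
print(max_invocations_per_minute(0.200))  # 300.0 invocations/min
```

By this estimate, a target_value of 750 is comfortable for the 10ms model but unreachable for a single-worker 200ms model, which would scale out long before hitting the target.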
Deploy the Endpoint#
```shell
# Initialize Terraform and download the AWS provider
terraform init

# Preview what will be created
terraform plan -out=tfplan

# Apply the plan
terraform apply tfplan
```
SageMaker endpoint creation takes 5-10 minutes. Terraform will wait for it to complete. Once it’s up, test it with the AWS CLI:
```shell
# --cli-binary-format is required on AWS CLI v2, which otherwise
# expects --body to be base64-encoded (omit the flag on CLI v1)
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name iris-rf-endpoint \
  --content-type application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' \
  /dev/stdout
```
You should get back {"predictions": [0]} since those are measurements for an Iris setosa.
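Calling the endpoint from Python works the same way via boto3's sagemaker-runtime client. A minimal sketch, with hypothetical helper names of our choosing; the endpoint name and payload match the CLI example above:

```python
import json


def build_payload(instances):
    """Serialize feature rows into the JSON shape inference.py expects."""
    return json.dumps({"instances": instances})


def parse_predictions(body):
    """Pull the predictions list out of the endpoint response body."""
    return json.loads(body)["predictions"]


def invoke(endpoint_name, instances, region="us-east-1"):
    import boto3  # imported here so the helpers above need no AWS deps

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(instances),
    )
    return parse_predictions(response["Body"].read())


# With the deployed endpoint and valid AWS credentials:
# invoke("iris-rf-endpoint", [[5.1, 3.5, 1.4, 0.2]])
```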
Updating the Model#
When you train a new model version, the workflow is:

- Package and upload the new artifact to S3 under a new version key
- Update model_version in your Terraform variables
- Run terraform plan to see the diff
- Run terraform apply to deploy
Terraform will create a new SageMaker model and endpoint config, then update the endpoint to point to the new config. SageMaker handles the rolling update internally, so there’s no downtime.
```shell
terraform apply -var="model_version=v2"
```
To tear everything down:
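```shell
terraform destroy
```

This removes the endpoint, endpoint config, model, and IAM role that Terraform created. Like creation, endpoint deletion can take several minutes.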
Common Errors and Fixes#
ClientError: Could not assume role – The SageMaker execution role doesn’t exist yet or the trust policy is wrong. Make sure sagemaker.amazonaws.com is in the Principal of the assume role policy. If you created the role outside Terraform, verify it matches exactly.
ModelError: Received server error (503) from model container – Your inference.py is crashing inside the container. The most common cause: you forgot to include a dependency in the container image. The prebuilt scikit-learn image has sklearn, numpy, pandas, and joblib. If your model needs something else, you need a custom container.
ResourceLimitExceeded: The account-level service limit 'ml.m5.large for endpoint usage' is X – AWS has default quotas on SageMaker instance types. Go to Service Quotas in the AWS console and request an increase. For new accounts, the default is often 0 for GPU instances.
Terraform wants to recreate the endpoint on every apply – This happens when you include a timestamp or random value in a resource name. Use deterministic naming based on model_name and model_version like the config above.
Error: creating SageMaker Model: ValidationException: Could not find model data – Double-check your S3 path. The model_data_url must point to the exact model.tar.gz file, not just a prefix. Also confirm the SageMaker execution role has s3:GetObject permission on that bucket.
Endpoint stuck in Updating state – SageMaker endpoint updates can take 10+ minutes. If it’s been over 20 minutes, check CloudWatch logs for the endpoint. There might be a container startup failure that SageMaker is retrying.
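Assuming the endpoint name from this tutorial, the container logs can be tailed with the AWS CLI (v2); SageMaker writes endpoint container output to a CloudWatch log group named after the endpoint:

```shell
# One log stream per instance under the endpoint's log group
aws logs tail /aws/sagemaker/Endpoints/iris-rf-endpoint --follow
```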