Clicking through the AWS console to deploy a SageMaker endpoint works once. The second time, you forget a setting. The third time, someone else on the team does it differently. Terraform fixes this. You define your SageMaker model, endpoint configuration, and endpoint as code, check it into git, and run terraform apply. Every deployment is identical.
The workflow looks like this: package your model artifact into a tar.gz, upload it to S3, then point Terraform at that artifact. Terraform creates the IAM role, SageMaker model, endpoint config, and endpoint. Tear it all down with terraform destroy when you’re done.
You’ll need Terraform installed (v1.5+), an AWS account, and the AWS CLI configured. Install Terraform if you haven’t:
```shell
# macOS
brew install terraform

# Linux (amd64)
wget https://releases.hashicorp.com/terraform/1.7.4/terraform_1.7.4_linux_amd64.zip
unzip terraform_1.7.4_linux_amd64.zip
sudo mv terraform /usr/local/bin/
terraform --version
```
Package and Upload the Model Artifact#
SageMaker expects a model.tar.gz file in S3. This archive contains your serialized model and any inference code. Here’s a Python script that packages a scikit-learn model and uploads it:
```python
# package_and_upload.py
import os
import tarfile
import tempfile

import boto3
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a quick model (replace with your actual trained model)
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Save the model
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "model.joblib")
joblib.dump(model, model_path)

# Write the inference script SageMaker needs
inference_code = '''
import json

import joblib
import numpy as np


def model_fn(model_dir):
    return joblib.load(f"{model_dir}/model.joblib")


def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        data = json.loads(request_body)
        return np.array(data["instances"])
    raise ValueError(f"Unsupported content type: {request_content_type}")


def predict_fn(input_data, model):
    return model.predict(input_data).tolist()


def output_fn(prediction, accept):
    return json.dumps({"predictions": prediction}), "application/json"
'''

inference_path = os.path.join(model_dir, "inference.py")
with open(inference_path, "w") as f:
    f.write(inference_code)

# Create tar.gz
tarball_path = os.path.join(model_dir, "model.tar.gz")
with tarfile.open(tarball_path, "w:gz") as tar:
    tar.add(model_path, arcname="model.joblib")
    tar.add(inference_path, arcname="inference.py")

# Upload to S3
s3 = boto3.client("s3")
bucket = "my-sagemaker-models-bucket"
s3_key = "models/iris-rf/v1/model.tar.gz"
s3.upload_file(tarball_path, bucket, s3_key)
print(f"Uploaded to s3://{bucket}/{s3_key}")
```
Run it:
```shell
pip install boto3 scikit-learn joblib
python package_and_upload.py
```
Create a directory for your Terraform project. You need four resources: an IAM role, a SageMaker model, an endpoint configuration, and the endpoint itself.
Provider and Variables#
```hcl
# main.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  default = "us-east-1"
}

variable "model_name" {
  default = "iris-rf"
}

variable "model_version" {
  default = "v1"
}

variable "s3_bucket" {
  default = "my-sagemaker-models-bucket"
}

variable "instance_type" {
  default = "ml.m5.large"
}
```
IAM Role for SageMaker#
SageMaker needs an execution role with permissions to pull model artifacts from S3 and write logs to CloudWatch:
```hcl
# iam.tf
data "aws_iam_policy_document" "sagemaker_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name               = "${var.model_name}-sagemaker-role"
  assume_role_policy = data.aws_iam_policy_document.sagemaker_assume_role.json
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

resource "aws_iam_role_policy_attachment" "s3_read_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}
```
SageMaker Model, Endpoint Config, and Endpoint#
This is the core. The model resource points at your S3 artifact and the container image. The endpoint config sets instance type and count. The endpoint ties it all together.
```hcl
# sagemaker.tf
data "aws_sagemaker_prebuilt_ecr_image" "sklearn" {
  repository_name = "sagemaker-scikit-learn"
  image_tag       = "1.2-1"
}

resource "aws_sagemaker_model" "model" {
  name               = "${var.model_name}-${var.model_version}"
  execution_role_arn = aws_iam_role.sagemaker_execution.arn

  primary_container {
    image          = data.aws_sagemaker_prebuilt_ecr_image.sklearn.registry_path
    model_data_url = "s3://${var.s3_bucket}/models/${var.model_name}/${var.model_version}/model.tar.gz"

    environment = {
      SAGEMAKER_PROGRAM = "inference.py"
    }
  }

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

resource "aws_sagemaker_endpoint_configuration" "config" {
  name = "${var.model_name}-${var.model_version}-config"

  production_variants {
    variant_name           = "primary"
    model_name             = aws_sagemaker_model.model.name
    initial_instance_count = 1
    instance_type          = var.instance_type
    initial_variant_weight = 1.0
  }

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = "${var.model_name}-endpoint"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.config.name

  tags = {
    Project = var.model_name
    Version = var.model_version
  }
}

output "endpoint_name" {
  value = aws_sagemaker_endpoint.endpoint.name
}

output "endpoint_arn" {
  value = aws_sagemaker_endpoint.endpoint.arn
}
```
Autoscaling#
You probably want autoscaling on the endpoint so it handles traffic spikes without you manually adjusting instance counts:
```hcl
# autoscaling.tf
resource "aws_appautoscaling_target" "sagemaker" {
  max_capacity       = 4
  min_capacity       = 1
  resource_id        = "endpoint/${aws_sagemaker_endpoint.endpoint.name}/variant/primary"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "sagemaker_scaling" {
  name               = "${var.model_name}-scaling-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.sagemaker.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker.scalable_dimension
  service_namespace  = aws_appautoscaling_target.sagemaker.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }

    target_value       = 750.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```
The target_value of 750 means the policy tries to keep each instance at around 750 invocations per minute. Adjust based on your model’s latency. A model that takes 200ms per request can handle fewer concurrent invocations than one that takes 10ms.
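To put a rough number on that, a single-threaded container that takes t seconds per request can sustain at most 60/t invocations per minute per worker. A quick back-of-envelope check (the latencies here are illustrative, and real containers often run multiple workers):

```python
def max_invocations_per_minute(latency_s: float, workers: int = 1) -> float:
    """Upper bound on sustained invocations/min for one instance,
    assuming each worker handles one request at a time."""
    return workers * 60.0 / latency_s

# A 10 ms model absorbs far more traffic per instance than a 200 ms one.
print(max_invocations_per_minute(0.010))  # 6000.0 invocations/min
print(max_invocations_per_minute(0.200))  # 300.0 invocations/min
```

By this estimate, a target_value of 750 is comfortable for the 10ms model but unreachable for a single-worker 200ms model, which would scale out long before hitting the target.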
Deploy the Endpoint#
```shell
# Initialize Terraform and download the AWS provider
terraform init

# Preview what will be created
terraform plan -out=tfplan

# Apply the plan
terraform apply tfplan
```
SageMaker endpoint creation takes 5-10 minutes. Terraform will wait for it to complete. Once it’s up, test it with the AWS CLI:
```shell
# --cli-binary-format is required on AWS CLI v2, which otherwise
# expects --body to be base64-encoded (omit the flag on CLI v1)
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name iris-rf-endpoint \
  --content-type application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' \
  /dev/stdout
```
You should get back {"predictions": [0]} since those are measurements for an Iris setosa.
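Calling the endpoint from Python works the same way via boto3's sagemaker-runtime client. A minimal sketch, with hypothetical helper names of our choosing; the endpoint name and payload match the CLI example above:

```python
import json


def build_payload(instances):
    """Serialize feature rows into the JSON shape inference.py expects."""
    return json.dumps({"instances": instances})


def parse_predictions(body):
    """Pull the predictions list out of the endpoint response body."""
    return json.loads(body)["predictions"]


def invoke(endpoint_name, instances, region="us-east-1"):
    import boto3  # imported here so the helpers above need no AWS deps

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(instances),
    )
    return parse_predictions(response["Body"].read())


# With the deployed endpoint and valid AWS credentials:
# invoke("iris-rf-endpoint", [[5.1, 3.5, 1.4, 0.2]])
```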
Updating the Model#
When you train a new model version, the workflow is:

- Package and upload the new artifact to S3 under a new version key
- Update model_version in your Terraform variables
- Run terraform plan to see the diff
- Run terraform apply to deploy
Terraform will create a new SageMaker model and endpoint config, then update the endpoint to point to the new config. SageMaker handles the rolling update internally, so there’s no downtime.
```shell
terraform apply -var="model_version=v2"
```
To tear everything down:
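```shell
terraform destroy
```

This removes the endpoint, endpoint config, model, and IAM role that Terraform created. Like creation, endpoint deletion can take several minutes.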
Common Errors and Fixes#
ClientError: Could not assume role – The SageMaker execution role doesn’t exist yet or the trust policy is wrong. Make sure sagemaker.amazonaws.com is in the Principal of the assume role policy. If you created the role outside Terraform, verify it matches exactly.
ModelError: Received server error (503) from model container – Your inference.py is crashing inside the container. The most common cause: you forgot to include a dependency in the container image. The prebuilt scikit-learn image has sklearn, numpy, pandas, and joblib. If your model needs something else, you need a custom container.
ResourceLimitExceeded: The account-level service limit 'ml.m5.large for endpoint usage' is X – AWS has default quotas on SageMaker instance types. Go to Service Quotas in the AWS console and request an increase. For new accounts, the default is often 0 for GPU instances.
Terraform wants to recreate the endpoint on every apply – This happens when you include a timestamp or random value in a resource name. Use deterministic naming based on model_name and model_version like the config above.
Error: creating SageMaker Model: ValidationException: Could not find model data – Double-check your S3 path. The model_data_url must point to the exact model.tar.gz file, not just a prefix. Also confirm the SageMaker execution role has s3:GetObject permission on that bucket.
Endpoint stuck in Updating state – SageMaker endpoint updates can take 10+ minutes. If it’s been over 20 minutes, check CloudWatch logs for the endpoint. There might be a container startup failure that SageMaker is retrying.
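Assuming the endpoint name from this tutorial, the container logs can be tailed with the AWS CLI (v2); SageMaker writes endpoint container output to a CloudWatch log group named after the endpoint:

```shell
# One log stream per instance under the endpoint's log group
aws logs tail /aws/sagemaker/Endpoints/iris-rf-endpoint --follow
```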