Large model files kill download speeds when you serve them from a single region. A 2GB ONNX model sitting in us-east-1 takes forever to pull from Tokyo or Mumbai. The fix: put CloudFront in front of S3 and let edge caches do the heavy lifting. Here’s the quickest way to get a working setup with boto3:
```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket_name = "my-model-artifacts-cdn"

s3.create_bucket(Bucket=bucket_name)

s3.upload_file(
    Filename="models/resnet50.onnx",
    Bucket=bucket_name,
    Key="v1/resnet50.onnx",
    ExtraArgs={"ContentType": "application/octet-stream"},
)
print(f"Uploaded to s3://{bucket_name}/v1/resnet50.onnx")
```
That gives you the storage layer. Now you need a CDN in front of it.
Set Up the S3 Bucket for Model Artifacts
Before creating the CloudFront distribution, configure the bucket properly. You want versioning enabled so you can roll back bad model uploads, and you want to block all public access since CloudFront will handle access control through Origin Access Control (OAC).
```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket_name = "my-model-artifacts-cdn"

# Enable versioning for rollback capability
s3.put_bucket_versioning(
    Bucket=bucket_name,
    VersioningConfiguration={"Status": "Enabled"},
)

# Block all public access -- CloudFront handles access control
s3.put_public_access_block(
    Bucket=bucket_name,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Upload a model with metadata tags
s3.upload_file(
    Filename="models/bert-base.onnx",
    Bucket=bucket_name,
    Key="v2/bert-base.onnx",
    ExtraArgs={
        "ContentType": "application/octet-stream",
        "Metadata": {
            "model-version": "2.0.1",
            "framework": "onnx",
            "task": "text-classification",
        },
    },
)

# Set a lifecycle rule to move old versions to Glacier after 30 days
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-model-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionTransitions": [
                {
                    "NoncurrentDays": 30,
                    "StorageClass": "GLACIER",
                }
            ],
        }
    ]
}
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket_name,
    LifecycleConfiguration=lifecycle_config,
)
print("Bucket configured with versioning, public access block, and lifecycle rules")
```
The metadata fields are useful for tracking which model version and framework each artifact belongs to. You can query them later with a HeadObject request, without downloading the file.
Create the CloudFront Distribution
This is the core piece. You create an Origin Access Control, then a CloudFront distribution that points to your S3 bucket. After the distribution is created, you attach a bucket policy that grants CloudFront read access.
```python
import boto3
import json
import time

cf = boto3.client("cloudfront", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")
bucket_name = "my-model-artifacts-cdn"
s3_origin = f"{bucket_name}.s3.us-east-1.amazonaws.com"

# Create Origin Access Control for S3
oac_response = cf.create_origin_access_control(
    OriginAccessControlConfig={
        "Name": "model-cdn-oac",
        "Description": "OAC for model artifact CDN",
        "SigningProtocol": "sigv4",
        "SigningBehavior": "always",
        "OriginAccessControlOriginType": "s3",
    }
)
oac_id = oac_response["OriginAccessControl"]["Id"]
print(f"Created OAC: {oac_id}")

# Create the CloudFront distribution
caller_ref = str(int(time.time()))
dist_config = {
    "CallerReference": caller_ref,
    "Comment": "Model artifact CDN",
    "DefaultCacheBehavior": {
        "TargetOriginId": "model-s3-origin",
        "ViewerProtocolPolicy": "redirect-to-https",
        "AllowedMethods": {
            "Quantity": 2,
            "Items": ["GET", "HEAD"],
            "CachedMethods": {"Quantity": 2, "Items": ["GET", "HEAD"]},
        },
        # CachingOptimized managed policy; mutually exclusive with the legacy
        # ForwardedValues settings, so don't set both
        "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
        "Compress": True,
        "TrustedKeyGroups": {"Enabled": False, "Quantity": 0},
    },
    "Origins": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "model-s3-origin",
                "DomainName": s3_origin,
                # Empty OAI is required when using OAC instead of a legacy OAI
                "S3OriginConfig": {"OriginAccessIdentity": ""},
                "OriginAccessControlId": oac_id,
            }
        ],
    },
    "Enabled": True,
    "PriceClass": "PriceClass_100",  # US, Canada, Europe -- cheapest tier
    "HttpVersion": "http2and3",
}
dist_response = cf.create_distribution(DistributionConfig=dist_config)
dist_id = dist_response["Distribution"]["Id"]
dist_domain = dist_response["Distribution"]["DomainName"]
dist_arn = dist_response["Distribution"]["ARN"]
print(f"Distribution ID: {dist_id}")
print(f"Domain: {dist_domain}")

# Attach a bucket policy granting this distribution read access
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontServicePrincipal",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            "Condition": {
                "StringEquals": {"AWS:SourceArn": dist_arn}
            },
        }
    ],
}
s3.put_bucket_policy(
    Bucket=bucket_name,
    Policy=json.dumps(bucket_policy),
)
print("Bucket policy attached -- CloudFront can now read from S3")
```
A few things worth noting. PriceClass_100 keeps costs low by only using edge locations in North America and Europe. If your team pulls models from Asia-Pacific, switch to PriceClass_200 or PriceClass_All. The CachingOptimized managed cache policy (that long ID string) sets a default TTL of 24 hours, which works well for model artifacts that change infrequently.
Generate Signed URLs for Secure Model Downloads
You don’t want anyone with the CloudFront domain to download your proprietary models. Signed URLs restrict access to authorized consumers. You need a CloudFront key pair – generate an RSA key, upload the public key to CloudFront, create a key group, then use the private key to sign URLs.
```python
import boto3
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

cf = boto3.client("cloudfront", region_name="us-east-1")

# Step 1: Generate an RSA key pair (do this once, store the private key securely)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Save the private key to a file (store in Secrets Manager in production)
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption(),
)
with open("cf_private_key.pem", "wb") as f:
    f.write(private_pem)

# Get the public key in PEM format
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Step 2: Upload the public key to CloudFront
pub_key_response = cf.create_public_key(
    PublicKeyConfig={
        "CallerReference": "model-cdn-key-1",
        "Name": "model-cdn-signing-key",
        "EncodedKey": public_pem.decode("utf-8"),
    }
)
public_key_id = pub_key_response["PublicKey"]["Id"]
print(f"Public key ID: {public_key_id}")

# Step 3: Create a key group containing the public key
key_group_response = cf.create_key_group(
    KeyGroupConfig={
        "Name": "model-cdn-key-group",
        "Items": [public_key_id],
    }
)
key_group_id = key_group_response["KeyGroup"]["Id"]
print(f"Key group ID: {key_group_id}")
print("Update your distribution's TrustedKeyGroups with this ID")
```
Once the key group is attached to your distribution’s cache behavior, generate signed URLs like this:
```python
import datetime

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Load the private key generated earlier
with open("cf_private_key.pem", "rb") as f:
    private_key = serialization.load_pem_private_key(f.read(), password=None)

# CloudFront requires SHA-1 RSA signatures for signed URLs
def rsa_signer(message):
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

# Create the CloudFront signer with the public key ID from the previous step
public_key_id = "K1ABCDEFGHIJKL"  # replace with your CloudFront public key ID
cf_signer = CloudFrontSigner(public_key_id, rsa_signer)

# Generate a signed URL valid for 1 hour
dist_domain = "d1234abcdef.cloudfront.net"  # replace with your distribution domain
model_key = "v2/bert-base.onnx"
expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)
signed_url = cf_signer.generate_presigned_url(
    url=f"https://{dist_domain}/{model_key}",
    date_less_than=expires,
)
print(f"Signed URL:\n{signed_url}")
```
Your model loading code on inference servers can fetch from this signed URL. The URL expires after an hour, so rotate it in your deployment scripts.
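For the consumer side, here's a minimal stdlib-only sketch of what that fetch might look like (the function name and chunk size are arbitrary choices):

```python
import urllib.request

def download_model(signed_url: str, dest_path: str, chunk_size: int = 8 * 1024 * 1024) -> int:
    """Stream a model from a signed CloudFront URL to disk. Returns bytes written."""
    total = 0
    with urllib.request.urlopen(signed_url) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            total += len(chunk)
    return total

# download_model(signed_url, "/tmp/bert-base.onnx")
```

Streaming in chunks keeps memory flat no matter how large the model is, which matters when inference servers pull multi-gigabyte artifacts.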
Automate Upload and Cache Invalidation
When you push a new model version to S3, CloudFront still serves the cached old version until the TTL expires. You need to invalidate the cache. Here’s a script that uploads a new model artifact and immediately invalidates the CloudFront cache for that path:
```python
import os
import time

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", region_name="us-east-1")
cf = boto3.client("cloudfront", region_name="us-east-1")
bucket_name = "my-model-artifacts-cdn"
distribution_id = "E1A2B3C4D5E6F7"  # replace with your distribution ID

def deploy_model(local_path: str, s3_key: str) -> str:
    """Upload a model to S3 and invalidate the CloudFront cache for its path."""
    file_size_mb = os.path.getsize(local_path) / (1024 * 1024)
    print(f"Uploading {local_path} ({file_size_mb:.1f} MB) to s3://{bucket_name}/{s3_key}")

    # Multipart upload for large files (boto3 handles the chunking automatically)
    transfer_config = TransferConfig(
        multipart_threshold=50 * 1024 * 1024,  # 50 MB
        multipart_chunksize=50 * 1024 * 1024,
        max_concurrency=10,
    )
    s3.upload_file(
        Filename=local_path,
        Bucket=bucket_name,
        Key=s3_key,
        Config=transfer_config,
        ExtraArgs={"ContentType": "application/octet-stream"},
    )
    print(f"Upload complete: s3://{bucket_name}/{s3_key}")

    # Invalidate the CloudFront cache for this specific path
    invalidation = cf.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": [f"/{s3_key}"]},
            "CallerReference": str(int(time.time())),
        },
    )
    inv_id = invalidation["Invalidation"]["Id"]
    print(f"Cache invalidation started: {inv_id}")

    # Block until the invalidation finishes propagating
    waiter = cf.get_waiter("invalidation_completed")
    waiter.wait(
        DistributionId=distribution_id,
        Id=inv_id,
        WaiterConfig={"Delay": 10, "MaxAttempts": 30},
    )
    print("Cache invalidation complete")
    return inv_id

# Deploy a new model version
deploy_model("models/resnet50-v3.onnx", "v3/resnet50.onnx")
```
The TransferConfig matters for large model files. boto3 already switches to multipart uploads above its default 8 MB threshold, but those defaults are conservative for multi-gigabyte artifacts. The 50 MB chunk size with 10 concurrent threads gives you decent throughput on most connections.
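For visibility into long uploads, `upload_file` also accepts a `Callback` that gets invoked with the byte count of each transferred chunk. A sketch of a thread-safe progress tracker you could pass as `Callback=` (the class name is my own):

```python
import threading

class UploadProgress:
    """Callback for boto3 upload_file: tracks bytes sent across transfer threads."""

    def __init__(self, total_bytes: int):
        self.total_bytes = total_bytes
        self.sent = 0
        # boto3 invokes the callback from multiple worker threads
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:
            self.sent += bytes_amount

    @property
    def percent(self) -> float:
        return 100.0 * self.sent / self.total_bytes if self.total_bytes else 100.0

# Wire it in:
# progress = UploadProgress(os.path.getsize(local_path))
# s3.upload_file(..., Config=transfer_config, Callback=progress)
```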
For batch deployments, invalidate multiple paths at once to avoid per-invalidation charges (the first 1,000 paths per month are free):
```python
import time

import boto3

def batch_invalidate(distribution_id: str, paths: list[str]) -> str:
    """Invalidate multiple CloudFront paths in a single request."""
    cf = boto3.client("cloudfront", region_name="us-east-1")
    invalidation = cf.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {
                "Quantity": len(paths),
                # Invalidation paths must start with a slash
                "Items": [f"/{p}" if not p.startswith("/") else p for p in paths],
            },
            "CallerReference": str(int(time.time())),
        },
    )
    inv_id = invalidation["Invalidation"]["Id"]
    print(f"Batch invalidation for {len(paths)} paths: {inv_id}")
    return inv_id

# Invalidate all version 2 artifacts at once
batch_invalidate("E1A2B3C4D5E6F7", [
    "v2/bert-base.onnx",
    "v2/resnet50.onnx",
    "v2/yolov8.onnx",
])
```
Common Errors and Fixes
AccessDenied when fetching from CloudFront
This almost always means the bucket policy is wrong or missing. Double-check that the policy’s AWS:SourceArn matches your distribution’s ARN exactly. Also verify the OAC is attached to the origin in your distribution config. Run this to debug:
```shell
aws cloudfront get-distribution --id E1A2B3C4D5E6F7 \
  --query "Distribution.DistributionConfig.Origins.Items[0].OriginAccessControlId"
```
If it returns an empty string, the OAC isn’t attached.
NoSuchKey errors after uploading
S3 key names are case-sensitive. If you uploaded v2/Bert-Base.onnx but your CloudFront URL requests v2/bert-base.onnx, you’ll get this error. Standardize your key naming – always use lowercase with hyphens.
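A small helper you could run keys through before upload enforces that convention (the exact normalization rules here are just a suggestion):

```python
import re

def normalize_model_key(raw: str) -> str:
    """Lowercase an S3 key and collapse spaces/underscores to hyphens."""
    key = raw.strip().lstrip("/").lower()
    # Replace runs of spaces or underscores with a single hyphen
    key = re.sub(r"[ _]+", "-", key)
    return key

# normalize_model_key("v2/Bert_Base.onnx") -> "v2/bert-base.onnx"
```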
Stale model served after upload
You uploaded a new model but CloudFront keeps serving the old one. Either you forgot to invalidate, or the invalidation hasn’t propagated yet. Invalidations typically take 1-5 minutes. If you need instant freshness, version your S3 keys (e.g., v3/resnet50.onnx instead of overwriting latest/resnet50.onnx). Versioned keys never serve stale content because the URL changes.
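A tiny sketch of that versioned-key convention, deriving the prefix from the model's major version (the naming scheme itself is illustrative):

```python
def versioned_key(model_name: str, version: str, ext: str = "onnx") -> str:
    """Build an immutable S3 key like v3/resnet50.onnx from a semantic version."""
    major = version.split(".")[0]
    return f"v{major}/{model_name}.{ext}"

# versioned_key("resnet50", "3.0.1") -> "v3/resnet50.onnx"
```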
TooManyInvalidations throttle error
CloudFront limits you to 3,000 in-progress invalidation paths. If you’re deploying many models in a loop, batch your paths into a single create_invalidation call. You can include up to 3,000 paths per request.
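If you somehow accumulate more than 3,000 paths, a helper like this sketch can split them into request-sized batches to feed the batch_invalidate function above:

```python
def chunk_paths(paths: list[str], batch_size: int = 3000) -> list[list[str]]:
    """Split invalidation paths into batches that fit CloudFront's per-request limit."""
    return [paths[i : i + batch_size] for i in range(0, len(paths), batch_size)]

# for batch in chunk_paths(all_paths):
#     batch_invalidate("E1A2B3C4D5E6F7", batch)
```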
Signed URL returns 403 Forbidden
Check three things: (1) the key group is attached to the distribution’s cache behavior with TrustedKeyGroups, (2) the public key ID in your signer matches the one in the key group, and (3) the URL hasn’t expired. Print date_less_than and compare it to the current UTC time.
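To make check (3) concrete, a small helper comparing the expiry you signed with against the current UTC time (the helper name is mine):

```python
import datetime

def seconds_until_expiry(date_less_than: datetime.datetime) -> float:
    """Seconds remaining before a signed URL expires; negative means already expired."""
    now = datetime.datetime.now(datetime.timezone.utc)
    # Treat naive datetimes as UTC, matching how the signer interprets them
    if date_less_than.tzinfo is None:
        date_less_than = date_less_than.replace(tzinfo=datetime.timezone.utc)
    return (date_less_than - now).total_seconds()

# seconds_until_expiry(expires) < 0 means the 403 is an expired URL
```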
Multipart upload fails midway
Large model uploads over spotty connections will fail. boto3’s TransferConfig retries individual parts, but if the whole upload fails, you’ll have orphaned parts in S3 eating storage. Add a lifecycle rule to clean them up:
```python
# Note: put_bucket_lifecycle_configuration replaces the bucket's entire
# lifecycle configuration, so include any existing rules (like the Glacier
# archival rule from earlier) alongside this one.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-model-artifacts-cdn",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            }
        ]
    },
)
```
This auto-cleans abandoned multipart uploads after 24 hours.