Autoscaling
Configure horizontal autoscaling for your services
Autoscaling automatically adjusts the number of service replicas based on demand, optimizing both performance and cost.
Basic Autoscaling
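A minimal setup only needs to enable the feature; every other field falls back to the defaults listed in the reference below. A sketch, assuming the block lives under a top-level `autoscaling:` key in your service config:

```yaml
autoscaling:
  enabled: true   # inherits defaults: 1-10 replicas, 70% CPU target
```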
Configuration Reference
Complete Autoscaling Block
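A sketch showing every supported field, assuming a top-level `autoscaling:` key; the values are illustrative, not defaults:

```yaml
autoscaling:
  enabled: true
  min_replicas: 2
  max_replicas: 20
  target_cpu_percent: 70
  target_memory_percent: 80        # optional; omit to scale on CPU alone
  scale_up_cooldown_seconds: 45
  scale_down_cooldown_seconds: 300
```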
Field Reference
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | false | Enable/disable autoscaling |
| min_replicas | number | 1 | Minimum replicas (0 for scale-to-zero) |
| max_replicas | number | 10 | Maximum replicas (hard ceiling) |
| target_cpu_percent | number | 70 | Target CPU utilization (%) |
| target_memory_percent | number | none | Optional memory threshold |
| scale_up_cooldown_seconds | number | 45 | Wait time before next scale-up |
| scale_down_cooldown_seconds | number | 300 | Wait time before scale-down |
Scaling Strategies
Strategy 1: Conservative (High Availability)
Keep ample headroom for traffic spikes:
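One way to express this strategy, with illustrative values (high floor, low CPU target for headroom, slow scale-down):

```yaml
autoscaling:
  enabled: true
  min_replicas: 5                    # high floor absorbs sudden spikes
  max_replicas: 50
  target_cpu_percent: 50             # scale early, keep ~50% headroom
  scale_down_cooldown_seconds: 600   # shed capacity slowly
```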
Use Case: Production APIs, critical services
Cost: Higher baseline ($$$)
Strategy 2: Balanced (Standard)
Balance cost and performance:
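An illustrative balanced configuration, keeping the default CPU target:

```yaml
autoscaling:
  enabled: true
  min_replicas: 2
  max_replicas: 20
  target_cpu_percent: 70   # the default target
```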
Use Case: Most web applications
Cost: Moderate ($$)
Strategy 3: Aggressive (Cost-Optimized)
Minimize replicas, scale reactively:
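An illustrative cost-optimized configuration (high CPU target, fast scale-down):

```yaml
autoscaling:
  enabled: true
  min_replicas: 1
  max_replicas: 10
  target_cpu_percent: 85             # run hot; scale only when necessary
  scale_down_cooldown_seconds: 180   # drop extra replicas quickly
```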
Use Case: Dev/staging, batch jobs
Cost: Low ($)
Strategy 4: Scale-to-Zero (Event-Driven)
No baseline cost when idle:
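An illustrative scale-to-zero configuration; per the field reference above, `min_replicas: 0` enables this mode:

```yaml
autoscaling:
  enabled: true
  min_replicas: 0   # 0 enables scale-to-zero
  max_replicas: 5
```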
Use Case: Batch processing, webhooks, scheduled jobs
Cost: Minimal (pay only when running)
Note: Cold starts add 5-15 seconds of latency.
CPU-Based Scaling
How It Works
- Monitor average CPU across all replicas
- If CPU > target_cpu_percent: scale up
- If CPU < target_cpu_percent: scale down
Scaling Formula
Example:
- Current: 5 replicas at 90% CPU
- Target: 70% CPU
- Desired: 5 × (90 / 70) = 6.4 → 7 replicas
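The arithmetic above follows the proportional formula used by most horizontal autoscalers: desired = ceil(current × current_CPU / target_CPU), clamped to the configured replica bounds. A Python sketch; the function name and the clamping step are illustrative, not this platform's exact implementation:

```python
import math

def desired_replicas(current: int, cpu_percent: float, target_percent: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling: move the replica count toward the CPU target,
    then clamp to the configured min/max bounds."""
    desired = math.ceil(current * (cpu_percent / target_percent))
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(5, 90, 70))   # 5 × (90/70) = 6.4 → 7
```

Note that rounding up means the autoscaler prefers slight over-provisioning to running hot.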
CPU Target Recommendations
| Target | Behavior | Use Case |
|--------|----------|----------|
| 50% | Aggressive headroom | Critical services |
| 70% | Balanced (default) | Standard applications |
| 85% | Minimal headroom | Cost-sensitive |
Warning: Targets >85% risk saturation during scale-up lag.
Memory-Based Scaling
Optional secondary metric:
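A sketch combining both metrics, with illustrative values:

```yaml
autoscaling:
  enabled: true
  target_cpu_percent: 70
  target_memory_percent: 80   # scale up if either metric exceeds its target
```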
Behavior: Scale if either CPU or memory exceeds target.
When to Use Memory Scaling
✅ Use memory scaling when:
- Application is memory-intensive (caching, data processing)
- Memory leaks are possible
- OOM kills would be catastrophic
❌ Skip memory scaling when:
- Application memory is stable
- CPU is the bottleneck
- Simplicity is preferred
Cooldown Periods
Scale-Up Cooldown
Wait time before another scale-up:
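Configured via `scale_up_cooldown_seconds`; fragment shown in isolation for brevity:

```yaml
autoscaling:
  scale_up_cooldown_seconds: 45   # default; wait 45s between scale-up events
```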
Purpose: Prevent thrashing during metric spikes.
Recommendations:
- Fast APIs: 30-45s
- Slow startup: 60-120s
- Cold start: 180s+
Scale-Down Cooldown
Wait time before scale-down:
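Configured via `scale_down_cooldown_seconds`; fragment shown in isolation for brevity:

```yaml
autoscaling:
  scale_down_cooldown_seconds: 300   # default; wait 5 minutes before shrinking
```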
Purpose: Prevent premature scale-down during temporary dips.
Recommendations:
- Stable traffic: 180-300s
- Variable traffic: 600s
- Expensive startup: 900s+
Custom Metrics
Scale based on application-specific metrics:
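A sketch using the fields documented below; the `custom_metrics:` list key and the example query are assumptions:

```yaml
autoscaling:
  enabled: true
  custom_metrics:
    - query: sum(rate(http_requests_total[1m]))   # Prometheus query
      target: 100                                 # ideal requests/sec
      scale_up_threshold: 150                     # optional
      scale_down_threshold: 50
```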
Custom Metric Fields
| Field | Type | Description |
|-------|------|-------------|
| target | number | Ideal metric value |
| scale_down_threshold | number | Scale down below this |
| scale_up_threshold | number | Scale up above this (optional) |
| query | string | Prometheus query |
Custom Metric Flow
Real-World Examples
Example 1: Production API
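A configuration consistent with the numbers below (5-replica baseline, 30-replica peak); the exact targets are assumptions:

```yaml
autoscaling:
  enabled: true
  min_replicas: 5
  max_replicas: 30
  target_cpu_percent: 60
  scale_down_cooldown_seconds: 600
```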
Result:
- Baseline: 5 replicas ($50/month)
- Peak: 30 replicas during business hours
- Cost: ~$180/month (vs. $500/month for 50 fixed replicas)
Example 2: Batch Worker
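A scale-to-zero sketch matching the numbers below; the `custom_metrics:` key and the `queue_depth` metric name are hypothetical:

```yaml
autoscaling:
  enabled: true
  min_replicas: 0            # no replicas while the queue is empty
  max_replicas: 20
  custom_metrics:
    - query: queue_depth     # hypothetical queue-length metric
      target: 100            # one replica per 100 queued jobs
```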
Result:
- Idle: 0 replicas ($0/month)
- Active: Scales to match queue (1 replica per 100 jobs)
- Cost: Pay only for processing time
Example 3: Database Proxy
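A sketch matching the numbers below (2-8 proxies at ~200 connections each); the metric name is hypothetical:

```yaml
autoscaling:
  enabled: true
  min_replicas: 2
  max_replicas: 8
  custom_metrics:
    - query: sum(db_client_connections)   # hypothetical connection-count metric
      target: 200                         # ~200 connections per proxy
```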
Result:
- Baseline: 2 proxies (400 connections)
- Peak: 8 proxies (1,600 connections)
- Cost: ~$30/month
Example 4: ML Inference
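A sketch matching the numbers below (one warm replica, 10 at peak); the values are assumptions:

```yaml
autoscaling:
  enabled: true
  min_replicas: 1                  # keep one warm replica to avoid cold starts
  max_replicas: 10
  scale_up_cooldown_seconds: 120   # GPU containers start slowly
```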
Result:
- Warm: 1 GPU ready ($2/hour)
- Peak: 10 GPUs during batch inference
- Cost: ~$1,500/month (vs $14,000/month for 20 fixed GPUs)
Troubleshooting
Issue: Service Not Scaling Up
Symptom: CPU at 100%, but replicas not increasing
Causes:
- Hit max_replicas
- Scale-up cooldown active
- Provider capacity exhausted:
  - Add more providers
  - Check region availability
Solution:
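If the ceiling is the blocker, raise `max_replicas`; fragment shown in isolation, value illustrative:

```yaml
autoscaling:
  max_replicas: 50   # size the ceiling from observed peak demand
```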
Issue: Service Scaling Too Aggressively
Symptom: Constantly scaling up/down (yo-yo effect)
Cause: Cooldowns too short
Solution:
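Lengthen both cooldowns to damp the oscillation; the values are illustrative:

```yaml
autoscaling:
  scale_up_cooldown_seconds: 90
  scale_down_cooldown_seconds: 600
```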
Issue: Slow Scale-Up
Symptom: Latency spikes before scaling kicks in
Solutions:
- Lower CPU target (target_cpu_percent)
- Increase min_replicas
- Reduce cooldown (scale_up_cooldown_seconds)
Issue: Scale-to-Zero Not Working
Symptom: Service stays at min_replicas despite no traffic
Cause: min_replicas > 0
Solution:
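Per the field reference above, scale-to-zero requires a zero floor:

```yaml
autoscaling:
  min_replicas: 0   # 0 is required for scale-to-zero
```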
Note: Requires container to start in <15s for best experience.
Best Practices
1. Start Conservative
Monitor for 1-2 weeks, then optimize.
2. Set Realistic Maximums
Don't set max_replicas arbitrarily high:
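For example (illustrative value):

```yaml
autoscaling:
  max_replicas: 20   # sized from observed peak, not an arbitrary ceiling
```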
3. Longer Scale-Down Cooldowns
Scale down slowly to avoid instability:
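For example (illustrative value):

```yaml
autoscaling:
  scale_down_cooldown_seconds: 600   # shed capacity gradually
```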
4. Monitor Autoscaler Behavior
5. Test Under Load
Use load testing to validate: