it scales based on cpu requests, not limits.
that means that with a deployment like:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: some-deployment
spec:
  # ...
  template:
    spec:
      containers:
      - name: some-deployment
        image: some-image:latest
        imagePullPolicy: Always
        resources:
          requests:
            memory: 100Mi
            cpu: 0.1
          limits:
            memory: 200Mi
            cpu: 1
  # ...
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 10
  minReplicas: 2
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: some-deployment
  targetCPUUtilizationPercentage: 80
```
the hpa will spin up a new pod when the pods are averaging 80% of 0.1 vcpus (i.e. 0.08 cores), not 80% of the 1 vcpu limit. if you expect high utilisation, you'll find yourself maxing out your maxReplicas super quickly.
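a rough sketch of the math the hpa is doing (the function name and numbers are illustrative; the formula is the documented `ceil(currentReplicas * currentUtilization / targetUtilization)`, with utilization measured against the cpu *request*):

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu_cores: float,
                     cpu_request_cores: float,
                     target_utilization_pct: int) -> int:
    # utilization is current usage as a percentage of the cpu *request*,
    # not the limit
    current_utilization = current_cpu_cores / cpu_request_cores * 100
    return math.ceil(current_replicas * current_utilization / target_utilization_pct)

# with the manifest above (requests.cpu: 0.1, target: 80%), a pod averaging
# just 0.09 cores is already at 90% utilization, even though its 1-core
# limit is barely touched, so the hpa scales 2 replicas up to 3:
print(desired_replicas(2, 0.09, 0.1, 80))  # -> 3
```

so even modest real usage per pod pushes utilization way over the target, and the replica count climbs toward maxReplicas.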