@hamdikh
Last active January 21, 2021 11:24

Kubernetes cheatsheet

Getting Started

  • Fault tolerance
  • Rollback
  • Auto-healing
  • Auto-scaling
  • Load-balancing
  • Isolation (sandbox)

Sample yaml

apiVersion: <>
kind: <>
metadata:
  name: <>
  labels:
    ...
  annotations:
    ...
spec:
  containers:
    ...
  initContainers:
    ...
  priorityClassName: <>

Workflow

  • (kube-scheduler, controller-manager, etcd) --443--> API Server

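  • API Server --10250--> kubelet

    • by default the API server does not verify the kubelet's serving certificate
    • so the connection is open to MITM attacks
    • Solution:
      • set --kubelet-certificate-authority on the API server
      • or use SSH tunneling
  • API server --> (nodes, pods, services)

    • Plain HTTP by default (unsafe)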

Physical components

Master

  • API Server (443)
  • kube-scheduler
  • controller-manager
    • cloud-controller-manager
    • kube-controller-manager
  • etcd

Other components talk only to the API server; they never communicate with each other directly

Node

  • Kubelet

  • Container Engine

    • CRI
      • The protocol the kubelet uses to talk to the container engine
  • Kube-proxy

Everything is an object - persistent entities

  • maintained in etcd, identified using

    • names: client-given
    • UIDs: system-generated
  • Both need to be unique

  • three management methods

    • Imperative commands (kubectl)
    • Imperative object configuration (kubectl + yaml)
      • repeatable
      • observable
      • auditable
    • Declarative object configuration (yaml + config files)
      • Live object configuration
      • Current object configuration file
      • Last-applied object configuration file

      Node Capacity
---------------------------
|     kube-reserved       |
|-------------------------|
|     system-reserved     |
|-------------------------|
|    eviction-threshold   |
|-------------------------|
|                         |
|      allocatable        |
|   (available for pods)  |
|                         |
|                         |
---------------------------

Namespaces

  • Three pre-defined

    • default
    • kube-system
    • kube-public: auto-readable by all users
  • Objects without namespaces

    • Nodes
    • PersistentVolumes
    • Namespaces

Labels

  • key / value
  • loose coupling via selectors
  • need not be unique

ClusterIP

  • Independent of lifespan of any backend pod
  • Service object has a static port assigned to it

Controller manager

  • ReplicaSet, deployment, daemonset, statefulSet
  • Actual state <-> desired state
  • reconciliation loop

Kube-scheduler

  • nodeSelector
  • Affinity & Anti-Affinity
    • Node
      • Steer pod to node
    • Pod
      • Steer pod towards or away from pods
  • Taints & tolerations (anti-affinity between node and pod!)
    • Based on predefined configuration (env=dev:NoSchedule)
      ...
      tolerations:
      - key: "env"
        operator: "Equal"
        value: "dev"
        effect: "NoSchedule"
      ...
    • Based on node condition (alpha in v1.8)
      • taints added by node controller
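
The node and pod affinity rules above could be sketched as follows. This is a minimal example, not a recommended setup: the node label disktype=ssd, the pod label app=web, and every name are assumptions made for illustration.

```yaml
# Hypothetical pod: label keys, values, and names are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo
  labels:
    app: web
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule onto nodes labelled disktype=ssd.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
    podAntiAffinity:
      # Soft preference: steer away from nodes already running an app=web pod.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx
```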

Pod

kubectl run name --image=<image>

What's available inside the container?

  • File system
    • Image
    • Associated Volumes
      • ordinary
      • persistent
    • Container
      • Hostname
    • Pod
      • Pod name
      • User-defined envs
    • Services
      • List of all services

Access with:

  • Symlink (important):

    • /etc/podinfo/labels
    • /etc/podinfo/annotations
  • Or:

volumes:
  - name: podinfo
    downwardAPI:
      items:
        - path: "labels"
          fieldRef:
            fieldPath: metadata.labels
        - path: "annotations"
          fieldRef:
            fieldPath: metadata.annotations
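
For context, the volume fragment above only produces /etc/podinfo/labels and /etc/podinfo/annotations once a container mounts it. A minimal sketch of a complete pod, assuming a busybox image and made-up label and annotation values:

```yaml
# Hypothetical pod: names, label, and annotation values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: podinfo-demo
  labels:
    tier: demo
  annotations:
    build: "example"
spec:
  containers:
  - name: main
    image: busybox
    # Print the projected labels file, then stay alive for inspection.
    command: ["sh", "-c", "cat /etc/podinfo/labels; sleep 3600"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
      - path: "annotations"
        fieldRef:
          fieldPath: metadata.annotations
```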

Status

  • Pending
  • Running
  • Succeeded
  • Failed
  • Unknown

Probe

  • Liveness
    • Failed? Restart policy applied
  • Readiness
    • Failed? Removed from service
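
A minimal sketch of both probes on one container. The image, port, path, and timings are assumptions chosen for illustration, not tuned values.

```yaml
# Hypothetical pod: probe endpoints and timings are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  restartPolicy: Always
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    # Liveness failure: the container is killed and restartPolicy applies.
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    # Readiness failure: the pod is removed from Service endpoints, not restarted.
    readinessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5
```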

Pod priorities

  • available since 1.8
  • PriorityClass object
  • Affect scheduling order
    • High priority pods could jump the queue
  • Preemption
    • Low priority pods could be pre-empted to make way for higher one (if no node is available for high priority)
    • These preempted pods would have a graceful termination period
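
A sketch of a PriorityClass and a pod that references it. It assumes a cluster recent enough to serve scheduling.k8s.io/v1 (the alpha releases mentioned above used an earlier API version); the class name and value are made up.

```yaml
# Hypothetical PriorityClass: name and value are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000          # higher value = scheduled earlier, may preempt lower ones
globalDefault: false
description: "For pods that may preempt lower-priority pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: nginx
```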

Multi-Container Pods

  • Share access to memory space
  • Connect to each other using localhost
  • Share access to the same volume
  • entire pod is hosted on the same node
  • all or nothing
  • no auto healing or scaling

Init containers

  • run before app containers
  • always run to completion
  • run serially
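
The ordering above can be sketched as a pod with two init containers that must both exit successfully, one after the other, before the app container starts. The service name mydb, the images, and the paths are assumptions for illustration.

```yaml
# Hypothetical pod: "mydb" and all other names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  # Runs first and must exit 0 before the next init container starts.
  - name: wait-for-db
    image: busybox
    command: ["sh", "-c", "until nslookup mydb; do sleep 2; done"]
  # Runs second; leaves a file for the app container in a shared volume.
  - name: fetch-config
    image: busybox
    command: ["sh", "-c", "echo ready > /work/status"]
    volumeMounts:
    - name: workdir
      mountPath: /work
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "cat /work/status; sleep 3600"]
    volumeMounts:
    - name: workdir
      mountPath: /work
  volumes:
  - name: workdir
    emptyDir: {}
```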

Lifecycle hooks

  • PostStart
  • PreStop (blocking)

Handlers:

  • Exec
  • HTTP
...
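spec:
  containers:
  - name: <>
    lifecycle:
      postStart:
        exec:
          command: <>
      preStop:
        httpGet:
          path: <>
          port: <>
...

Hooks may be invoked more than once (delivery is at-least-once), so handlers should be idempotent.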

Quality of Service (QoS)

When Kubernetes creates a Pod, it assigns one of these QoS classes to it:

  • Guaranteed (every container has CPU and memory limits, and requests == limits)
    • If a container specifies a memory limit but no memory request, Kubernetes automatically assigns a memory request that matches the limit. The same applies to CPU.
  • Burstable (at least one container has a request or limit, but the pod does not qualify as Guaranteed)
  • BestEffort (no container has any requests or limits)
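
A minimal sketch of a pod that would be classed as Guaranteed. The names and quantities are assumptions for illustration.

```yaml
# Hypothetical pod: requests equal limits for both CPU and memory -> Guaranteed.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "128Mi"
# Raising a limit above its request would make the pod Burstable;
# removing the resources block entirely would make it BestEffort.
```

The assigned class can be read back from the pod's status.qosClass field.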

PodPreset

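A PodPreset object injects information such as secrets, volume mounts, and environment variables into pods at creation time. The example below injects an environment variable and a volume into every pod labelled role: frontend. Note that PodPreset was an alpha API (settings.k8s.io/v1alpha1) and was removed in Kubernetes v1.20.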

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: allow-database
spec:
  selector:
    matchLabels:
      role: frontend
  env:
    - name: DB_PORT
      value: "6379"
  volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
    - name: cache-volume
      emptyDir: {}

ReplicaSet

Features:

  • Scaling and healing
  • Pod template
  • number of replicas

Components:

  • Pod template

  • Pod selector (could use matchExpressions)

  • Label of replicaSet

  • Number of replica

  • Could delete replicaSet without its pods using --cascade=false

  • Isolating pods from replicaSet by changing its labels
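
The components above map onto a manifest roughly like this sketch. The names and the tier=frontend label are assumptions for illustration.

```yaml
# Hypothetical ReplicaSet: names and labels are illustrative.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook          # label of the ReplicaSet itself
spec:
  replicas: 3               # number of replicas
  selector:                 # pod selector, here using matchExpressions
    matchExpressions:
    - key: tier
      operator: In
      values: ["frontend"]
  template:                 # pod template; its labels must match the selector
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: web
        image: nginx
```

Relabelling one of its pods (for example changing tier to another value) takes that pod out of the selector, and the ReplicaSet creates a replacement.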

Deployments

  • versioning and rollback

  • Contains spec of replicaSet within it

  • advanced deployment

    • blue-green
    • canary

  • Update containers --> new replicaSet & new pods created --> old RS still exists --> reduced to zero

  • Every change is tracked

  • Append --record in kubectl to keep history

  • Update strategy

    • Recreate
      • Old pods would be killed before new pods come up
    • RollingUpdate
      • progressDeadlineSeconds
      • minReadySeconds
      • rollbackTo
      • revisionHistoryLimit
      • paused
        • spec.paused
  • kubectl rollout undo deployment/<> --to-revision=<>

  • kubectl rollout status deployment/<>

  • kubectl set image deployment/<> <>=<>:<>

  • kubectl rollout (pause|resume) deployment/<>
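
A sketch of a Deployment that sets several of the fields listed above. The name, image tag, and numbers are assumptions for illustration, not recommended values.

```yaml
# Hypothetical Deployment: names, image tag, and numbers are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  revisionHistoryLimit: 5        # old ReplicaSets kept for rollback
  minReadySeconds: 10            # a new pod must stay ready this long to count
  progressDeadlineSeconds: 600   # rollout is reported as failed after this
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # at most one extra pod during the update
      maxUnavailable: 0          # never drop below the desired replica count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
```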

ReplicationController

  • RC did what RS + Deployment do now
  • Obsolete

DaemonSet

  • Ensure all nodes run a copy of pod
  • Cluster storage, log collection, node monitor ...
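
A minimal DaemonSet sketch for the log-collection case. The busybox command is only a stand-in for a real collector, and all names are assumptions.

```yaml
# Hypothetical DaemonSet: the container is a placeholder for a real log agent.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: busybox
        command: ["sh", "-c", "ls /var/log; sleep 3600"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      # hostPath exposes the node's own log directory to the pod.
      - name: varlog
        hostPath:
          path: /var/log
```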

StatefulSet

  • Maintains a sticky identity
  • Not interchangeable
  • Identity is maintained across any rescheduling

Limitation

  • volumes must be pre-provisioned
  • Deleting / Scaling will not delete associated volumes

Flow

  • Deployed in order 0 --> (n-1)
  • Deleted in reverse order (n-1) --> 0 (all successors of a pod must be completely shut down before it is terminated)
  • All predecessors must be Running and Ready before a scaling operation is applied to a pod

Job (batch/v1)

  • Non-parallel jobs
  • Parallel jobs
    • Fixed completion count
      • job completes when number of completions reaches target
    • With work queue
      • requires coordination
  • Use spec.activeDeadlineSeconds to prevent infinite loop
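
A sketch of a parallel Job with a fixed completion count and a deadline. The name, command, and numbers are assumptions for illustration.

```yaml
# Hypothetical Job: command and numbers are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-demo
spec:
  completions: 5               # job completes after 5 successful pods
  parallelism: 2               # at most 2 pods run at the same time
  activeDeadlineSeconds: 120   # whole job is terminated after 2 minutes
  backoffLimit: 4              # retries before the job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing; sleep 5"]
```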

Cronjob

  • Job should be idempotent
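
Idempotency matters because a scheduled run can occasionally be started twice or skipped. A minimal sketch, assuming a cluster that serves CronJob under batch/v1 (older clusters used a beta API version); the name, schedule, and command are made up.

```yaml
# Hypothetical CronJob: schedule and command are illustrative.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # every day at 02:00
  concurrencyPolicy: Forbid    # skip a run if the previous one is still active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox
            command: ["sh", "-c", "date; echo generating report"]
```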

Horizontal pod autoscaler

  • Targets: replicationControllers, deployments, replicaSets
  • CPU or custom metrics
  • Won't work with non-scaling objects: daemonSets
  • Prevent thrashing (upscale/downscale-delay)
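
A CPU-based autoscaler could look like this sketch. It assumes a Deployment named web exists and that its containers declare CPU requests; the replica bounds and target are illustrative.

```yaml
# Hypothetical HPA: target Deployment "web" and thresholds are illustrative.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  # Average CPU utilisation, as a percentage of each pod's CPU request.
  targetCPUUtilizationPercentage: 70
```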

Services

  • Logical set of backend pods + frontend

  • Frontend: static IP + port + dns name

  • Backend: set of backend pods (via selector)

  • Static IP and networking.

  • kube-proxy routes traffic sent to the VIP to a backend pod.

  • Endpoints are created automatically based on the selector.

  • ClusterIP

  • NodePort

    • external --> NodeIP + NodePort --> kube-proxy --> ClusterIP
  • LoadBalancer

    • Need to have cloud-controller-manager
      • Node controller
      • Route controller
      • Service controller
      • Volume controller
    • external --> LB --> NodeIP + NodePort --> kube-proxy --> ClusterIP
  • ExternalName

    • Resolved only through cluster DNS (kube-dns), as a CNAME record
    • No selector
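
A NodePort Service sketch that follows the external --> NodeIP + NodePort --> ClusterIP path described above. The selector, ports, and name are assumptions for illustration.

```yaml
# Hypothetical Service: selector and port numbers are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web              # backend: pods carrying this label
  ports:
  - name: http
    protocol: TCP
    port: 80              # port on the ClusterIP (frontend)
    targetPort: 8080      # port the backend containers listen on
    nodePort: 30080       # port opened on every node
```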

Service discovery

  • SRV record for named port
    • _port-name._port-protocol.service-name.namespace.svc.cluster.local
  • Pod domain
    • pod-ip-address.namespace.pod.cluster.local (IP written with dashes, e.g. 10-1-2-3)
    • hostname is metadata.name

spec.dnsPolicy

  • Default
    • inherits the node's name resolution configuration
  • ClusterFirst
    • Any DNS query that does not match the configured cluster domain suffix, such as “www.kubernetes.io”, is forwarded to the upstream nameserver inherited from the node
  • ClusterFirstWithHostNet
    • set this explicitly for pods running with hostNetwork: true
  • None (since k8s 1.9)
    • ignores the cluster DNS settings; DNS servers are supplied through spec.dnsConfig
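
A sketch of the None policy with a custom resolver. The nameserver address is taken from a documentation-only range and the search domain is made up; both are placeholders.

```yaml
# Hypothetical pod: nameserver and search domain are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-demo
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 192.0.2.53
    searches:
    - example.internal
    options:
    - name: ndots
      value: "2"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
```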

Headless service

  • with selector? --> associate with pods in cluster
  • without selector? --> forward to externalName
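
A headless Service with a selector is declared by setting clusterIP to None, as in this sketch. The name, label, and port are assumptions for illustration.

```yaml
# Hypothetical headless Service: DNS returns the pod IPs directly, no VIP.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None
  selector:
    app: db
  ports:
  - name: mysql
    port: 3306
```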

A service can also be given externalIPs.

Volumes

Lifetime longer than any containers inside a pod.

Common types:

  • configMap

  • emptyDir

    • share space / state across containers in same pod
    • containers can mount it at different paths
    • pod removed from node --> data lost
    • container crash --> ok
  • gitRepo (deprecated)

  • secret

    • stored in RAM (tmpfs)
  • hostPath
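
The emptyDir behaviour can be sketched with two containers sharing one scratch volume. Names, paths, and commands are assumptions for illustration.

```yaml
# Hypothetical pod: one container writes, the other reads the same emptyDir.
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    # tail -F keeps retrying until the writer has created the file.
    command: ["sh", "-c", "tail -F /logs/out.log"]
    volumeMounts:
    - name: scratch
      mountPath: /logs      # same volume, different path
      readOnly: true
  volumes:
  - name: scratch
    emptyDir: {}
```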

Persistent volumes

Role-Based Access Control (RBAC)

  • Role
    • Apply on namespace resources
  • ClusterRole
    • cluster-scoped resources (nodes,...)
    • non-resources endpoint (/healthz)
    • namespace resources across all namespaces
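
A sketch contrasting the two kinds of role. The user jane, the role names, and the chosen verbs are assumptions for illustration.

```yaml
# Hypothetical RBAC objects: user and role names are illustrative.
# Role: namespaced, grants read access to pods in "default" only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
# RoleBinding: attaches the Role to a user within the same namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
---
# ClusterRole: covers cluster-scoped resources and non-resource endpoints.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: health-and-nodes-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]
- nonResourceURLs: ["/healthz"]
  verbs: ["get"]
```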

Custom Resource Definitions

CustomResourceDefinitions themselves are non-namespaced and are available to all namespaces.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: crontabs.stable.example.com
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: stable.example.com
  # version name to use for REST API: /apis/<group>/<version>
  version: v1
  # either Namespaced or Cluster
  scope: Namespaced
  names:
    # plural name to be used in the URL: /apis/<group>/<version>/<plural>
    plural: crontabs
    # singular name to be used as an alias on the CLI and for display
    singular: crontab
    # kind is normally the CamelCased singular type. Your resource manifests use this.
    kind: CronTab
    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
    - ct
    # categories is a list of grouped resources the custom resource belongs to.
    categories:
    - all
  validation:
    # openAPIV3Schema is the schema for validating custom objects.
    openAPIV3Schema:
      properties:
        spec:
          properties:
            cronSpec:
              type: string
              pattern: '^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$'
            replicas:
              type: integer
              minimum: 1
              maximum: 10
  # subresources describes the subresources for custom resources.
  subresources:
    # status enables the status subresource.
    status: {}
    # scale enables the scale subresource.
    scale:
      # specReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Spec.Replicas.
      specReplicasPath: .spec.replicas
      # statusReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Replicas.
      statusReplicasPath: .status.replicas
      # labelSelectorPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Selector.
      labelSelectorPath: .status.labelSelector
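
Once the definition above is registered, a custom object of kind CronTab could be created as in this sketch. The object name and field values are assumptions; they are chosen to satisfy the validation schema (a five-field cronSpec, replicas between 1 and 10).

```yaml
# Hypothetical custom object for the CronTab definition above.
apiVersion: stable.example.com/v1
kind: CronTab
metadata:
  name: my-new-cron-object
spec:
  cronSpec: "* * * * */5"
  replicas: 3
```

Because shortNames includes ct, the object should then be listable with kubectl get ct.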

Notes

Basic commands

# show current context
kubectl config current-context

# get specific resource
kubectl get (pod|svc|deployment|ingress) <resource-name>

# Get pod logs
kubectl logs -f <pod-name>

# Get nodes list
kubectl get no -o custom-columns=NAME:.metadata.name,AWS-INSTANCE:.spec.externalID,AGE:.metadata.creationTimestamp

# Run specific command | Drop to shell
kubectl exec -it <pod-name> -- <command>

# Describe specific resource
kubectl describe (pod|svc|deployment|ingress) <resource-name>

# Set context
kubectl config set-context $(kubectl config current-context) --namespace=<namespace-name>

# Run a test pod
kubectl run -it --rm --restart=Never --image=alpine:3.6 tuan-shell -- sh
  • from @so0k link

  • access dashboard

# bash
kubectl -n kube-system port-forward $(kubectl get pods -n kube-system -o wide | grep dashboard | awk '{print $1}') 9090

# fish
kubectl -n kube-system port-forward (kubectl get pods -n kube-system -o wide | grep dashboard | awk '{print $1}') 9090

jsonpath

From link

{
  "kind": "List",
  "items":[
    {
      "kind":"None",
      "metadata":{"name":"127.0.0.1"},
      "status":{
        "capacity":{"cpu":"4"},
        "addresses":[{"type": "LegacyHostIP", "address":"127.0.0.1"}]
      }
    },
    {
      "kind":"None",
      "metadata":{"name":"127.0.0.2"},
      "status":{
        "capacity":{"cpu":"8"},
        "addresses":[
          {"type": "LegacyHostIP", "address":"127.0.0.2"},
          {"type": "another", "address":"127.0.0.3"}
        ]
      }
    }
  ],
  "users":[
    {
      "name": "myself",
      "user": {}
    },
    {
      "name": "e2e",
      "user": {"username": "admin", "password": "secret"}
    }
  ]
}
| Function | Description | Example | Result |
|----------|-------------|---------|--------|
| text | the plain text | kind is {.kind} | kind is List |
| @ | the current object | {@} | the same as input |
| . or [] | child operator | {.kind} or {['kind']} | List |
| .. | recursive descent | {..name} | 127.0.0.1 127.0.0.2 myself e2e |
| * | wildcard. Get all objects | {.items[*].metadata.name} | [127.0.0.1 127.0.0.2] |
| [start:end:step] | subscript operator | {.users[0].name} | myself |
| [,] | union operator | {.items[*]['metadata.name', 'status.capacity']} | 127.0.0.1 127.0.0.2 map[cpu:4] map[cpu:8] |
| ?() | filter | {.users[?(@.name=="e2e")].user.password} | secret |
| range, end | iterate list | {range .items[*]}[{.metadata.name}, {.status.capacity}] {end} | [127.0.0.1, map[cpu:4]] [127.0.0.2, map[cpu:8]] |
| '' | quote interpreted string | {range .items[*]}{.metadata.name}{'\t'}{end} | 127.0.0.1 127.0.0.2 |

Below are some examples using jsonpath:

$ kubectl get pods -o json
$ kubectl get pods -o=jsonpath='{@}'
$ kubectl get pods -o=jsonpath='{.items[0]}'
$ kubectl get pods -o=jsonpath='{.items[0].metadata.name}'
$ kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.startTime}{"\n"}{end}'

Resource limit

CPU

The CPU resource is measured in cpu units. One cpu, in Kubernetes, is equivalent to:

  • 1 AWS vCPU
  • 1 GCP Core
  • 1 Azure vCore
  • 1 Hyperthread on a bare-metal Intel processor with Hyperthreading

Memory

The memory resource is measured in bytes. You can express memory as a plain integer or a fixed-point number with one of these suffixes: E, P, T, G, M, k, or their power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent approximately the same value:

128974848, 129e6, 129M, 123Mi

Chapter 13. Integrating storage solutions and Kubernetes

  • External service without selector (reachable in-cluster as external-database.default.svc.cluster.local)

kind: Service
apiVersion: v1
metadata:
  name: external-database
spec:
  type: ExternalName
  externalName: "database.company.com"

  • External service with IP only

kind: Service
apiVersion: v1
metadata:
  name: external-ip-database
spec:
  # no selector: the Endpoints object below supplies the backend address
  ports:
  - port: 3306
---
kind: Endpoints
apiVersion: v1
metadata:
  name: external-ip-database
subsets:
  - addresses:
    - ip: 192.168.0.1
    ports:
    - port: 3306

Downward API

The following information is available to containers through environment variables and downwardAPI volumes:

Information available via fieldRef:

  • spec.nodeName - the node’s name
  • status.hostIP - the node’s IP
  • metadata.name - the pod’s name
  • metadata.namespace - the pod’s namespace
  • status.podIP - the pod’s IP address
  • spec.serviceAccountName - the pod’s service account name
  • metadata.uid - the pod’s UID
  • metadata.labels['<KEY>'] - the value of the pod’s label <KEY> (for example, metadata.labels['mylabel']); available in Kubernetes 1.9+
  • metadata.annotations['<KEY>'] - the value of the pod’s annotation <KEY> (for example, metadata.annotations['myannotation']); available in Kubernetes 1.9+

Information available via resourceFieldRef:

  • A Container’s CPU limit
  • A Container’s CPU request
  • A Container’s memory limit
  • A Container’s memory request

In addition, the following information is available through downwardAPI volume fieldRef:

  • metadata.labels - all of the pod’s labels, formatted as label-key="escaped-label-value" with one label per line
  • metadata.annotations - all of the pod’s annotations, formatted as annotation-key="escaped-annotation-value" with one annotation per line
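
A sketch of the environment-variable route, combining fieldRef and resourceFieldRef. The variable names (MY_...), the image, and the resource quantities are assumptions for illustration.

```yaml
# Hypothetical pod: env var names and quantities are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: downward-env-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "env | grep MY_; sleep 3600"]
    resources:
      requests:
        cpu: "250m"
        memory: "64Mi"
      limits:
        cpu: "500m"
        memory: "128Mi"
    env:
    # Pod-level fields come from fieldRef.
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    # Container resource values come from resourceFieldRef.
    - name: MY_CPU_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.cpu
    - name: MY_MEM_REQUEST
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: requests.memory
```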

Labs

Guaranteed Scheduling For Critical Add-On Pods

See link

  • Marking pod as critical when using Rescheduler. To be considered critical, the pod has to:
    • Run in the kube-system namespace (configurable via flag)
    • Have the scheduler.alpha.kubernetes.io/critical-pod annotation set to empty string
    • Have the PodSpec’s tolerations field set to [{"key":"CriticalAddonsOnly", "operator":"Exists"}].

The annotation marks a pod as critical. The toleration is required by the Rescheduler algorithm.

  • Marking pod as critical when priorities are enabled. To be considered critical, the pod has to:
    • Run in the kube-system namespace (configurable via flag)
    • Have the priorityClass set as system-cluster-critical or system-node-critical, the latter being the highest for entire cluster
    • Have the scheduler.alpha.kubernetes.io/critical-pod annotation set to an empty string (this will be deprecated too).

Set command or arguments via env

env:
- name: MESSAGE
  value: "hello world"
command: ["/bin/echo"]
args: ["$(MESSAGE)"]