Litmus Chaos Engineering - Testing
The general procedure for a chaos test is as follows.
1. Prepare the application under test and define its steady state (see the probe sketch after this list).
2. Define the chaos scenarios to test (for example, abnormal Pod termination, network latency, disk failure).
3. Monitor the application's state while the chaos runs.
4. Roll back the chaos and confirm that the application returns to its steady state.
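In Litmus, the steady state from step 1 can be expressed as a probe attached to the ChaosEngine; the run is marked as failed when the probe's criteria are violated. Below is a minimal sketch using the Litmus 2.x httpProbe schema, reusing the sampleapp host configured later in this post; the probe name and timing values are illustrative, not from the original setup.

    # Illustrative steady-state check; attach under experiments[].spec.probe in a ChaosEngine.
    probe:
      - name: check-sampleapp-http        # hypothetical probe name
        type: httpProbe
        mode: Continuous                  # evaluate throughout the chaos duration
        httpProbe/inputs:
          url: http://sampleapp.10.20.20.100.nip.io
          method:
            get:
              criteria: ==                # fail the run if the response code differs
              responseCode: "200"
        runProperties:
          probeTimeout: 5                 # illustrative values, in seconds
          interval: 2
          retry: 1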
Application Preparation
Let's deploy the application under test as follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sampleapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sampleapp
  template:
    metadata:
      labels:
        app: sampleapp
    spec:
      containers:
        - name: sampleapp
          image: nginx
          ports:
            - containerPort: 80
          imagePullPolicy: Always
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: ingress-sampleapp
spec:
  rules:
    - host: sampleapp.10.20.20.100.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sampleapp
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: sampleapp
spec:
  type: ClusterIP
  selector:
    app: sampleapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
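Assuming the manifests above are saved as sampleapp.yaml (an illustrative file name), create the litmus-test namespace used throughout this post and apply them:

    # k create ns litmus-test
    # k apply -n litmus-test -f sampleapp.yaml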
The deployed resources are as follows.
# k get all -n litmus-test
NAME                             READY   STATUS              RESTARTS   AGE
pod/sampleapp-6c65c5cdb7-q9bjr   0/1     ContainerCreating   0          9s

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/sampleapp   ClusterIP   10.107.81.167   <none>        80/TCP    9s

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sampleapp   0/1     1            0           9s

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/sampleapp-6c65c5cdb7   1         1         0       9s
Calling the domain returns the default nginx welcome page.
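A quick check from the command line (the nip.io host resolves to the ingress controller's address); the title shown is the stock nginx welcome page:

    # curl -s http://sampleapp.10.20.20.100.nip.io | grep '<title>'
    <title>Welcome to nginx!</title>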
Now let's use Litmus to run a chaos scenario that drives up the application's memory usage.
First, create the following service account and RBAC policy (in the namespace where the app is deployed).
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-memory-hog-sa
  namespace: litmus-test
  labels:
    name: pod-memory-hog-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-memory-hog-sa
  namespace: litmus-test
  labels:
    name: pod-memory-hog-sa
    app.kubernetes.io/part-of: litmus
rules:
  # Create and monitor the experiment & helper pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create","delete","get","list","patch","update","deletecollection"]
  # Perform CRUD operations on the events inside chaosengine and chaosresult
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create","get","list","patch","update"]
  # Fetch configmap details and mount them into the experiment pod (if specified)
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get","list"]
  # Track and get the runner, experiment, and helper pod logs
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get","list","watch"]
  # Execute commands inside the target container
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get","list","create"]
  # Derive the parent/owner details of the pod (if the parent is a deployment, statefulset, replicaset, or daemonset)
  - apiGroups: ["apps"]
    resources: ["deployments","statefulsets","replicasets","daemonsets"]
    verbs: ["list","get"]
  # Derive the parent/owner details of the pod (if the parent is a deploymentConfig)
  - apiGroups: ["apps.openshift.io"]
    resources: ["deploymentconfigs"]
    verbs: ["list","get"]
  # Derive the parent/owner details of the pod (if the parent is a replicationController)
  - apiGroups: [""]
    resources: ["replicationcontrollers"]
    verbs: ["get","list"]
  # Derive the parent/owner details of the pod (if the parent is an argo rollout)
  - apiGroups: ["argoproj.io"]
    resources: ["rollouts"]
    verbs: ["list","get"]
  # Configure and monitor the experiment job via the chaos-runner pod
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create","list","get","delete","deletecollection"]
  # Create, poll, and delete the litmus chaos resources used within a chaos workflow
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines","chaosexperiments","chaosresults"]
    verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-memory-hog-sa
  namespace: litmus-test
  labels:
    name: pod-memory-hog-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-memory-hog-sa
subjects:
  - kind: ServiceAccount
    name: pod-memory-hog-sa
    namespace: litmus-test
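Apply the RBAC manifests (rbac.yaml is an illustrative file name) and confirm that the objects exist:

    # k apply -f rbac.yaml
    # k get sa,role,rolebinding -n litmus-test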
Before creating the scenario, create the ChaosExperiment that the experiment needs.
---
apiVersion: litmuschaos.io/v1alpha1
description:
  message: |
    Injects memory consumption on pods belonging to an app deployment
kind: ChaosExperiment
metadata:
  name: pod-memory-hog
  labels:
    name: pod-memory-hog
    app.kubernetes.io/part-of: litmus
    app.kubernetes.io/component: chaosexperiment
    app.kubernetes.io/version: 2.11.0
spec:
  definition:
    scope: Namespaced
    permissions:
      # Create and monitor the experiment & helper pods
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create","delete","get","list","patch","update","deletecollection"]
      # Perform CRUD operations on the events inside chaosengine and chaosresult
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create","get","list","patch","update"]
      # Fetch configmap details and mount them into the experiment pod (if specified)
      - apiGroups: [""]
        resources: ["configmaps"]
        verbs: ["get","list"]
      # Track and get the runner, experiment, and helper pod logs
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get","list","watch"]
      # Execute commands inside the target container
      - apiGroups: [""]
        resources: ["pods/exec"]
        verbs: ["get","list","create"]
      # Derive the parent/owner details of the pod (if the parent is a deployment, statefulset, replicaset, or daemonset)
      - apiGroups: ["apps"]
        resources: ["deployments","statefulsets","replicasets","daemonsets"]
        verbs: ["list","get"]
      # Derive the parent/owner details of the pod (if the parent is a deploymentConfig)
      - apiGroups: ["apps.openshift.io"]
        resources: ["deploymentconfigs"]
        verbs: ["list","get"]
      # Derive the parent/owner details of the pod (if the parent is a replicationController)
      - apiGroups: [""]
        resources: ["replicationcontrollers"]
        verbs: ["get","list"]
      # Derive the parent/owner details of the pod (if the parent is an argo rollout)
      - apiGroups: ["argoproj.io"]
        resources: ["rollouts"]
        verbs: ["list","get"]
      # Configure and monitor the experiment job via the chaos-runner pod
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create","list","get","delete","deletecollection"]
      # Create, poll, and delete the litmus chaos resources used within a chaos workflow
      - apiGroups: ["litmuschaos.io"]
        resources: ["chaosengines","chaosexperiments","chaosresults"]
        verbs: ["create","list","get","patch","update","delete"]
    image: "litmuschaos/go-runner:2.11.0"
    imagePullPolicy: Always
    args:
      - -c
      - ./experiments -name pod-memory-hog
    command:
      - /bin/bash
    env:
      - name: TOTAL_CHAOS_DURATION
        value: '60'
      ## amount of memory in megabytes to be consumed by the application pod
      - name: MEMORY_CONSUMPTION
        value: '500'
      ## number of workers used to apply the stress
      - name: NUMBER_OF_WORKERS
        value: '1'
      ## percentage of total pods to target
      - name: PODS_AFFECTED_PERC
        value: ''
      ## period to wait before and after injection of chaos, in seconds
      - name: RAMP_TIME
        value: ''
      ## library used to execute the chaos
      ## default: litmus. Supported values: litmus, pumba
      - name: LIB
        value: 'litmus'
      ## used by the pumba lib only
      - name: LIB_IMAGE
        value: 'litmuschaos/go-runner:2.11.0'
      ## used by the pumba lib only
      - name: STRESS_IMAGE
        value: 'alexeiled/stress-ng:latest-ubuntu'
      ## cluster container runtime
      - name: CONTAINER_RUNTIME
        value: 'containerd'
      ## container runtime socket file path
      - name: SOCKET_PATH
        value: '/run/containerd/containerd.sock'
      ## sequence of chaos execution for multiple target pods
      ## supported values: serial, parallel
      - name: SEQUENCE
        value: 'parallel'
      - name: TARGET_PODS
        value: ''
      ## to select pods on specific node(s)
      - name: NODE_LABEL
        value: ''
    labels:
      name: pod-memory-hog
      app.kubernetes.io/part-of: litmus
      app.kubernetes.io/component: experiment-job
      app.kubernetes.io/runtime-api-usage: "true"
      app.kubernetes.io/version: 2.11.0
Note: my cluster uses containerd rather than docker, so CONTAINER_RUNTIME must be set to containerd. Adjust SOCKET_PATH to match your runtime as well.
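If you are unsure which runtime your cluster uses, kubectl reports it in the CONTAINER-RUNTIME column of the wide node listing, and you can confirm the experiment was registered at the same time:

    # k get nodes -o wide    # CONTAINER-RUNTIME column shows e.g. containerd://...
    # k get chaosexperiments -n litmus-test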
Now let's create the memory hog chaos scenario.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
  namespace: litmus-test
spec:
  # It can be active/stop
  engineState: 'active'
  annotationCheck: "false"
  appinfo:
    appns: 'litmus-test'
    # FYI, to see app labels, run: kubectl get pods --show-labels
    applabel: 'app=sampleapp'
    appkind: 'deployment'
  chaosServiceAccount: pod-memory-hog-sa
  experiments:
    - name: pod-memory-hog
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '300' # in seconds
            # amount of memory in megabytes to consume inside the target pod
            - name: MEMORY_CONSUMPTION
              value: '500'
            # name of the container runtime
            # the litmus LIB supports docker, containerd, and crio
            # the pumba LIB supports docker only
            - name: CONTAINER_RUNTIME
              value: 'containerd'
            # container runtime socket file path
            - name: SOCKET_PATH
              value: '/run/containerd/containerd.sock'
            # percentage of total pods to target
            - name: PODS_AFFECTED_PERC
              value: ''
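Because engineState is active, applying the manifest starts the experiment right away (engine.yaml is an illustrative file name):

    # k apply -f engine.yaml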
Let's look at the state of the running chaos test.
# k get po -n litmus-test
NAME                           READY   STATUS    RESTARTS   AGE
engine-nginx-runner            1/1     Running   0          24s
pod-memory-hog-helper-imjzld   1/1     Running   0          12s
pod-memory-hog-n6kpdk-mpfnr    1/1     Running   0          23s
sampleapp-6c65c5cdb7-q9bjr     1/1     Running   0          3h6m
Memory usage before the run:
# k top pod -n litmus-test
NAME                         CPU(cores)   MEMORY(bytes)
sampleapp-6c65c5cdb7-q9bjr   0m           24Mi
Memory usage after starting the experiment:
# k top pod -n litmus-test
NAME                           CPU(cores)   MEMORY(bytes)
engine-nginx-runner            1m           13Mi
pod-memory-hog-helper-yuvrvs   22m          8Mi
pod-memory-hog-ux9mb2-pngkj    2m           12Mi
sampleapp-6c65c5cdb7-q9bjr     792m         533Mi
As shown above, sampleapp's memory usage has grown by roughly 500Mi. With this setup you can go further and test memory limits, OOM killer behavior, and so on, as sketched below.
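For example, to watch the OOM killer in action, you could give the sampleapp container a memory limit below the 500MB the experiment injects; a minimal sketch of the container spec addition (the 256Mi and 128Mi values are illustrative, not from the original setup):

    # Add under the sampleapp container in the Deployment; with a 256Mi
    # limit, the injected 500MB of stress should drive the pod to OOMKilled.
    resources:
      requests:
        memory: "128Mi"
      limits:
        memory: "256Mi"

After a run, the verdict is recorded in a ChaosResult named <engine>-<experiment>, so here it can be inspected with: k describe chaosresult engine-nginx-pod-memory-hog -n litmus-test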