kube-monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes (k8s) pods in the cluster, encouraging and validating the development of failure-resilient services.
kube-monkey runs at a pre-configured hour (run_hour, defaults to 8 am) on weekdays, and builds a schedule of deployments that will face a random pod death sometime during the same day. The time range during the day when the random pod death might occur is configurable and defaults to 10 am to 4 pm.
kube-monkey can be configured with a list of namespaces to blacklist (any deployments within a blacklisted namespace will not be touched). To disable the blacklist, provide [""] in the blacklisted_namespaces config parameter.
Opting-In to Chaos
kube-monkey works on an opt-in model and will only schedule terminations for Kubernetes (k8s) apps that have explicitly agreed to have their pods terminated by kube-monkey.
Opt-in is done by setting the following labels on a k8s app:
kube-monkey/enabled: Set to "enabled" to opt in to kube-monkey.
kube-monkey/mtbf: Mean time between failure (in days). For example, if set to "3", the k8s app can expect to have a pod killed approximately every third weekday.
kube-monkey/identifier: A unique identifier for the k8s app. This is used to identify the pods that belong to a k8s app, as pods inherit labels from their k8s app. So, if kube-monkey detects that app foo has enrolled to be a victim, kube-monkey will look for all pods that have the label kube-monkey/identifier: foo to determine which pods are candidates for killing. The recommendation is to set this value to be the same as the app's name.
kube-monkey/kill-mode: Default behavior is for kube-monkey to kill only ONE pod of your app. You can override this behavior by setting the value to:
  kill-all if you want kube-monkey to kill ALL of your pods regardless of status (including not ready and not running pods). Does not require kill-value. Use this label carefully.
  fixed if you want to kill a specific number of running pods, given by kill-value. If you overspecify, it will kill all running pods and issue a warning.
  random-max-percent to specify, with kill-value, the maximum percentage of running pods that can be killed. At the scheduled time, a percentage of the running pods drawn uniformly at random up to that maximum will be terminated.
  fixed-percent to specify, with kill-value, a fixed percentage of running pods to kill. At the scheduled time, that fixed percentage of the running pods will be terminated.
kube-monkey/kill-value: Specify the value for kill-mode:
  if fixed, provide an integer number of pods to kill
  if random-max-percent, provide a number from 0-100 to specify the max % of pods kube-monkey can kill
  if fixed-percent, provide a number from 0-100 to specify the % of pods to kill
Example of opted-in Deployment killing one pod per purge
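As an illustration of the labels described above, the following Python sketch builds a Deployment manifest that opts in with the default kill behavior (one pod per purge) and prints it as JSON, which kubectl can apply. The function name opted_in_deployment, the app label, the image and the replica count are hypothetical. Placing the kube-monkey labels on the pod template as well as on the Deployment is an assumption, made so that the pods themselves carry the identifier label.

```python
import json


def opted_in_deployment(name, image, mtbf_days):
    """Build a Deployment manifest (as a dict) that opts in to kube-monkey.

    No kube-monkey/kill-mode label is set, so the default applies:
    one pod is killed per purge.
    """
    chaos_labels = {
        "kube-monkey/enabled": "enabled",
        # Recommended: identifier equal to the app's name.
        "kube-monkey/identifier": name,
        # Kubernetes label values must be strings.
        "kube-monkey/mtbf": str(mtbf_days),
    }
    # Assumption: the same labels are repeated on the pod template.
    pod_labels = dict(chaos_labels, app=name)
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": chaos_labels},
        "spec": {
            "replicas": 3,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical app name and image, for illustration only.
    print(json.dumps(opted_in_deployment("monkey-victim", "nginx", 2), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f.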
client-go does not explicitly support cluster DNS (its code carries the note // TODO: switch to using cluster DNS.), so you may need to override the apiserver.
If you are running an unauthenticated system, you may need to force the http apiserver endpoint.
To override the apiserver, specify it in the config.toml file.
Scheduling happens once a day on weekdays. This is when the schedule of terminations for the current day is generated. During scheduling, kube-monkey will:
Generate a list of eligible k8s apps (k8s apps that have opted-in and are not blacklisted, if specified, and are whitelisted, if specified)
For each eligible k8s app, flip a biased coin (bias determined by kube-monkey/mtbf) to determine if a pod for that k8s app should be killed today
For each victim, calculate a random time when a pod will be killed
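The scheduling steps above can be sketched in Python as follows. This is an illustration, not kube-monkey's actual code: the function names are hypothetical, and the coin's bias is assumed to be 1/mtbf, so that a kill happens on average once every mtbf weekdays. Eligibility filtering (opt-in, blacklist, whitelist) is assumed to have happened already.

```python
import random
from datetime import datetime, timedelta


def should_kill_today(mtbf_days, rng):
    """Flip a biased coin; assumed probability of a kill is 1/mtbf."""
    return rng.random() < 1.0 / mtbf_days


def random_kill_time(day, start_hour, end_hour, rng):
    """Pick a uniformly random time in [start_hour, end_hour) on the given day."""
    window_seconds = (end_hour - start_hour) * 3600
    window_start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return window_start + timedelta(seconds=rng.randrange(window_seconds))


def build_schedule(eligible_apps, day, start_hour=10, end_hour=16, rng=None):
    """Map each victim app to its kill time for the day.

    eligible_apps maps an app name to its kube-monkey/mtbf value (in days).
    """
    rng = rng or random.Random()
    schedule = {}
    for name, mtbf_days in eligible_apps.items():
        if should_kill_today(mtbf_days, rng):
            schedule[name] = random_kill_time(day, start_hour, end_hour, rng)
    return schedule


if __name__ == "__main__":
    apps = {"foo": 1, "bar": 3}  # hypothetical apps and mtbf values
    for app, when in build_schedule(apps, datetime.now()).items():
        print(app, "->", when.strftime("%H:%M:%S"))
```

With an mtbf of 1 the coin always comes up heads under this assumption, so such an app is scheduled every weekday.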
Termination time
This is the randomly generated time during the day when a victim k8s app will have a pod killed.
At termination time, kube-monkey will:
Check if the k8s app is still eligible (has not opted-out or been blacklisted or removed from the whitelist since scheduling)
Check if the k8s app has updated kill-mode and kill-value
Depending on kill-mode and kill-value, terminate pods
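How kill-mode and kill-value translate into a number of pods can be sketched as below. The function pods_to_kill is hypothetical, and rounding percentages down is an assumption; the behavior for each mode follows the label descriptions in this document.

```python
import math
import random


def pods_to_kill(kill_mode, kill_value, running, total, rng=None):
    """Return how many pods to terminate for the given kill-mode.

    running is the number of running pods, total counts pods in any state.
    """
    rng = rng or random.Random()
    if kill_mode is None:
        # Default behavior: kill only one pod.
        return min(1, running)
    if kill_mode == "kill-all":
        # All pods regardless of status; kill-value is not required.
        return total
    if kill_mode == "fixed":
        # Overspecifying kills all running pods (kube-monkey also warns).
        return min(kill_value, running)
    if kill_mode == "fixed-percent":
        # Assumption: fractional pods are rounded down.
        return math.floor(running * kill_value / 100)
    if kill_mode == "random-max-percent":
        # Uniform random percentage between 0 and the configured maximum.
        percent = rng.uniform(0, kill_value)
        return math.floor(running * percent / 100)
    raise ValueError("unknown kill-mode: %r" % kill_mode)


if __name__ == "__main__":
    print(pods_to_kill("fixed", 10, running=4, total=5))
    print(pods_to_kill("fixed-percent", 50, running=4, total=5))
```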
Docker Images
Docker images for kube-monkey can be found at DockerHub
Building
Clone the repository and build the container.
go get github.com/asobti/kube-monkey
cd $GOPATH/src/github.com/asobti/kube-monkey
make build
make container
Configuring
kube-monkey is configured by environment variables or by a TOML file placed at /etc/kube-monkey/config.toml. It expects the configmap to exist before the kube-monkey deployment is created.
[kubemonkey]
dry_run = true                           # Terminations are only logged
run_hour = 8                             # Run scheduling at 8am on weekdays
start_hour = 10                          # Don't schedule any pod deaths before 10am
end_hour = 16                            # Don't schedule any pod deaths after 4pm
blacklisted_namespaces = ["kube-system"] # Critical apps live here
time_zone = "America/New_York"           # Set tzdata timezone example. Note the field is time_zone not timezone
{$timestamp}: attack's time from Unix epoch in milliseconds
{$time}: attack's time
{$date}: attack's date
{$error}: result's error, if any
{$kubemonkeyid}: kube-monkey id (set using KUBE_MONKEY_ID env variable otherwise empty)
message: '{
"what": "Kube-monkey({$kubemonkeyid}) attack of {$name} in {$namespace}",
"who": "{$name}",
"when": {$timestamp}
}'
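To make the placeholder syntax concrete, here is a small Python sketch of how {$...} placeholders in a message template could be filled in. It is not kube-monkey's implementation: render_message is a hypothetical helper, and leaving unknown placeholders untouched is an assumption.

```python
import re

# Matches placeholders of the form {$name}, {$timestamp}, and so on.
PLACEHOLDER = re.compile(r"\{\$(\w+)\}")


def render_message(template, values):
    """Replace each {$key} in template with str(values[key]).

    Placeholders with no matching key are left as they are (assumption).
    """
    def substitute(match):
        key = match.group(1)
        if key in values:
            return str(values[key])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = '{"who": "{$name}", "when": {$timestamp}}'
    # Hypothetical attack details, for illustration only.
    print(render_message(template, {"name": "foo", "timestamp": 1700000000000}))
```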
The header supports a special placeholder to retrieve the value of an environment variable.
This is useful when calling an API that has a protected endpoint.
A typical scenario is passing an API token to the kube-monkey container: the token is stored in a Kubernetes Secret and exposed to the container through an environment variable.
{$env:API_TOKEN} will be replaced by the environment variable API_TOKEN value.
Note that if the environment variable does not exist, the notification call will NOT be cancelled. The value will resolve to an empty string, and a warning will show up in the logs.
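The resolution rule for {$env:...} can be sketched in Python as follows. The helper resolve_header and the Bearer header value are hypothetical; the behavior for a missing variable (empty string plus a logged warning) mirrors the description above.

```python
import logging
import os
import re

# Matches placeholders of the form {$env:VARIABLE_NAME}.
ENV_PLACEHOLDER = re.compile(r"\{\$env:(\w+)\}")


def resolve_header(value, environ=None):
    """Replace {$env:NAME} with the value of environment variable NAME.

    A missing variable resolves to an empty string and logs a warning;
    the caller still sends the notification.
    """
    if environ is None:
        environ = os.environ

    def substitute(match):
        name = match.group(1)
        if name not in environ:
            logging.warning("environment variable %s is not set", name)
            return ""
        return environ[name]

    return ENV_PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    # Hypothetical header value; the token comes from the environment.
    print(resolve_header("Bearer {$env:API_TOKEN}", {"API_TOKEN": "s3cret"}))
```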
Deploying
Manually
First, deploy the expected kube-monkey-config-map configmap in the namespace you intend to run kube-monkey in (for example, the kube-system namespace). Make sure to define the keyname as config.toml.
For example: kubectl create configmap km-config --from-file=config.toml=km-config.toml or kubectl apply -f km-config.yaml
Run kube-monkey as a k8s app within the Kubernetes cluster, in a namespace that has permissions to kill pods in other namespaces (e.g. kube-system).
See dir examples/ for example Kubernetes yaml files.
You should be able to see debug logs with kubectl logs -f deployment.apps/kube-monkey --namespace=kube-system, where deployment.apps/kube-monkey is the k8s deployment for kube-monkey.
kube-monkey uses glog and supports all command-line features for glog. To specify a custom v level or a custom log directory on the pod, see args: ["-v=5", "-log_dir=/path/to/custom/log"] in the example deployment file.
Standardized glog levels grep -r V\([0-9]\) *
L0: None
L1: Highest Level current status info and Errors with Terminations