forked from cloudfoundry/docs-bosh
-
Notifications
You must be signed in to change notification settings - Fork 0
/
resurrector.html.md.erb
139 lines (98 loc) · 5.58 KB
/
resurrector.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: Automatic repair with Resurrector
---
The Resurrector is a plugin to the [Health Monitor](bosh-components.html#health-monitor). It's responsible for automatically recreating VMs that become inaccessible.
The Resurrector continiously cross-references VMs expected to be running against the VMs that are sending heartbeats. When resurrector does not receive heartbeats for a VM for a certain period of time, it will kick off a task on the Director (scan and fix task) to try to "resurrect" that VM. The Director may do one of two things:
- create a new VM if the old VM is missing from the IaaS
- replace a VM if the Agent on that VM is not responding to commands
Under certain conditions the Resurrector will consider the system in the "meltdown" and will stop sending requests to the Director. It will resume submitting scan and fix tasks to the Director once the conditions change.
Resurrection can be turned off per specific deployment job instance or for all VMs managed by the Director via [`bosh vm resurrection` CLI command](sysadmin-commands.html#vm-resurrection).
<p class='note'><strong>Note</strong>: The Health Monitor deploys with the Resurrector plugin disabled by default. To use it, you must enable the Resurrector plugin in your BOSH deployment manifest.</p>
---
## <a id="enable"></a> Enabling and Configuring the Resurrector
To enable the Resurrector:
1. Change deployment manifest for the Health Monitor (typically colocated with the Director):
```yaml
properties:
hm:
resurrector_enabled: true
```
1. Optionally change configuration values:
* **minimum\_down\_jobs** [Integer, optional]: If the total number of instances that are down in a deployment (within time interval T) is below this number, the Resurrector will _always_ request to fix instances. This decision takes precedence to the `percent_threshold` check. Default is 5.
* **percent_threshold** [Float, optional]: If the percentage of instances that are down in a deployment (within time interval T) is greater than the threshold percentage, the Resurrector will _not_ request to fix any instance. Going over this threshold is called "meltdown". Default is 0.2 (20%).
* **time_threshold** [Integer, optional]: Time interval (in seconds) used in the above calculations. Default is 600.
```yaml
properties:
hm:
resurrector_enabled: true
resurrector:
minimum_down_jobs: 5
percent_threshold: 0.2
time_threshold: 600
```
1. Depending on how you configured [Director user management](director-users.html), credentials are specified in the `user` and `password` properties or using a custom `client` to authenticate with the UAA.
### <a id="uaa-client"></a> Option a) Using UAA User Management
Define an additional client in the `uaa.clients` section of your manifest:
```yaml
properties:
uaa:
clients:
hm:
override: true
authorized-grant-types: client_credentials
scope: ""
authorities: bosh.admin
secret: "hm-password"
```
Configure the Health Monitor to use the client with the defined secret to authenticate with the Director:
```yaml
properties:
hm:
director_account:
client_id: hm
client_secret: "hm-password"
```
### <a id="preconfigured-users"></a> Option b) Using Preconfigured Users
Create new Director user so that the Resurrector plugin can communicate with the Director and query/submit information about deployments.
```yaml
properties:
director:
user_management:
provider: local
local:
users:
- {name: admin, password: admin-password}
- {name: hm, password: hm-password}
```
Configure the Health Monitor to use the HM user and password to authenticate with the Director:
```yaml
properties:
hm:
director_account:
user: hm
password: hm-password
```
1. Deploy.
### <a id="customize"></a> Customizing for Your Deployment
For most deployments, you can use the default configuration values. In very small or very large deployments, the default values may need customization, as discussed in the following examples.
#### Small Deployment
If your deployment consists of only five VMs, you may not want the Resurrector to attempt to recreate your entire deployment in the event of a catastrophic failure. In this scenario, we recommend that you set `minimum_down_jobs` to 1 or 2.
#### Large Deployment
If your deployment consists of 1000 VMs, and you use the defaults, the Resurrector notifies the Director to recreate at least five VMs and up to 200 VMs. Depending on your deployment, you may consider even 100 down instances a catastrophic failure. In this scenario, set `percent_threshold` to 5% so that the Director resurrects 50 instances or fewer.
---
## <a id="disable"></a> Disabling the Resurrector
To disable the Resurrector:
1. Change deployment manifest for the Health Monitor (typically colocated with the Director):
```yaml
properties:
hm:
resurrector_enabled: false
```
1. Redeploy.
1. Optionally remove Director user created for the Health Monitor.
---
## <a id="audit"></a> Viewing the Resurrector's Activity
Since scan and fix tasks on the Director are regular tasks, you can use `bosh tasks --no-filter` command to view currently running/queued Resurrector's activity and `bosh tasks recent --no-filter` to view finished tasks.
---
Next: [Persistent disk snapshotting](snapshots.html)
Previous: [Manual repair with Cloud Check](cck.html)