forked from cloudfoundry/docs-loggregator
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlog-ops-guide.html.md.erb
129 lines (71 loc) · 6.26 KB
/
log-ops-guide.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: Loggregator guide for Cloud Foundry operators
owner: Logging and Metrics
---
<%# Reset page title based on platform type %>
<% if vars.platform_code != 'CF' %>
<% set_title("Loggregator Guide for", vars.app_runtime_abbr, "Operators") %>
<% end %>
Here you can view information for <%= vars.app_runtime_first %> deployment operators about how to configure the
Loggregator system to avoid data loss with high volumes of logging and metrics data.
For more information about the Loggregator system, see [Loggregator Architecture](./architecture.html).
## <a id='throughput-reliability'></a> Loggregator message throughput and reliability
You can measure both the message throughput and message reliability rates of your Loggregator system.
### <a id='throughput-rate'></a> Measuring message throughput
To measure the message throughput of the Loggregator system, you can monitor the total number of egress messages from all Loggregator Agents in your deployment using the `metron.egress` metric.
If you do not use a monitoring platform, you can measure the overall message throughput of your Loggregator system. To measure the overall message throughput:
1. Log in to the Cloud Foundry Command Line Interface (cf CLI) with your admin credentials by running:
```
cf login
```
1. Install the <%= vars.app_runtime_abbr %> Firehose plug-in.
For more information, see [Install the plug-in](cli-plugin.html#add) in
_Installing the Loggregator Firehose Plug-in for cf CLI_.
1. Install Pipe Viewer by running:
```
apt-get install pv
```
1. Run:
```
cf nozzle -n | pv -l -i 10 -r > /dev/null
```
### <a id='reliability-rate'></a> Measuring message reliability
To measure the message reliability rate of your Loggregator system, you can run black-box tests.
## <a id='scaling'></a> Scaling Loggregator
<% if vars.platform_code == 'CF' %>
Most Loggregator configurations are set to use preferred resource defaults. If you want to customize these defaults or plan the capacity of your Loggregator system, see the following formulas:
### <a id='doppler'></a> Doppler
Doppler resources can require scaling to accommodate your overall log and metric volume. The recommended formula for estimating the number of Doppler instances you need to achieve a loss rate of < 1% is:
<p class="tip"><em>Number of Doppler instances = doppler.ingress / 16 000</em></p>
where `doppler.ingress` represents the per-second rate of change of the `ingress` metric from the `doppler` source, summed over all VMs in your deployment.
Using maximum values over a two-week period is a recommended approach for ingress-based capacity planning.
### <a id='traffic-controller'></a> Traffic Controller
Traffic Controller resources are usually scaled in line with Doppler resources. The recommended formula for determining the number of Traffic Controller instances is:
<p class="tip"><em>Number of Traffic Controller instances = Number of Doppler instances / 2</em></p>
In addition, Traffic Controller resources can require scaling to accommodate the number of your log streams and Firehose subscriptions.
### <a id='log-cache'></a> Log Cache
Log Cache is an ephemeral in-memory store of logs and metrics. Operators scale Log Cache to make sure that at least 15 minutes of
logs and metrics are available at a time. <%= vars.recommended_by %> recommends scaling Log Cache when the `log_cache_cache_period` metric drops under 900000 milliseconds.
For more information about scaling the Loggregator system,
see [Scaling Loggregator](../running/managing-cf/logging-config.html#scaling) in _Configuring System Logging_.
<% else %>
Most Loggregator configurations use preferred resource defaults. For more information about customizing these defaults and planning the capacity of your Loggregator system, see <%= vars.key_cap_link %>.
<% end %>
## <a id='scaling-nozzles'></a> Scaling nozzles
Nozzles are programs that consume data from the Loggregator Firehose. Nozzles can be configured to select, buffer, and transform data, and to forward it to other apps and services. You can scale a nozzle using the subscription ID specified when the nozzle connects to the Firehose. If you use the same subscription ID on each nozzle instance, the Firehose evenly distributes data across all instances of the nozzle.
For example, if you have two nozzle instances with the same subscription ID, the Firehose sends half of the data to one nozzle instance and half to the other. Similarly, if you have three nozzle instances with the same subscription ID, the Firehose sends one-third of the data to each instance.
If you want to scale a nozzle, the number of nozzle instances must match the number of Traffic Controller instances:
Note, the number of nozzle instances = Number of Traffic Controller instances.
Stateless nozzles handle scaling. If a nozzle buffers or caches the data, the nozzle author must test the results of scaling the number of nozzle instances up or down.
<p> You can deactivate the Firehose. In place of the Firehose, you can configure an aggregate log and metric drain for your foundation. <%= vars.configure_system_logging_link %></p>
## <a id='slow-noz'></a> Slow nozzle alerts
The Traffic Controller alerts the nozzle programs if they consume events too slowly.
The following metrics can be used to identify slow consumers:
* For v1 consumers: `doppler_proxy.slow_consumer` is incremented as consumers are disconnected for being slow.
* For v2 consumers: `doppler.dropped{direction="egress"}` or `rlp.dropped{direction="egress"}` are incremented when a v2 consumer fails to keep up.
For more information about the Traffic Controller,
see [Loggregator Architecture](./architecture.html#loggregator-architecture) in _Loggregator Architecture_.
## <a id='syslog-forward'></a> Forwarding logs to an external service
You can configure <%= vars.app_runtime_abbr %> to forward log data from apps to an external aggregator service. For information about how to bind apps to the external service and configure it to receive logs from <%= vars.app_runtime_abbr %>, see [Streaming App Logs to Log Management Services](../devguide/services/log-management.html).
## <a id='log-message-size'></a> Log message size constraints
When a Diego Cell emits app logs to Loggregator, Diego breaks up log messages greater than approximately 60 KiB into multiple envelopes.