forked from cloudfoundry/docs-loggregator
-
Notifications
You must be signed in to change notification settings - Fork 0
/
app-log-rate-limits.html.md.erb
117 lines (79 loc) · 8.05 KB
/
app-log-rate-limits.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
title: Limiting your App Log Rate in Cloud Foundry
owner: Logging and Metrics
---
Here you can learn about limiting your app log rate for apps in <%= vars.app_runtime_full %>.
Log rate limiting limits the number of logs that can be sent to an app.
App log rate limiting is deactivated by default. <%= vars.company_name %> recommends activating this feature to prevent app instances from overloading the
Loggregator Agent with logs, so the Loggregator Agent does not drop logs for other app instances on the same Diego Cell.
Using app log rate limiting can also do the following:
* Prevent apps from reporting inaccurate app metrics in the Cloud Foundry Command Line Interface (cf CLI), which can happen if Log Cache evicts metrics from
its cache in order to store large volumes of logs.
* Limit the CPU usage of logging agents on the Diego Cell VM.
You can allow log rate limits on a per-app basis in bytes per second.
## <a id='overview-byte-limit'></a> App log rate limiting in bytes per second
In <%= vars.app_runtime_abbr %>, you can limit the number of bytes each app instance can generate per second.
You can configure app log rate limiting in bytes per second on a per-app basis through either the app manifest or the cf CLI. Additionally, you can enforce
the log rate limit you configure for all apps that are deployed within a space or org by specifying the log rate limit in the quota plan for the space or org.
<% if vars.platform_code != "OFFLINE" %>For more information, see [Creating and Modifying Quota Plans](../adminguide/quota-plans.html).<% end %>
## <a id='determine-limit'></a> Determining the ideal app log rate limit
The ideal app logging rate for a deployment depends on characteristics such as VM sizes and the number and type of apps in <%= vars.app_runtime_abbr %>.
<%= vars.company_name %> recommends using at minimum the default limit of `1k` bytes per second.
When you allow app log rate limiting, Diego applies the log rate limit to each app instance. For example, if there are five instances of an app running, Diego
does not sum the logging rates of all five instances when determining if the log rate limit has been exceeded. Instead, Diego evaluates the logging rate of
each individual app instance and only limits instances that exceed the log rate limit.
## <a id='exceed-limit'></a> What happens when app instances exceed the app log rate limit
When an app instance exceeds the log rate limit you configured, Diego drops the app logs that exceed that limit. When this happens, you see a message
indicating that Diego is dropping app logs.
For more information about how Diego rate-limits app logs, see the [Go documentation](https://godoc.org/golang.org/x/time/rate).
## <a id='determine-exceeded'></a> How Diego Cells behave when an app instance exceeds the app log rate
When an app instance exceeds the log rate limit, the Diego Cell containing the app instance emits the `AppInstanceExceededLogRateLimitCount` counter metric,
similar to the following example:
<pre class="terminal">
origin:"rep" eventType:CounterEvent timestamp:1582582740243576212 deployment:"cf" job:"diego-cell" index:"0e98fd00-47b2-4589-94f0-385f78b3a04d" ip:"10.0.1.12" tags:<key:"instance_id" value:"0e98fd00-47b2-4589-94f0-385f78b3a04d" > tags:<key:"source_id" value:"rep" > counterEvent:<name:"AppInstanceExceededLogRateLimitCount" delta:1 total:206 >
</pre>
Each Diego Cell in a deployment has a unique `AppInstanceExceededLogRateLimitCount` counter. The `total` value of the counter is the sum total of all app
instances on that Diego Cell that have exceeded the log rate limit since the creation of the Diego Cell. When there are no app instances exceeding the log
rate limit, Diego Cells do not emit the `AppInstanceExceededLogRateLimitCount` metric.
For example, `app-instanceA` and `app-instanceB` are running on one Diego Cell, `app-instanceC` and `app-instanceD` are running on a second Diego Cell, and
the current `total` for the `AppInstanceExceededLogRateLimitCount` is `125` on the first Diego Cell and `43` on the second Diego Cell. If `app-instanceD`
exceeds the log rate limit, the second Diego Cell emits the `AppInstanceExceededLogRateLimitCount` metric with a incremented `total` value of `44`. However,
the first Diego Cell does not emit the `AppInstanceExceededLogRateLimitCount` metric, and the `total` value for the `AppInstanceExceededLogRateLimitCount`
metric on the first Diego Cell is still `125`.
A Diego Cell emits the `AppInstanceExceededLogRateLimitCount` metric conditionally when an app instance on that Diego Cell begins to exceed the log rate
limit. For example, `app-instanceC` and `app-instanceD` are on the same Diego Cell. If `app-instanceC` exceeds the log rate limit continually over a
ten-minute period, and `app-instanceD` exceeds the log rate limit during the first three minutes of each five-minute interval within that ten-minute period
and then stops, the Diego Cell emits the `AppInstanceExceededLogRateLimitCount` metric three times within that ten-minute period.
## <a id='alert-exceeded'></a> Configuring an alert for the AppInstanceExceededLogRateLimitCount metric
If you are using a third-party log management service, you can configure an alert for when the aggregated sum of the `AppInstanceExceededLogRateLimitCount`
metric across all the Diego Cells in <%= vars.app_runtime_abbr %> has been incremented more than a certain number of times or over a certain percentage in the
last five or more minutes.
When you configure this alert, consider the number of app instances running on <%= vars.app_runtime_abbr %>, the logging rate that you configured in
<%= vars.app_runtime_abbr %>, your other <%= vars.app_runtime_abbr %> configuration settings, and so on.
For more information about third-party log management services, see [Streaming App Logs to Log Management Services](../devguide/services/log-management.html).
## <a id='view-counter'></a> Identifying apps that exceed the app log rate limit
Diego also logs when a noisy app instance exceeds the log rate limit configured in <%= vars.app_runtime_abbr %>. A log message similar to the following example:
appears in the log stream for the noisy app:
<pre class="terminal">
2022-08-22T12:42:18.90-0800 [APP/PROC/WEB/0] OUT app instance exceeded log rate limit (1024 bytes/sec)
</pre>
To identify which app instances are exceeding the app log rate limit:
<p> The Firehose and Log Cache plug-ins are developed by the open-source Cloud Foundry community and are not supported by
VMware.</p>
1. In a terminal window, install the Firehose plug-in by running:
```
cf install-plugin 'Firehose Plugin'
```
1. Install the Log Cache plug-in by running:
```
cf install-plugin 'log-cache'
```
1. Filter your app log messages by running:
```
cf nozzle -f LogMessage | grep "app instance exceeded log rate limit"
```
This command returns all logs with log messages containing `"app instance exceeded log rate limit"`, similar to the following example:
<pre class="terminal">
origin:"rep" eventType:LogMessage timestamp:1583859621886751670 deployment:"warp-drive" job:"diego-cell" index:"3a574bde-91df-48b8-ae21-1d6913da0908" ip:"10.0.1.33" tags:<key:"app_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"app_name" value:"app-2" > tags:<key:"instance_id" value:"0" > tags:<key:"organization_id" value:"a30f39c2-4ff3-48a1-a869-a9ed21812a61" > tags:<key:"organization_name" value:"test" > tags:<key:"process_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"process_instance_id" value:"92e2ee78-3a1d-41a6-4933-e47b" > tags:<key:"process_type" value:"web" > tags:<key:"source_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"space_id" value:"0e2d2d58-3ef5-43f3-b880-c8a30903a96b" > tags:<key:"space_name" value:"test-2" > logMessage:<message:"app instance exceeded log rate limit (1024 bytes/sec)" message_type:OUT timestamp:1583859621886751670 app_id:"34bcfafc-402b-4bb4-84db-aea5401b79eb" source_type:"APP/PROC/WEB" source_instance:"0" >
</pre>
You can inspect these logs to identify which app instances are exceeding the log rate limit.