slug | id | title | date | comments | tags | slides | references |
---|---|---|---|---|---|---|---|
168-designing-a-metric-system | 168-designing-a-metric-system | Designing a metric system | 2019-08-26 11:58 | true | | false | |
Log vs. Metric: A log records an event that happened, and a metric is a measurement of the health of a system.
We assume that this system’s purpose is to serve metrics (counters, conversion rates, timers, etc.) for monitoring system performance and health. If the conversion rate drops drastically, the system should alert the on-call engineer.
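The alerting rule mentioned above could look roughly like the following. This is only a minimal sketch: the function names, the use of a relative drop against a baseline, and the 30% default threshold are assumptions for illustration, not part of the original design.

```python
def relative_drop(baseline: float, current: float) -> float:
    """Fraction by which `current` fell below `baseline` (0.0 if it did not fall)."""
    if baseline <= 0:
        # No meaningful baseline to compare against.
        return 0.0
    return max(0.0, (baseline - current) / baseline)


def should_alert(baseline: float, current: float, max_drop: float = 0.3) -> bool:
    """True when the conversion rate fell by more than `max_drop` relative to baseline.

    `max_drop` is a hypothetical threshold; a real system would tune it per metric.
    """
    return relative_drop(baseline, current) > max_drop
```

For example, `should_alert(0.40, 0.20)` reports a drop of half the baseline and would page the on-call, while `should_alert(0.40, 0.38)` would not.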
- Monitoring business metrics such as the sign-up funnel’s conversion rate
- Supporting various queries, such as breakdowns by platform (IE/Chrome/Safari, iOS/Android/Desktop, etc.)
- Data visualization
- Scalability and availability
Two ways to build the system:
- Push Model: Influx/Telegraf/Grafana
- Pull Model: Prometheus/Grafana
The pull model is more scalable because it reduces the number of requests going into the metrics database: the collector scrapes each target on its own schedule, so there is no hot path and no concurrency issue.
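To illustrate why the pull model keeps requests to the metrics store low, here is a small in-process sketch. `MetricsRegistry` and `Collector` are hypothetical names and do not mirror the Prometheus client API. The application only updates local counters, and the collector reads each target once per scrape interval.

```python
import threading
from collections import Counter


class MetricsRegistry:
    """In-process counters that a collector scrapes on its own schedule."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: Counter = Counter()

    def inc(self, name: str, amount: int = 1) -> None:
        # Application side: a local in-memory update, no network call.
        with self._lock:
            self._counters[name] += amount

    def scrape(self) -> dict:
        # Collector side: return a snapshot of all counters.
        with self._lock:
            return dict(self._counters)


class Collector:
    """Hypothetical pull-side collector: one read per target per interval."""

    def __init__(self, targets: dict) -> None:
        self.targets = targets
        self.reads = 0  # number of scrape requests issued so far

    def scrape_all(self) -> dict:
        samples = {}
        for target_name, registry in self.targets.items():
            samples[target_name] = registry.scrape()
            self.reads += 1
        return samples


registry = MetricsRegistry()
for _ in range(10_000):
    registry.inc("signup_impressions_total")

collector = Collector({"mobile-api": registry})
samples = collector.scrape_all()
```

In this sketch, ten thousand increments result in a single read by the collector, whereas a naive push model would send a request for every event unless the client batches them.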
Application
Take a four-step sign-up flow on the mobile app as an example:
INPUT_PHONE_NUMBER -> VERIFY_SMS_CODE -> INPUT_NAME -> INPUT_PASSWORD
Every step has IMPRESSION and POST_VERIFICATION phases and emits metrics like this:
{
  "sign_up_session_id": "uuid",
  "step": "VERIFY_SMS_CODE",
  "os": "iOS",
  "phase": "POST_VERIFICATION",
  "status": "SUCCESS",
  // ... ts, contexts, ...
}
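A client-side helper for building such an event might look like the sketch below. The helper name `build_event`, the validation, the optional `status` (omitted for IMPRESSION events here), and the epoch-seconds `ts` format are all assumptions; the original only shows the field names.

```python
import json
import time
import uuid

STEPS = ("INPUT_PHONE_NUMBER", "VERIFY_SMS_CODE", "INPUT_NAME", "INPUT_PASSWORD")
PHASES = ("IMPRESSION", "POST_VERIFICATION")


def build_event(session_id, step, os_name, phase, status=None, ts=None) -> dict:
    """Build one funnel metric event with the fields shown above."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    event = {
        "sign_up_session_id": session_id,
        "step": step,
        "os": os_name,
        "phase": phase,
    }
    if status is not None:
        # Assumption: only POST_VERIFICATION events carry a status.
        event["status"] = status
    # Assumption: `ts` is seconds since the Unix epoch.
    event["ts"] = time.time() if ts is None else ts
    return event


session_id = str(uuid.uuid4())
event = build_event(session_id, "VERIFY_SMS_CODE", "iOS", "POST_VERIFICATION", status="SUCCESS")
payload = json.dumps(event)  # serialized form to hand to the metrics pipeline
```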
Consequently, we can query the overall conversion rate of the VERIFY_SMS_CODE step on iOS like this:
(count of step=VERIFY_SMS_CODE, os=iOS, status=SUCCESS, phase=POST_VERIFICATION) / (count of step=VERIFY_SMS_CODE, os=iOS, phase=IMPRESSION)
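As a rough illustration of that ratio, the sketch below computes it over an in-memory list of events. In practice the counts would come from the metrics database; the helper names `count` and `conversion_rate` and the sample data are made up for the example.

```python
def count(events, **labels) -> int:
    """Count events whose fields match every given label."""
    return sum(1 for e in events if all(e.get(k) == v for k, v in labels.items()))


def conversion_rate(events, step, os_name):
    """Successful POST_VERIFICATION events divided by IMPRESSION events."""
    impressions = count(events, step=step, os=os_name, phase="IMPRESSION")
    if impressions == 0:
        return None  # the rate is undefined without impressions
    successes = count(
        events, step=step, os=os_name, phase="POST_VERIFICATION", status="SUCCESS"
    )
    return successes / impressions


# Hypothetical sample: 4 iOS impressions, 3 successes, 1 failure, plus Android noise.
base = {"step": "VERIFY_SMS_CODE", "os": "iOS"}
events = (
    [dict(base, phase="IMPRESSION") for _ in range(4)]
    + [dict(base, phase="POST_VERIFICATION", status="SUCCESS") for _ in range(3)]
    + [dict(base, phase="POST_VERIFICATION", status="FAILURE")]
    + [{"step": "VERIFY_SMS_CODE", "os": "Android", "phase": "IMPRESSION"}]
)

rate = conversion_rate(events, "VERIFY_SMS_CODE", "iOS")
```

Returning `None` for zero impressions is a design choice in this sketch, so that a dashboard can show "no data" instead of a misleading 0%.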
Grafana is mature enough for the data visualization work. If you do not want to expose the whole site, you can embed individual panels with an iframe.