Managed Service Monitoring #101
Comments
Simple health check response format to be supplied by
{
  "services": [
    {
      "service": "airflow",
      "landingPage": "https://unity.com/project/venue/processing/ui",
      "healthChecks": [
        {
          "status": "HEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    },
    {
      "service": "jupyter",
      "landingPage": "https://unity.com/project/venue/ads/jupyter",
      "healthChecks": [
        {
          "status": "HEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    },
    {
      "service": "otherService",
      "landingPage": "https://unity.com/project/venue/other_service",
      "healthChecks": [
        {
          "status": "UNHEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    }
  ]
}
In the future, we might add more detail to a healthcheck object, like date of check, error, or a subgraph of other dependencies (database health, API health). This should also accommodate the 'historical' record we envision in the future, where multiple healthchecks can be shown (e.g. daily health) for a given service.
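A consumer of this response could look roughly like the sketch below. It is only an illustration, not the project's implementation: the function names `latest_status` and `unhealthy_services`, the `UNKNOWN` fallback, and the extra older entry in the sample payload are assumptions. It reads the most recent entry in each `healthChecks` list, so the same code would keep working if historical records are added later.

```python
import json
from datetime import datetime

# Shortened sample in the format above. The 2024-04-08 entry is a made-up
# historical record, added only to exercise the "multiple healthchecks" case.
SAMPLE = """
{
  "services": [
    {
      "service": "airflow",
      "landingPage": "https://unity.com/project/venue/processing/ui",
      "healthChecks": [
        {"status": "UNHEALTHY", "date": "2024-04-08T18:01:08Z"},
        {"status": "HEALTHY", "date": "2024-04-09T18:01:08Z"}
      ]
    },
    {
      "service": "otherService",
      "landingPage": "https://unity.com/project/venue/other_service",
      "healthChecks": [
        {"status": "UNHEALTHY", "date": "2024-04-09T18:01:08Z"}
      ]
    }
  ]
}
"""


def _parse_date(value: str) -> datetime:
    # Timestamps end in "Z"; swap it for an explicit UTC offset so that
    # datetime.fromisoformat accepts it on older Python versions too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_status(service: dict) -> tuple[str, str]:
    """Return (service name, status of the most recent health check)."""
    checks = service.get("healthChecks", [])
    if not checks:
        # Assumption: a service with no recorded checks is reported as UNKNOWN.
        return service["service"], "UNKNOWN"
    newest = max(checks, key=lambda check: _parse_date(check["date"]))
    return service["service"], newest["status"]


def unhealthy_services(payload: str) -> list[str]:
    """List the services whose most recent check is anything but HEALTHY."""
    data = json.loads(payload)
    results = (latest_status(service) for service in data["services"])
    return [name for name, status in results if status != "HEALTHY"]
```

Calling `unhealthy_services(SAMPLE)` returns only `otherService`, because the newer `airflow` check overrides the older unhealthy one.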
Would like to see:
Think about:
Authorization: who owns the username/password for hitting an authenticated endpoint.
Multiple components for a service area.
Future: historical records and tracking 'events'.
@mike-gangl I updated the diagram, some descriptions, and some work tickets in the above description.
Updated to include SSM naming parameter:
@mike-gangl NOTE: the diagram above is slightly off at this time (still needs an update to have <MARKETPLACE_ITEM>
Regarding the sample JSON Mike posted earlier, I would like to suggest minor changes.
Also, in the list of
@hargitayjpl - see my comment above and the new format of the health check response you'll be writing. camelCase is really the only change, as I think you'll simply pass along whatever healthcheck value was supplied by the application.
We can use title and service interchangeably as long as we're happy using "service" as the "English" label for the service. For the keys, if we use camelCase then we can parse them into title case (e.g., "Camel Case").
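The key-to-label conversion mentioned above could be sketched like this. It is a minimal Python illustration; the function name `camel_to_title` is hypothetical, and the actual UI may do this differently (for instance in its own language).

```python
import re


def camel_to_title(key: str) -> str:
    """Turn a camelCase key into a title-case label, e.g. for display."""
    # Insert a space wherever a lowercase letter or digit is followed by an
    # uppercase letter. Runs of capitals such as "UI" are left together.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", key)
    # Capitalize only the first character so the remaining casing is preserved.
    return spaced[:1].upper() + spaced[1:]
```

With this sketch, `landingPage` becomes "Landing Page" and `healthChecks` becomes "Health Checks", while a single-word key such as `service` becomes "Service".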
@mike-gangl @hargitayjpl
Shared Services Account components:
Venue account components:
Brandon and I discussed this in a meeting this morning, and we want the health components namespaced by the project/venue they are in. If we simply use something like /unity/healthcheck/airflowUI, it will be ambiguous and cause data overwrite issues.
This is close to being complete. Lambda and crons are set up for the proof of concept, and the management API is up and exposed.
Related UI work: unity-sds/unity-ui#32
Waiting for the U-CS health-endpoint to be ready. Placeholder JSON is being used for the draft implementations of the clients:
@brianlee731 should we move this to the current release? This is almost done and I think some other service areas need to integrate into what U-CS built.
Managed Service Monitoring
"As an operator, I want to monitor the health of various Unity services"
NOTE: S3 bucket is defined here.
An example of the health dashboard from AWS:
per venue, you'd have something like:
Market place options:
Each Service needs a health endpoint
UI team to design endpoint to show health dashboard https://apigateway-for-unity/project/venue/health
Question to answer during planning:
Health check SSM params should be defined in the project venue as:
/unity/healthCheck/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
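As an illustration of the naming pattern above, a small helper could build the parameter name from its two parts. This is a sketch under assumptions: the function name `health_check_param_name`, the validation rules, and the example values are hypothetical and not taken from the project.

```python
def health_check_param_name(marketplace_item: str, component_name: str) -> str:
    """Build /unity/healthCheck/<MARKETPLACE_ITEM>/<COMPONENT_NAME>."""
    parts = {
        "marketplace_item": marketplace_item,
        "component_name": component_name,
    }
    for label, value in parts.items():
        # Assumption: each part must be non-empty and must not contain "/",
        # otherwise it would silently add or drop a level in the hierarchy.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"/unity/healthCheck/{marketplace_item}/{component_name}"
```

For example, `health_check_param_name("exampleItem", "airflowUI")` yields `/unity/healthCheck/exampleItem/airflowUI`, where `exampleItem` is a placeholder marketplace item name.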
Acceptance Criteria
Work Tickets
Link to work tickets required to implement the epic
Dependencies
Other epics or outside tickets required for this to work
Associated Risks
links to risk issues associated with this epic
Out of scope but future work:
previous
This overlaps with the idea of Common metrics/logs aggregation service #92.
How do we plan to monitor the deployed managed services? I think to evolve into a full multi-tenant system we need to make sure we are monitoring:
Health of a service
Uptime of a service
Degradation (health?) - if it's responding to requests, how fast does it respond?
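One way uptime could be derived from recorded health checks is sketched below. This is an assumption-laden illustration rather than a proposed design: the function name `uptime_fraction` is hypothetical, the records reuse the `status` field from the sample response, and every check is weighted equally regardless of the time between checks.

```python
def uptime_fraction(checks: list[dict]) -> float:
    """Return the share of health checks that reported HEALTHY (0.0 to 1.0)."""
    if not checks:
        # Without any recorded checks there is nothing to base an uptime on.
        raise ValueError("no health checks recorded")
    healthy = sum(1 for check in checks if check["status"] == "HEALTHY")
    return healthy / len(checks)
```

Given four checks of which three are `HEALTHY`, this returns 0.75. Measuring degradation would additionally need a response-time value on each record, which the current format does not carry.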
Think of a single console that can monitor all of the managed services across multiple accounts. What does this look like? How are logs/metrics/events propagated to the "central" dashboard? Or does the dashboard reach into different accounts to view things?