Implement lambda in Venue account to periodically gather health status #367

Closed
2 tasks done
Tracked by #101
galenatjpl opened this issue Apr 8, 2024 · 2 comments

@galenatjpl
Collaborator

galenatjpl commented Apr 8, 2024

Implement a function (most likely a Lambda) that fires periodically and gathers the health status of each of the services below.
The statuses will be written to a JSON file, which will be uploaded to an S3 bucket:

[Image: Screenshot 2024-04-09 at 8 41 00 PM]

Where are the Health Check Endpoints defined?

The set of health check endpoints is defined by SSM parameters whose names start with /unity/healthCheck/...
For example: /unity/healthCheck/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
For shared services, shared-services is effectively the MARKETPLACE_ITEM
example: /unity/healthCheck/shared-services/data-catalog
For venue services, an example would be:
/unity/healthCheck/sps/airflowUi
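
As a rough illustration of the naming convention above, the sketch below splits a parameter name into its marketplace item and component. The function and type names are hypothetical, and reading the last two path segments as MARKETPLACE_ITEM and COMPONENT_NAME is an assumption made for this example, not something the issue specifies.

```python
from typing import NamedTuple

HEALTH_CHECK_PREFIX = "/unity/healthCheck/"


class HealthCheckParam(NamedTuple):
    marketplace_item: str
    component_name: str


def parse_health_check_name(name: str) -> HealthCheckParam:
    """Split an SSM parameter name into marketplace item and component.

    Assumption: the last two path segments are the marketplace item and
    the component name, whatever precedes them under the prefix.
    """
    if not name.startswith(HEALTH_CHECK_PREFIX):
        raise ValueError(f"not a health check parameter: {name!r}")
    segments = [s for s in name[len(HEALTH_CHECK_PREFIX):].split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected at least two path segments: {name!r}")
    return HealthCheckParam(segments[-2], segments[-1])
```

For the two examples above, this yields ("shared-services", "data-catalog") and ("sps", "airflowUi").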

Who Creates the SSM entries?

The Service Areas (not U-CS) are responsible for creating the SSM entries.

  • If the deployment occurs via the Management Console/Marketplace, then the deployment's infrastructure as code (IaC, usually Terraform) is responsible for creating the SSM values.
  • Otherwise, the SSM entry can be created manually in the venue.
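
For the manual case, a minimal sketch of creating such an entry with boto3 might look like the following. The helper name is hypothetical, and storing the health check URL as a plain String parameter is an assumption; `put_parameter` itself is the standard SSM client call.

```python
HEALTH_CHECK_PREFIX = "/unity/healthCheck/"


def register_health_check(ssm_client, marketplace_item, component_name, url):
    """Create or update the SSM parameter that holds a health check URL.

    ssm_client is expected to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    name = f"{HEALTH_CHECK_PREFIX}{marketplace_item}/{component_name}"
    ssm_client.put_parameter(
        Name=name,
        Value=url,          # the URL the health check function will query
        Type="String",      # assumption: the URL itself is not a secret
        Overwrite=True,     # allow re-running after a redeployment
    )
    return name
```

A call such as `register_health_check(boto3.client("ssm"), "sps", "airflowUi", "https://example.com/health")` would create /unity/healthCheck/sps/airflowUi; the URL here is a placeholder.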

How does the querying occur?

A lambda function periodically fires off (nominally every 5 minutes -- probably leveraging AWS EventBridge) and:

  1. queries SSM for all params starting with /unity/healthCheck/
    • /unity/healthCheck/${PROJECT}/${VENUE}/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
    • /unity/healthCheck/shared-services/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
  2. gathers the health status of each of the URLs found in the /unity/healthCheck/... SSM values. For now, HTTP 200 represents HEALTHY, and anything else represents UNHEALTHY. Some of the URLs represented in the SSM values are endpoints in the shared services AWS account, and others are in the venue account.
  3. generates the JSON status file with the statuses (HEALTHY or UNHEALTHY). Example JSON file:
{
  "services": [
    {
      "service": "airflow",
      "landingPage":"https://unity.com/project/venue/processing/ui",
      "healthChecks": [
        {
          "status": "HEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    },
    {
      "service": "jupyter",
      "landingPage":"https://unity.com/project/venue/ads/jupyter",
      "healthChecks": [
        {
          "status": "HEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    },
    {
      "service": "otherService",
      "landingPage":"https://unity.com/project/venue/other_service",
      "healthChecks": [
        {
          "status": "UNHEALTHY",
          "date": "2024-04-09T18:01:08Z"
        }
      ]
    }
  ]
}
  4. uploads the JSON file to the S3 bucket. Use the bucket defined in Create SSM parameter for monitoring S3 bucket name #370
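
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implemented Lambda: the function names, the `HEALTH_CHECK_BUCKET` environment variable, and the object key are hypothetical; the service name is taken from the last path segment of the parameter name; and `landingPage` is left out because the issue does not say where that value comes from.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

HEALTH_CHECK_PREFIX = "/unity/healthCheck/"


def list_health_check_params(ssm_client):
    """Step 1: return {parameter name: URL} for every health check parameter."""
    params = {}
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=HEALTH_CHECK_PREFIX, Recursive=True):
        for param in page["Parameters"]:
            params[param["Name"]] = param["Value"]
    return params


def check_url(url, timeout=10):
    """Step 2: HTTP 200 is HEALTHY; anything else, including errors, is UNHEALTHY."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "HEALTHY" if response.status == 200 else "UNHEALTHY"
    except (urllib.error.URLError, OSError, ValueError):
        # Covers 4xx/5xx responses, timeouts, DNS failures, and malformed URLs.
        return "UNHEALTHY"


def build_report(params, checker=check_url, now=None):
    """Step 3: build the status document in the shape of the example JSON."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    services = []
    for name, url in sorted(params.items()):
        services.append({
            # Assumption: the last path segment is used as the service name.
            "service": name.rstrip("/").rsplit("/", 1)[-1],
            "healthChecks": [{"status": checker(url), "date": stamp}],
        })
    return {"services": services}


def upload_report(s3_client, bucket, report, key="health_check_status.json"):
    """Step 4: upload the report; the default key is a hypothetical name."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(report, indent=2).encode("utf-8"),
        ContentType="application/json",
    )


def lambda_handler(event, context):
    """Entry point, e.g. invoked by a scheduled EventBridge rule."""
    import os
    import boto3  # imported here so the helpers above can be tested without AWS

    # Hypothetical variable; the real bucket name comes from the SSM parameter in #370.
    bucket = os.environ["HEALTH_CHECK_BUCKET"]
    report = build_report(list_health_check_params(boto3.client("ssm")))
    upload_report(boto3.client("s3"), bucket, report)
    return report
```

Passing the clients and the checker in as arguments keeps each step testable without AWS access. Note that `urlopen` follows redirects, so a URL that redirects to a page returning 200 would be reported as HEALTHY in this sketch.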

What if the healthCheck endpoint is secured? How will I work around that?

@mike-gangl mentions that there is a method for getting the username/password from SSM and then using them to get a token.
See https://github.com/unity-sds/unity-data-services/blob/develop/cumulus_lambda_functions/lib/cognito_login/cognito_token_retriever.py for an example of how U-DS gets a token via the Cognito login. Something like https://github.com/unity-sds/unity-data-services/blob/develop/cumulus_lambda_functions/stage_in_out/dapa_client.py then uses that Cognito token to make calls.
See also: https://github.com/unity-sds/sounder-sips-tutorial/blob/develop/jupyter-notebooks/tutorials/2_working_with_data.ipynb
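
A hedged sketch of that approach, using the boto3 SSM and Cognito clients, is shown below. It is not copied from the linked U-DS code: the SSM parameter names are supplied by the caller and are hypothetical, and both the USER_PASSWORD_AUTH flow and the choice of the access token as a Bearer token are assumptions that would need to match how the endpoint is actually secured.

```python
import urllib.request


def read_secret(ssm_client, name):
    """Read one (possibly encrypted) SSM parameter value."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def fetch_cognito_token(ssm_client, cognito_client,
                        username_param, password_param, client_id_param):
    """Get a Cognito token using credentials stored in SSM.

    The three *_param arguments are SSM parameter names chosen by the
    caller; none of them is defined in the issue.
    """
    response = cognito_client.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",  # assumption: this flow is enabled
        AuthParameters={
            "USERNAME": read_secret(ssm_client, username_param),
            "PASSWORD": read_secret(ssm_client, password_param),
        },
        ClientId=read_secret(ssm_client, client_id_param),
    )
    # Assumption: the endpoint accepts the access token.
    return response["AuthenticationResult"]["AccessToken"]


def authorized_request(url, token):
    """Build a request that carries the token as a Bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The clients would be created with `boto3.client("ssm")` and `boto3.client("cognito-idp")`, and the resulting request can be passed to `urllib.request.urlopen` in place of the bare URL.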

See diagrams and other notes in unity-sds/unity-project-management#101

Dependencies

Other epics or outside tickets required for this to work

@galenatjpl galenatjpl converted this from a draft issue Apr 8, 2024
@galenatjpl galenatjpl added the U-CS label Apr 8, 2024
@galenatjpl galenatjpl changed the title from "Implement framework in Venue account to periodically gather health" to "Implement lambda in Venue account to periodically gather health status" Apr 10, 2024
@galenatjpl galenatjpl moved this from Todo to In Progress in Unity Project Board Jun 4, 2024
@rtapella

updated json format: see unity-sds/unity-project-management#101 (comment)

@galenatjpl
Collaborator Author

@mike-gangl This ticket is implemented, and we are closing this, to take credit for the work in 24.2. We can run everything manually, and it's fine. We will open up another ticket to do the final testing in 24.3. @hargitayjpl and @jdrodjpl will be getting together to run the test and confirm things.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Unity Project Board Jul 2, 2024
Status: Done
3 participants