Add fallback nginx to serve "something went wrong" #218

Open
5 tasks
mrnicegyu11 opened this issue Jun 26, 2023 · 3 comments · May be fixed by #950

Comments

@mrnicegyu11
Member

mrnicegyu11 commented Jun 26, 2023

Wishes

  • If we need to restart traefik, the fallback page should ideally still be served
  • It stays available even on a forced restart of simcore
  • It can be updated without affecting running services
  • It can be tested
  • Whenever webserver / api-server / invitations are unhealthy, the fallback service serves content instead of the unhealthy services

Useful resources

Blocked until

Old description

The maintenance page should always run as a kind of fallback/404 with the lowest priority. We tried running an oSparc deployment with the maintenance page up at prio=1 (lowest prio), but the e2e/p2e tests showed problems. A closer look at all traefik routes and some fine-tuning are required to make this work properly.
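
As an illustration of the lowest-priority idea above, here is a simplified toy model (not the actual traefik configuration of this deployment): routers are tried from highest to lowest priority, and a catch-all router with prio=1 only wins when no other rule matches. The router names, path prefixes, and priority values other than prio=1 are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Router:
    name: str
    priority: int
    matches: Callable[[str], bool]  # simplified rule: looks at the path only


def select_router(routers: List[Router], path: str) -> Optional[str]:
    """Return the name of the highest-priority router whose rule matches."""
    for router in sorted(routers, key=lambda r: r.priority, reverse=True):
        if router.matches(path):
            return router.name
    return None


# Hypothetical routes; only the fallback's prio=1 comes from the description.
ROUTERS = [
    Router("fallback", 1, lambda path: True),  # catch-all, lowest priority
    Router("api-server", 100, lambda path: path.startswith("/v0")),
    Router("webserver", 50, lambda path: path == "/" or path.startswith("/app")),
]

print(select_router(ROUTERS, "/v0/meta"))   # handled by the api-server rule
print(select_router(ROUTERS, "/unknown"))   # nothing else matches: fallback
```

In this model the fallback only takes over when no higher-priority rule matches at all, for example when a route has been removed. A route that still matches but points at an unhealthy backend is not covered by priority alone, which is one reason the routes need a closer look.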

@mrnicegyu11 mrnicegyu11 added t:enhancement New feature or request p:mid-prio labels Jun 26, 2023
@mrnicegyu11 mrnicegyu11 changed the title EPIC: Refactor maintenance page related traefik routing Refactor maintenance page related traefik routing Oct 30, 2023
@YuryHrytsuk YuryHrytsuk added this to the Kobayashi Maru milestone Nov 27, 2023
@YuryHrytsuk
Collaborator

@mrnicegyu11 Do we really want to show the maintenance page when the webserver is down? I think we should show the maintenance page only when we are actually doing maintenance, in which case we can always set it up explicitly. If we show the maintenance page every time the webserver cannot be reached, users may think that everything is fine and maintenance is in progress, while in reality something is broken and we have no clue about it.

@YuryHrytsuk YuryHrytsuk removed their assignment Dec 22, 2023
@YuryHrytsuk YuryHrytsuk self-assigned this Mar 25, 2024
@mrnicegyu11 mrnicegyu11 changed the title Refactor maintenance page related traefik routing Add fallback nginx to serve "something went wrong" page Mar 25, 2024
@YuryHrytsuk YuryHrytsuk removed their assignment Mar 25, 2024
@mrnicegyu11 mrnicegyu11 removed this from the Kobayashi Maru milestone May 14, 2024
@YuryHrytsuk YuryHrytsuk self-assigned this Jan 20, 2025
@YuryHrytsuk YuryHrytsuk added this to the Singularity milestone Jan 20, 2025
@YuryHrytsuk YuryHrytsuk changed the title Add fallback nginx to serve "something went wrong" page Add fallback nginx to serve "something went wrong" Jan 23, 2025
@pcrespov
Member

The public-api is a machine-to-machine API. A fallback for maintenance should respond with status 503, for example:

{
  "errors": [ "Under maintenance ... (human readable message)" ]
}
content-type: application/json 
Retry-After: <http-date>
Retry-After: <delay-seconds>
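
A minimal sketch of such a response, using Python's standard-library HTTP server purely for illustration (the issue itself proposes nginx, so this is a stand-in, not the planned implementation). The function and class names, the port, and the 120-second retry delay are assumptions; the status code, JSON shape, and headers follow the comment above, with `Retry-After` in its delay-seconds form.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRY_AFTER_SECONDS = 120  # assumed value, not taken from the issue


def build_fallback_response(retry_after_seconds: int = RETRY_AFTER_SECONDS):
    """Build the (status, headers, body) triple for the 503 fallback."""
    payload = {"errors": ["Under maintenance ... (human readable message)"]}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
        "Retry-After": str(retry_after_seconds),  # delay-seconds form
    }
    return 503, headers, body


class FallbackHandler(BaseHTTPRequestHandler):
    """Answer every request with the same 503 JSON response."""

    def _respond(self):
        status, headers, body = build_fallback_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if self.command != "HEAD":  # HEAD responses carry no body
            self.wfile.write(body)

    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = do_HEAD = _respond


def serve(port: int = 8080) -> None:
    """Run the fallback server until interrupted (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), FallbackHandler).serve_forever()
```

Calling `serve()` starts the server; a machine client can then detect the 503 status and wait for the number of seconds given in `Retry-After` before retrying.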

@YuryHrytsuk
Collaborator

After discussion with @sanderegg, we decided that this should probably live in the osparc-simcore repository (I am not sure exactly why). To keep the fallback service in sync with the main services, we need to introduce ENV VARs for the shared traefik configuration. In simcore we can also add integration tests for the fallback logic. In ops-env we will need to override the config definitions to ensure smooth config updates.

This is taking way more than expected. I will probably pause this for now.
