Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup ability to provide authenticated ingresses to arbitrary web apps #1502

Merged
merged 20 commits into from
Jul 11, 2022

Conversation

yuvipanda
Copy link
Member

@yuvipanda yuvipanda commented Jul 7, 2022

  • Setup dex to proivde oidc authentication

    JupyterHub provides oauth2, but not oidc. This sucks, but
    as with most problems in computer science, we can solve this
    with another layer of indirection.

    In this case, we use https://dexidp.io to setup an oidc provider
    that is itself authenticated via JupyterHub. This is transparent
    to the user (because we don't require any consent screens), so in
    practice it's gonna look like JupyterHub is providing oidc itself.

    Instead of automatically injecting config from the deployer,
    we're instead manually copy-pasting config into the .values
    files directly. This helps with right to replicate, and keeps our
    deployer smaller

  • Setup oauth2-proxy alongside dex

    We'll use nginx-ingress to authenticate against our hub, and we'll
    run an oauth2-proxy as a sidecar with dex so it can be used to
    authenticate requests!

  • Add staticSites that can be protected by oauth2-proxy

    • Uses nginx to serve these, and a 5min git pull to pull a git
      repo. Copied from earlier docs_service code
    • Use ingress provider to specify the path under which the
      static site is available, rather than the hub proxy. This is
      simpler and allows us to put them anywehere (like
      staging.2i2c.cloud/textbook rather than
      staging.2i2c.cloud/services/textbook)
    • Use nginx-ingress to provide authentication for the static
      server, via https://kubernetes.github.io/ingress-nginx/examples/auth/oauth-external-auth/
  • Add support for pulling from private github repos

    Instead of using deploy keys, we use
    https://github.com/yuvipanda/git-credential-helpers,
    which is also used for pulling from private repos with nbgitpuller.

    I've created a new GitHub app for the 2i2c staging hub,
    at https://github.com/apps/2i2c-staging-hub-static-pull and 'installed'
    it for the private repo at https://github.com/yuvipanda/test-repo-push.
    This allows the static-sites pod to pull it, and it's available at
    https://staging.2i2c.cloud/textbook/ now

  • Remove docs-service

TODO

  • Add documentation

Fixes #1500

@yuvipanda yuvipanda marked this pull request as draft July 7, 2022 04:07
@yuvipanda yuvipanda changed the title Setup ability to provide authenticated ingresses to arbtrary web apps Setup ability to provide authenticated ingresses to arbitrary web apps Jul 7, 2022
@damianavila
Copy link
Contributor

cc @GeorgianaElena for awareness since this one will be needed for the upcoming #1300.

@yuvipanda
Copy link
Member Author

You can test this out at staging.2i2c.cloud/textbook!

@choldgraf
Copy link
Member

choldgraf commented Jul 8, 2022

This looks great to me - I created a second issue (linked in the parent issue) to track documenting this as well.

One thing we should make sure to confirm and/or minimally document is how to do this for private repositories, since that's what @arokem was hoping for in the first place!

yuvipanda added 6 commits July 8, 2022 02:21
JupyterHub provides oauth2, but *not* oidc. This sucks, but
as with most problems in computer science, we can solve this
with another layer of indirection.

In this case, we use https://dexidp.io to setup an oidc provider
that is itself authenticated via JupyterHub. This is transparent
to the user (because we don't require any consent screens), so in
practice it's gonna look like JupyterHub is providing oidc itself.

Instead of automatically injecting config from the deployer,
we're instead manually copy-pasting config into the .values
files directly. This helps with right to replicate, and keeps our
deployer smaller
We'll use nginx-ingress to authenticate against our hub, and we'll
run an oauth2-proxy as a sidecar with dex so it can be used to
authenticate requests!
Was making deployment slower, and this will go away soon
- Uses nginx to serve these, and a 5min git pull to pull a git
  repo. Copied from earlier docs_service code
- Use ingress provider to specify the path under which the
  static site is available, rather than the hub proxy. This is
  simpler and allows us to put them anywehere (like
  staging.2i2c.cloud/textbook rather than
  staging.2i2c.cloud/services/textbook)
- Use nginx-ingress to provide authentication for the static
  server, via https://kubernetes.github.io/ingress-nginx/examples/auth/oauth-external-auth/
Add a how-to on enabling staticSites instead
@yuvipanda
Copy link
Member Author

@choldgraf i've added docs on how to enable it and how frequently it gets updated. Not user facing though.

yuvipanda added 2 commits July 8, 2022 15:11
Instead of using deploy keys, we use
https://github.com/yuvipanda/git-credential-helpers,
which is also used for pulling from private repos with nbgitpuller.

I've created a new GitHub app for the 2i2c staging hub,
at https://github.com/apps/2i2c-staging-hub-static-pull and 'installed'
it for the private repo at https://github.com/yuvipanda/test-repo-push.
This allows the static-sites pod to pull it, and it's available at
https://staging.2i2c.cloud/textbook/ now
@yuvipanda yuvipanda requested a review from a team July 9, 2022 00:07
@yuvipanda yuvipanda marked this pull request as ready for review July 9, 2022 00:07
@yuvipanda
Copy link
Member Author

I've added support for pulling from private repos, and also added docs on how to do that!

I'm going to mark this as ready to review. Need to write a topic/ help page on how this actually works though, but doesn't need to block this PR

Copy link
Contributor

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I nitpicked about YAML formatting details and reflected on more verbose naming of the config to help whoever ends up reading this config more confidently guessing what this is about just based on the config names.

Great feature and I think the implementation seems solid overall!

Comment on lines 5 to 12
staticSites:
enabled: true
repo: https://github.com/yuvipanda/test-repo-push
branch: master
host: staging.2i2c.cloud
path: /textbook
githubApp:
enabled: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an increased verbosity in the naming of these will be helpful. If you read the config names and feel very confident what they are about.

  • staticSites
    For now I understand there is an assumption to only support exposing one static content website. That seems fine to me and support for more can be added over time with some refactoring. The name is currently staticSites in plural. What about staticWebsite instead? I think that helps readers of this config better understand what this is about, because site is a little vague in the context of reading config assuming you try to understand what what the config is about.

  • repo and branch are related to a git repository to be synced. host and path is about Ingress resources and how to access them. githubApp was something I didn't understand what it was about until I read about it, but I now understand it as configuration of credentials to access the git repo to sync from.

    I propose more explicit names like gitRepoUrl, gitRepoBranch, ingressHost, ingressPath, and perhaps gitRepoCredentials.enabled, with nested values githubAppId, githubAppKey.

I've grown to value clearly named config as even more important than clearly named variables and functions, because config names are like variable names exposed to an even wider audience.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@consideRatio thank you for this very helpful feedback. I've renamed and moved around config so it is more self descriptive. I have preferred nesting over compound words, as I think that expresses intent clearer in a hierarchical config format like YAML. I hope this is good from your pov.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

Thanks for iterating on this @yuvipanda! This is acceptable to me and I suggest we go for a merge right now without further iterations.

With that said, I still wanted to convey a brief discussion about aspects that I think is worth considering long term.

Long term reflections

For ease of reflection, this is the schema as it is after changes:

staticWebsite:
    enabled: bool
    source:
        git:
            repo: string
            branch: string
    githubAuth:
        enabled: bool
        githubApp: string
            id: string
            privateKey: string
    url:
        host: string
        path: string

Flat vs nested

There is a helm chart best practice about flat vs nested, and then there is for example how kubernetes ingress v1beta1 -> v1 transitioned to be nested from being flat.

My perception is that the k8s project benefits from re-using schema definitions for objects within objects (nested) and values that when going nested, while helm suggests ease of maintenance with flat.

I'm okay with either but have leaned flat especially before values schemas were around.

Content naming and/or purpose naming

With content naming I mean naming a configuration based on what kind of content it is, and with purpose naming i mean to name configuration based on what its purpose relates to.

For example staticWebsite.url.host probably quite uniquely describes what content is to be set which is good, but helps less with regards to convey how its used and its purpose or use in a technical level. I know its only used by the Ingress resource after inspection, but I didn't feel confident about that before inspection.

In my mind, the best is what you can combine both.

Based on experience on working with a helm chart that grows over time, it has been very helpful when its obvious where to put some new config. If the Helm chart involves config structured based on "content naming", I think it is hard to know where to put something, while when things are named and structured also based on "purpose naming", then its easier I think. I consider the typical labels and annotations config. That kind of config kind of must be nested under the respective k8s resource config wise, because there will be a few resources - Deployment, Service, Ingress, ServiceAccount - they may all need labels and annotations in the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this thoughtful response, @consideRatio. You've convinced me easily it should be called ingress and not url and I've made that change.

Re nested vs flat, I very much prefer nested for the following reasons:

  1. Avoids repetition in the name, making it easier to read the config.
  2. Allows expansion in the future - we can add a source.type that can be set to git or google-drive or whatever, and we can add another googleDrive key at that point without breaking anything under git. This combination of setting type and then nesting keys under that is one of my fav k8s patterns.

Thank you for thinking deeply about this, and communicating it clearly!

helm-charts/basehub/templates/dex/configmap.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/dex/deployment.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/dex/deployment.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/dex/secret.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/static/deployment.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/static/deployment.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/static/ingress.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/dex/pvc.yaml Outdated Show resolved Hide resolved
helm-charts/basehub/templates/static/ingress.yaml Outdated Show resolved Hide resolved
Co-authored-by: Erik Sundell <[email protected]>
@yuvipanda
Copy link
Member Author

@consideRatio I appreciate the thought you put into keeping things consistent, thank you. I've addressed all your concerns I think.

Copy link
Contributor

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @yuvipanda!!!

Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@GeorgianaElena
Copy link
Member

Need to write a topic/ help page on how this actually works though, but doesn't need to block this PR

Should we open an issue to track this?

@yuvipanda yuvipanda merged commit 958f140 into 2i2c-org:master Jul 11, 2022
@github-actions
Copy link

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/workflows/deploy-hubs.yaml?query=branch%3Amaster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Authenticated and optionally private static websites served from a hub
5 participants