-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup ability to provide authenticated ingresses to arbitrary web apps #1502
Conversation
cc @GeorgianaElena for awareness since this one will be needed for the upcoming #1300. |
You can test this out at staging.2i2c.cloud/textbook! |
This looks great to me - I created a second issue (linked in the parent issue) to track documenting this as well. One thing we should make sure to confirm and/or minimally document is how to do this for private repositories, since that's what @arokem was hoping for in the first place! |
JupyterHub provides oauth2, but *not* oidc. This sucks, but as with most problems in computer science, we can solve this with another layer of indirection. In this case, we use https://dexidp.io to setup an oidc provider that is itself authenticated via JupyterHub. This is transparent to the user (because we don't require any consent screens), so in practice it's gonna look like JupyterHub is providing oidc itself. Instead of automatically injecting config from the deployer, we're instead manually copy-pasting config into the .values files directly. This helps with right to replicate, and keeps our deployer smaller
We'll use nginx-ingress to authenticate against our hub, and we'll run an oauth2-proxy as a sidecar with dex so it can be used to authenticate requests!
Was making deployment slower, and this will go away soon
- Uses nginx to serve these, and a 5min git pull to pull a git repo. Copied from earlier docs_service code - Use ingress provider to specify the path under which the static site is available, rather than the hub proxy. This is simpler and allows us to put them anywehere (like staging.2i2c.cloud/textbook rather than staging.2i2c.cloud/services/textbook) - Use nginx-ingress to provide authentication for the static server, via https://kubernetes.github.io/ingress-nginx/examples/auth/oauth-external-auth/
Add a how-to on enabling staticSites instead
Co-authored-by: Georgiana Elena <[email protected]>
And don't count overriding domains as a feature
@choldgraf i've added docs on how to enable it and how frequently it gets updated. Not user facing though. |
Instead of using deploy keys, we use https://github.com/yuvipanda/git-credential-helpers, which is also used for pulling from private repos with nbgitpuller. I've created a new GitHub app for the 2i2c staging hub, at https://github.com/apps/2i2c-staging-hub-static-pull and 'installed' it for the private repo at https://github.com/yuvipanda/test-repo-push. This allows the static-sites pod to pull it, and it's available at https://staging.2i2c.cloud/textbook/ now
I've added support for pulling from private repos, and also added docs on how to do that! I'm going to mark this as ready to review. Need to write a topic/ help page on how this actually works though, but doesn't need to block this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I nitpicked about YAML formatting details and reflected on more verbose naming of the config to help whoever ends up reading this config more confidently guessing what this is about just based on the config names.
Great feature and I think the implementation seems solid overall!
staticSites: | ||
enabled: true | ||
repo: https://github.com/yuvipanda/test-repo-push | ||
branch: master | ||
host: staging.2i2c.cloud | ||
path: /textbook | ||
githubApp: | ||
enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an increased verbosity in the naming of these will be helpful. If you read the config names and feel very confident what they are about.
-
staticSites
For now I understand there is an assumption to only support exposing one static content website. That seems fine to me and support for more can be added over time with some refactoring. The name is currentlystaticSites
in plural. What aboutstaticWebsite
instead? I think that helps readers of this config better understand what this is about, becausesite
is a little vague in the context of reading config assuming you try to understand what what the config is about. -
repo
andbranch
are related to a git repository to be synced.host
andpath
is about Ingress resources and how to access them.githubApp
was something I didn't understand what it was about until I read about it, but I now understand it as configuration of credentials to access the git repo to sync from.I propose more explicit names like
gitRepoUrl
,gitRepoBranch
,ingressHost
,ingressPath
, and perhapsgitRepoCredentials.enabled
, with nested valuesgithubAppId
,githubAppKey
.
I've grown to value clearly named config as even more important than clearly named variables and functions, because config names are like variable names exposed to an even wider audience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@consideRatio thank you for this very helpful feedback. I've renamed and moved around config so it is more self descriptive. I have preferred nesting over compound words, as I think that expresses intent clearer in a hierarchical config format like YAML. I hope this is good from your pov.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
Thanks for iterating on this @yuvipanda! This is acceptable to me and I suggest we go for a merge right now without further iterations.
With that said, I still wanted to convey a brief discussion about aspects that I think is worth considering long term.
Long term reflections
For ease of reflection, this is the schema as it is after changes:
staticWebsite:
enabled: bool
source:
git:
repo: string
branch: string
githubAuth:
enabled: bool
githubApp: string
id: string
privateKey: string
url:
host: string
path: string
Flat vs nested
There is a helm chart best practice about flat vs nested, and then there is for example how kubernetes ingress v1beta1 -> v1 transitioned to be nested from being flat.
My perception is that the k8s project benefits from re-using schema definitions for objects within objects (nested) and values that when going nested, while helm suggests ease of maintenance with flat.
I'm okay with either but have leaned flat especially before values schemas were around.
Content naming and/or purpose naming
With content naming I mean naming a configuration based on what kind of content it is, and with purpose naming i mean to name configuration based on what its purpose relates to.
For example staticWebsite.url.host
probably quite uniquely describes what content is to be set which is good, but helps less with regards to convey how its used and its purpose or use in a technical level. I know its only used by the Ingress resource after inspection, but I didn't feel confident about that before inspection.
In my mind, the best is what you can combine both.
Based on experience on working with a helm chart that grows over time, it has been very helpful when its obvious where to put some new config. If the Helm chart involves config structured based on "content naming", I think it is hard to know where to put something, while when things are named and structured also based on "purpose naming", then its easier I think. I consider the typical labels
and annotations
config. That kind of config kind of must be nested under the respective k8s resource config wise, because there will be a few resources - Deployment, Service, Ingress, ServiceAccount - they may all need labels
and annotations
in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this thoughtful response, @consideRatio. You've convinced me easily it should be called ingress
and not url
and I've made that change.
Re nested vs flat, I very much prefer nested for the following reasons:
- Avoids repetition in the name, making it easier to read the config.
- Allows expansion in the future - we can add a
source.type
that can be set togit
orgoogle-drive
or whatever, and we can add anothergoogleDrive
key at that point without breaking anything undergit
. This combination of settingtype
and then nesting keys under that is one of my fav k8s patterns.
Thank you for thinking deeply about this, and communicating it clearly!
Co-authored-by: Erik Sundell <[email protected]>
@consideRatio I appreciate the thought you put into keeping things consistent, thank you. I've addressed all your concerns I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @yuvipanda!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Should we open an issue to track this? |
Co-authored-by: Georgiana Elena <[email protected]>
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/workflows/deploy-hubs.yaml?query=branch%3Amaster |
Setup dex to proivde oidc authentication
JupyterHub provides oauth2, but not oidc. This sucks, but
as with most problems in computer science, we can solve this
with another layer of indirection.
In this case, we use https://dexidp.io to setup an oidc provider
that is itself authenticated via JupyterHub. This is transparent
to the user (because we don't require any consent screens), so in
practice it's gonna look like JupyterHub is providing oidc itself.
Instead of automatically injecting config from the deployer,
we're instead manually copy-pasting config into the .values
files directly. This helps with right to replicate, and keeps our
deployer smaller
Setup oauth2-proxy alongside dex
We'll use nginx-ingress to authenticate against our hub, and we'll
run an oauth2-proxy as a sidecar with dex so it can be used to
authenticate requests!
Add staticSites that can be protected by oauth2-proxy
repo. Copied from earlier docs_service code
static site is available, rather than the hub proxy. This is
simpler and allows us to put them anywehere (like
staging.2i2c.cloud/textbook rather than
staging.2i2c.cloud/services/textbook)
server, via https://kubernetes.github.io/ingress-nginx/examples/auth/oauth-external-auth/
Add support for pulling from private github repos
Instead of using deploy keys, we use
https://github.com/yuvipanda/git-credential-helpers,
which is also used for pulling from private repos with nbgitpuller.
I've created a new GitHub app for the 2i2c staging hub,
at https://github.com/apps/2i2c-staging-hub-static-pull and 'installed'
it for the private repo at https://github.com/yuvipanda/test-repo-push.
This allows the static-sites pod to pull it, and it's available at
https://staging.2i2c.cloud/textbook/ now
Remove
docs-service
TODO
Fixes #1500