From 42e3a65d15a8941aea44ba9f38c212e7c0d94ac1 Mon Sep 17 00:00:00 2001 From: Chris Holdgraf Date: Wed, 23 Nov 2022 18:28:24 +0100 Subject: [PATCH] More updates to SRM --- .gitignore | 5 +- about/distributions/education.md | 5 +- about/distributions/index.md | 85 +++++++------ about/distributions/research.md | 4 +- about/infrastructure/index.md | 12 -- about/infrastructure/security.md | 35 ------ about/service/comparison.md | 45 +++---- about/service/index.md | 81 ++++++------ about/service/options.md | 122 ++++-------------- about/service/shared-responsibility.md | 168 +++++++++++++++++-------- about/service/team.md | 97 -------------- conf.py | 21 +++- index.md | 10 +- noxfile.py | 1 + topic/cloud-costs.md | 87 +++++++++++++ 15 files changed, 371 insertions(+), 407 deletions(-) delete mode 100644 about/infrastructure/index.md delete mode 100644 about/infrastructure/security.md delete mode 100644 about/service/team.md create mode 100644 topic/cloud-costs.md diff --git a/.gitignore b/.gitignore index 9a49481..7319250 100644 --- a/.gitignore +++ b/.gitignore @@ -133,4 +133,7 @@ _build/ # Docs data environments.txt -build_assets \ No newline at end of file +build_assets +images/shared_responsibility_diagram.png +images/collaborative_learning_hub.png +images/scalable_research_hub.png \ No newline at end of file diff --git a/about/distributions/education.md b/about/distributions/education.md index a49cdf3..6329ec3 100644 --- a/about/distributions/education.md +++ b/about/distributions/education.md @@ -1,5 +1,5 @@ (hub-types:education)= -# Collaborative learning hub +# Education and teaching The 2i2c Educational Hubs provide learning environments and infrastructure that is meant for teaching data science. These hubs are inspired by 2i2c's experience with the [DataHubs at UC Berkeley](https://docs.datahub.berkeley.edu/en/latest/) and the [Syzygy service](https://syzygy.ca/) for Canada. 
@@ -11,7 +11,8 @@ This hub deployment is designed for distributed learning for students with a var Below is a diagram that showcases some of the major components of this hub: -```{figure} https://drive.google.com/uc?export=download&id=1Mr51-s3D_KHPsAuTXbczaQ7mlPZUs9gm +% automatically downloaded in conf.py +```{figure} /images/collaborative_learning_hub.png A high level overview of major components in a collaborative learning hub. ``` diff --git a/about/distributions/index.md b/about/distributions/index.md index 5d2a957..ea655b0 100644 --- a/about/distributions/index.md +++ b/about/distributions/index.md @@ -3,12 +3,6 @@ 2i2c builds and operates **distributions of JupyterHubs** that are tailored for particular use-cases. These services share many of the same infrastructure components, but have customizations and optimizations that are more domain- or community-specific. -:::{note} -Our services are in an "alpha" state, and the service may change over the coming months! -See {external:tc:doc}`2i2c's strategy page in the Team Compass ` for an overview of what we're hoping to do and where we're headed next. -::: - - ```{figure} https://drive.google.com/uc?export=download&id=1vL8ekAtUQ4TEik4-oWIn36VAOITdlmpR :width: 80% @@ -16,33 +10,11 @@ A high-level technical overview of an Interactive Computing Service collaborativ ``` -For more information about specific hub distributions, see the links below. -Otherwise, read onward for high-level information about all of our Managed JupyterHubs. - -## What technology makes up each hub? - -🚀 core infrastructure -: Underneath each 2i2c JupyterHub is a [JupyterHub](https://jupyter.org/hub). These provide interactive computing sessions for each of your users, and connect to the other infrastructure in the cloud. We use [`auth0`](https://auth0.com/) and [CILogon](https://www.cilogon.org/) for authenticating users, which can connect to a number of other authentication protocols (such as OAuth2). 
- -💻 interfaces -: Each 2i2c JupyterHub has two main interactive interfaces: Jupyter interfaces (Notebook and Lab), and RStudio. Each of them is accessible from your session via `/tree`, `/lab`, and `/rstudio` endpoints in your URL. - -🌄 environment -: Your 2i2c JupyterHub has an environment that has been created for your particular use-case. It exists as a Docker image that your JupyterHub loads when a user starts a new session. These images can either be built with the tool [repo2docker](https://repo2docker.readthedocs.io/), or pulled directly from a Docker registry. The environment also comes pre-loaded with some tools that are helpful for working with JupyterHub, such as [nbgitpuller](https://jupyterhub.github.io/nbgitpuller). See [](environment/custom) for more information. - -🤖 hardware -: 2i2c JupyterHubs can run on most major cloud providers - the primary thing that is needed is a working Kubernetes deployment. By default, 2i2c runs its hubs on Google Cloud, but if communities wish to use a different provider, this can be accomplished as well. This also means that the hardware underlying the Kubernetes deployment is configurable. - -📦 data -: The data that is used by your 2i2c JupyterHub is provided by you! 2i2c JupyterHubs can connect with a variety of public data sources. We recommend using standard data structures or specifications via libraries like [Intake](https://intake.readthedocs.io/en/latest/). Note that 2i2c does not host this data itself, but can build connections between 2i2c JupyterHubs and these data sources. - -## Features of each hub - Here is a brief overview of the major features that are present in each. ```{csv-table} :header-rows: 1 -:widths: 20, 70, 5, 5 +:widths: auto :file: ../../build_assets/feature-matrix.csv ``` @@ -65,27 +37,66 @@ Here is a brief overview of the major features that are present in each. } -(note-on-urls)= -## Where are hubs accessed? 
-By default all 2i2c JupyterHub get their own URL with the following form: +## JupyterHub in the cloud + +At the core of a community service is one or more [JupyterHubs](https://jupyter.org/hub) that provide an access point for interactive computing and cloud infrastructure for your community members. + +You may access your community JupyterHub at a URL with the following form (though you may choose a custom URL if you wish): ``` ..2i2c.cloud ``` -Each 2i2c JupyterHub has a **hub name** (denoted by ``) and a **community name** (denoted by ``). Communities are collections of hubs around a particular community or collaboration. Each community infrastructure may be run by different teams. For more information, see [](../service/team.md). +JupyterHub provides interactive computing sessions for each of your users, and connects to the other infrastructure in the cloud. +Our JupyterHubs can run on Google Cloud, Amazon AWS, or Microsoft Azure. + +## Authentication + +We use [`auth0`](https://auth0.com/) and [CILogon](https://www.cilogon.org/) for authenticating users, which can connect to a number of other authentication protocols (such as OAuth2). -It is also possible to provide your own URL that points to a 2i2c JupyterHub. +## User interfaces -## Data outside of the hub +Each 2i2c JupyterHub has two main interactive interfaces: Jupyter interfaces (Notebook and Lab), and RStudio. Each of them is accessible from your session via `/tree`, `/lab`, and `/rstudio` endpoints in your URL. -If you wish to access data that exists outside of your 2i2c Hub, it is your responsibility to put this data in the cloud and manage the infrastructure around it. 2i2c does not control this data, it merely provides access to it via your hub infrastructure. +## Custom user environments -## Where are hubs configured and deployed? +Your 2i2c JupyterHub has an environment that has been created for your particular use-case. 
It exists as a Docker image that your JupyterHub loads when a user starts a new session. These images can either be built with the tool [repo2docker](https://repo2docker.readthedocs.io/), or pulled directly from a Docker registry. The environment also comes pre-loaded with some tools that are helpful for working with JupyterHub, such as [nbgitpuller](https://jupyterhub.github.io/nbgitpuller). See [](environment/custom) for more information. + +## Transparent infrastructure and operations All of the configuration and deployment scripts for the 2i2c JupyterHub can be found at [the `infrastructure/` repository](https://github.com/2i2c-org/infrastructure). This repository contains both the deployment code as well as documentation that explains how it works. It should be treated as "for advanced users only", and is provided for transparency and as a guide for the community to follow if they wish to manage their own infrastructure similar to 2i2c JupyterHub. To learn about how the `infrastructure/` repository works, we recommend checking out the [`infrastructure` documentation](infra:index). See the next sections for more information about each hub distribution. + +## Secure out of the box + +The cloud infrastructure that we manage follows best-practices in deploying cloud applications in a secure manner. +The [Zero to JupyterHub Helm Chart](https://zero-to-jupyterhub.readthedocs.io/en/latest/) is the community standard in deploying JupyterHub in the cloud, and is what 2i2c uses in all of its cloud hubs. +This project follows the principle of "secure by default", and has a number of configuration and design decisions that properly isolate user environments from one another, and prevent them from being able to access resources or data that is forbidden to them. 
+ +As members of the JupyterHub team, we are constantly looking for ways to improve [the security of Zero to JupyterHub](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html), and use our experience running these hubs to further improve JupyterHub's security. + +### Data privacy + +2i2c will not collect user data for any purpose beyond what is required in order to run a JupyterHub. +Depending on the choices of your community, the hub might contain identifiable information (e.g., e-mail addresses used as usernames for authentication), but this will remain within your hub's configuration and is not shared publicly. + +Our {term}`Cloud Engineering Team` will have access to all of the information that is inside a hub (which it requires in order to debug problems and assist with upgrades), however we will not retain any of this data or move it *outside* of the hub, and will not retain it once the hub is shut down (except in order to transfer data to you at your request). + +## Monitored for abuse and unexpected costs + +We deploy [Grafana Dashboards](https://grafana.com/grafana/dashboards/) along with a [Prometheus Server](https://prometheus.io/) to continuously monitor the usage across all of our hubs. +This provides visual dashboards that allow us to identify abnormal behavior on a hub (such as a single user using unusual amounts of RAM, using a lot of CPU, or making unusual networking requests). + +### Cryptocurrency mining + +Cryptocurrency mining abuse occurs when users take advantage of cloud CPU in order to make money by mining cryptocurrency. +It is a common problem with cloud-based services and platforms. + +There are many different cryptocurrencies out there, but the most common by far for abuse is [the Monero cryptocurrency](https://www.getmonero.org/) due to its anonymous nature. + +We deploy an open-source tool called [`cryptnono`](https://github.com/yuvipanda/cryptnono) to each of the clusters we manage. 
+This tool monitors any process that runs on the 2i2c hubs, and automatically kills any that are associated with Monero. diff --git a/about/distributions/research.md b/about/distributions/research.md index 84d0ff8..28f0aad 100644 --- a/about/distributions/research.md +++ b/about/distributions/research.md @@ -1,5 +1,5 @@ (hub-types:scalable-research)= -# Scalable computing hub +# Research and collaboration Scalable computing hubs are designed to let researchers and data scientists leverage cloud infrastructure to facilitate collaboration and interactive computation. They are heavily inspired by [the Pangeo Community infrastructure](https://pangeo.io). @@ -10,7 +10,7 @@ This hub deployment is designed for researchers and teams that wish to do their Below is a diagram that showcases some of the major components of this hub: -```{figure} https://drive.google.com/uc?export=download&id=1gWAIQVKcB-uxuJsBHqlDlRTq88oki1zn +```{figure} /images/scalable_research_hub.png A high level overview of major components in a scalable computing hub. ``` diff --git a/about/infrastructure/index.md b/about/infrastructure/index.md deleted file mode 100644 index cb9eca1..0000000 --- a/about/infrastructure/index.md +++ /dev/null @@ -1,12 +0,0 @@ -# Infrastructure features - -These sections contain information about the technical and cloud infrastructure behind the {term}`Managed JupyterHub Service`. -They describe the major technologies that are used, what kinds of use-cases and workflows are possible, as well as some important considerations that may be relevant to your community. 
- -```{toctree} -:maxdepth: 2 -../distributions/index.md -../distributions/education -../distributions/research -security.md -``` diff --git a/about/infrastructure/security.md b/about/infrastructure/security.md deleted file mode 100644 index 416dcd4..0000000 --- a/about/infrastructure/security.md +++ /dev/null @@ -1,35 +0,0 @@ -# Security and Abuse - -The cloud infrastructure that we manage follows best-practices in deploying cloud applications in a secure manner. -This page describes a few ways in which we make our hubs more secure and prevent them from abuse. - -## Secure out of the box - -The [Zero to JupyterHub Helm Chart](https://zero-to-jupyterhub.readthedocs.io/en/latest/) is the community standard in deploying JupyterHub in the cloud, and is what 2i2c uses in all of its cloud hubs. -This project follows the principle of "secure by default", and has a number of configuration and design decisions that properly isolate user environments from one another, and prevent them from being able to access resources or data that is forbidden to them. - -As members of the JupyterHub team, we are constantly looking for ways to improve [the security of Zero to JupyterHub](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html), and use our experience running these hubs to further improve JupyterHub's security. - - -## Data privacy - -2i2c will not collect user data for any purpose beyond what is required in order to run a JupyterHub. -Depending on the choices of your community the hub might contain identifiable information (e.g., e-mail addresses used as usernames for authentication), but this will remain within your hub's configuration and is not shared publicly. 
- -Our {term}`Cloud Engineering Team` will have access to all of the information that is inside a hub (which it requires in order to debug problems and and assist with upgrades), however we will not retain any of this data or move it *outside* of the hub, and will not retain it once the hub is shut down (except in order to transfer data to you at your request). - - -## Cryptocurrency mining - -Cryptocurrency mining abuse occurs when users take advantage of cloud CPU in order to make money by mining cryptocurrency. -It is a common problem with cloud-based services and platforms. - -There are many different cryptocurrencies out there, but the most common by-far for abuse is [the Monero cryptocurrency](https://www.getmonero.org/) due to its anonymous nature. - -We deploy an open-source tool called [`cryptnono`](https://github.com/yuvipanda/cryptnono) to each of the clusters we manage. -This tool monitors any process that runs on the 2i2c hubs, and automatically kills any that are associated with Monero. - -## Usage monitoring - -We deploy [Grafana Dashboards](https://grafana.com/grafana/dashboards/) along with a [Prometheus Server](https://prometheus.io/) to continuously monitor the usage across all of our hubs. -This provides visual dashboards that allow us to identify abnormal behavior on a hub (such as a single user using unusual amounts of RAM, using a lot of CPU, or making unusual networking requests). diff --git a/about/service/comparison.md b/about/service/comparison.md index 6e22afa..4f0bf4e 100644 --- a/about/service/comparison.md +++ b/about/service/comparison.md @@ -10,13 +10,6 @@ For some excellent comprehensive guides, we also recommend reading these two res - [The Principles of Open Scholarly Infrastructure](https://openscholarlyinfrastructure.org/) describes how infrastructure and services can align themselves with the mission and values of the scholarly community. We recommend that you use services that align closely with these principles. 
- [The Values and Principles Framework and Assessment Checklist](https://commonplace.knowledgefutures.org/pub/5se1i1qy/release/4) is an assessment checklist to help those in the scholarly community choose services that are aligned with the mission and values of the scholarly community. -:::{tip} -The content on this page can be re-used as a part of "price reasonableness and comparisons" forms when completing contracting for communities. - -In each section below, we'll list a few similar companies and services that can be compared with 2i2c. -Their presence and ordering do not constitute an "endorsement" and are not exhaustive - we are merely trying to be transparent and helpful about the other organizations in this space. -::: - ## 2i2c's qualifications ```{epigraph} @@ -33,7 +26,7 @@ This page describes why we believe that 2i2c and its service model is uniquely s The content on this page can be re-used as a part of "uniqueness and sole source justification" forms when completing contracting for communities. ::: -### 2i2c has expertise in managed cloud infrastructure in research and education +### Expertise in managed cloud infrastructure in research and education Our team has developed and managed cloud infrastructure for over 5 years - first at our previous institutions and now as a part of 2i2c. We follow modern practices for Site Reliability Engineering with cloud infrastructure like Kubernetes and JupyterHub. @@ -47,14 +40,14 @@ Here are a few of the major projects our team memebers have been involved in ove - [The Syzygy Project](https://syzygy.ca/) - A network of federated JupyterHubs for more than 15 Canadian Universities running on national infrastructure. - [The Jupyter Book](https://jupyterbook.org) and [MyST Markdown](https://myst.jupyterbook.org/) projects - A collection of tools and standards for improving scientific and technical communication and authoring with interactive computing. 
-### 2i2c has expertise in open source workflows and Jupyter +### Expertise in open source workflows and Jupyter 2i2c's team is comprised of several "[Distinguished Contributors](https://jupyter.org/about)" in the Jupyter ecosystem, which is a crucial technical component of this service. We are [core team members of JupyterHub and Binder](https://jupyterhub-team-compass.readthedocs.io/en/latest/team/index.html), and make regular contributions across the Jupyter ecosystem. Moreover, our team has many years of experience with all aspects of the Jupyter stack and we are comfortable interacting with open source communities everywhere. This makes 2i2c uniquely capable of both utilizing and improving this technology through upstream contributions. -### 2i2c has expertise with research and education workflows +### Expertise with research and education workflows 2i2c has years of experience managing cloud resources specifically for research and education communities. We have led and contributed to projects like [the Binder Project](https://docs.mybinder.org/), [the Pangeo Project](https://pangeo.io/), [the Syzygy Project](https://syzygy.ca/), [the UC Berkeley DataHubs](https://docs.datahub.berkeley.edu/en/latest/), and [the Jupyter Book project](https://jupyterbook.org) to serve thousands of users in the research and education community. @@ -62,17 +55,13 @@ As a non-profit, we have defined our mission in order to serve research and educ We strive to build an understanding of their needs, to represent their interests in the Jupyter and open source ecosystem, and to collaborate with them in our operations and development. 2i2c is uniquely positioned to serve as a collaborator for research and education via these efforts. -### 2i2c is a transparent, collaborative non-profit +### A transparent, collaborative non-profit 2i2c is a mission-driven non-profit organization that has a commitment to doing its work openly, transparently, and inclusively. 
Our mission is to provide researchers and educators with the infrastructure they need to do their work, and to support open source communities that underlie this infrastructure. 2i2c is governed by a [Steering Council](tc:structure:steerco) made of members from the research and education community. 2i2c manages all of our work in public spaces, including [all of our infrastructure](http://github.com/2i2c-org/infrastructure) as well as [all of our organizational strategy and practices](http://team-compass.2i2c.org/). -### The bottom line - -In short, there is no other organization in existence with a focus on open source workflows with Jupyter, extensive expertise in cloud infrastructure and JupyterHub, a commitment to managing non-proprietary and vendor-agnostic tools, a core practice of making upstream contributions to community-run infrastructure, and a non-profit and mission-driven structure. - ## Major factors to consider @@ -202,7 +191,7 @@ How closely does this infrastructure track the latest developments in data scien ## Overview of services Below is a short table summarizing the kinds of services discussed below, and how they (roughly) perform for each of the factors discussed above. -It makes some simplifications and assumptions, and is meant to be a quick and "glanceable" way to compare options: +It makes some simplifications and assumptions, and is meant to be a quick and "glanceable" way to compare options. 
::::{grid} 3 :margin: 1 @@ -292,8 +281,15 @@ It makes some simplifications and assumptions, and is meant to be a quick and "g - 🟧 ::: +In addition, jump to a quick explanation of each type of service below: + +:::{contents} Jump to service description +:depth: 1 +:local: +::: + (compare:2i2c)= -## 2i2c's managed cloud service +### 2i2c's managed cloud service As a non-profit, we choose our prices to move forward on a sustainable path to achieve our mission according to {external:tc:ref}`our cost model ` as well as {external:tc:ref}`our growth model `. Our service entails developing and managing entirely open-source, vendor-agnostic, and community-driven infrastructure that is customized for research and education. @@ -336,7 +332,7 @@ Updates : 2i2c's team follows the latest developments in Jupyter and cloud infrastructure, and continuously incorporates them into our managed hubs. (compare:internal)= -## Internal staffing +### Internal staffing The most common way for organizations to achieve similar services is to staff their own internal teams. 2i2c encourages this, as it is aligned with our commitment to open source, vendor-agnostic tools, and the [Right to Replicate your infrastructure](https://2i2c.org/right-to-replicate). @@ -386,7 +382,7 @@ We constantly adjust our own prices and team compensation to be responsive to th ::: (compare:public-infra)= -## Large-scale public infrastructure +### Large-scale public infrastructure Depending on the state or country that you live in, you may be able to access large-scale shared infrastructure that is run by government agencies. For example, the [XSEDE](https://www.xsede.org/) program in the United States provides shared infrastructure that you can access with an application. @@ -431,7 +427,7 @@ Updates : There are more complex processes, bureaucracy, and constraints that manage the maintenance of large-scale infrastructure, and this means it tends to evolve and improve more slowly. 
(compare:consulting)= -## Consulting companies +### Consulting companies Many companies specialize in technical consulting that is flexible and tailored to an organization's needs. They can build bespoke infrastructure using an open source stack that is similar to the one that 2i2c offers. @@ -475,7 +471,7 @@ Updates : Depends on the consultancy, and their expertise in cloud infrastructure. (compare:saas)= -## Software as a Service Products +### Software as a Service Products There are many companies offering services and platforms via a subscription fee. The experience from a user's perspective may be similar and they may offer some open source tools as part of their services. @@ -514,10 +510,3 @@ Accessible Updates : Dependent on the platform. Most SaaS providers do a reasonable job of staying up to date with modern data and cloud workflows, though they tend to include new features in the form of custom or proprietary workflows. - - -## Bottom line - -There is a large ecosystem of vendors and services available for interactive data science. -2i2c believes that interactive computing is emerging as the vital medium for communications in research and education communities. As a result, we suggest that universities and research communities should build atop non-proprietary tools and commit to services that are vendor-agnostic and respect your [Right to Replicate your infrastructure](https://2i2c.org/right-to-replicate). -You should think about the constraints and principles that you'd like your infrastructure to follow, and choose the right approach for your organization. 
diff --git a/about/service/index.md b/about/service/index.md index f20b25a..2eff874 100644 --- a/about/service/index.md +++ b/about/service/index.md @@ -1,46 +1,28 @@ (about-the-project)= -# Service model +# Our collaborative service model -This section describes a high-level overview of our {term}`Managed JupyterHub Service` and the major teams, processes, and expectations around this service for both 2i2c and the partner community we work with. -This page provides some high-level information to help you get started, and the sections below go into more detail on our service model and structure. - -```{toctree} -:maxdepth: 1 -team -shared-responsibility -service-objectives.md -comparison -``` +Here we provide a high-level overview of our Managed JupyterHub Service and the major teams, processes, and expectations for both 2i2c and the partner communities we work with. +:::{admonition} Want to partner with us? If you're interested in setting up a service for your community, click the button below to send us an email. ```{button-link} mailto:hello@2i2c.org :color: primary Send us an email. ``` +::: -## What is the hub service? - -```{glossary} -Managed JupyterHub Service - An open, scalable, sustainable cloud service for interactive computing environments in research and education. - It follows a "DevOps as a Service" model where communities in research and education can pay for managed cloud infrastructure that runs on an entirely open source stack, and give you [the right to replicate your infrastructure](https://2i2c.org/right-to-replicate). - - It is run by [2i2c](https://2i2c.org), a non-profit organization that develops and operates interactive computing infrastructure for research and education. - 2i2c values transparency and community driven infrastructure. - The sections below describe the Managed JupyterHub Service, its strategy and goals, as well as information about its major features and pricing. -``` -## Who is this service for? 
+## Our shared responsibility model -2i2c's Managed JupyterHub Service is designed for communities in research and education who want the following things: +Our hub service is a collaboration between 2i2c and one or more communities. +We break down the responsibilities that must be carried out in order to successfully run a service. +We can then assign or share these responsibilities with partner communities according to their needs and interests. -1. Access to the latest technology in Jupyter and interactive computing for collaborative and scalable data science running in the cloud. -2. To utilize open source, community-driven tools and standards. -3. To partner with a mission-aligned organization that transparently and collaboratively runs infrastructure as a team. -4. To use infrastructure that they could take control of themselves, and that gives the user the [Right to Replicate](overview/right-to-replicate). -5. To use infrastructure that is designed by and for individuals in research and education. -6. To support infrastructure from a non-profit organization that is committed to communities in research, education, and open source. +```{toctree} +:maxdepth: 2 +shared-responsibility +``` (overview/right-to-replicate)= ## Your Right to Replicate your infrastructure @@ -52,15 +34,36 @@ One way in which we adhere to this principle is by respecting the [Community Rig The Right to Replicate gives communities the right to replicate their infrastructure in its entirety elsewhere, with or without 2i2c. ``` -We believe that following this principle will lead to a more equitable and more productive ecosystem for research and education in the cloud. It also helps avoid many of the potential downsides of relying on a cloud vendor for infrastructure. Read the [Right to Replicate](https://2i2c.org/right-to-replicate/) documentation for more information about what this means. 
+Following this principle leads to a more equitable and more productive ecosystem for research and education in the cloud, and helps avoid many of the potential downsides of relying on a cloud vendor for infrastructure. +Read the [Right to Replicate](https://2i2c.org/right-to-replicate/) documentation for more information about what this means. -:::{seealso} -Check out [](../../admin/howto/replicate.md) for information about replicating a 2i2c JupyterHub. -::: +## Service Level Objectives -## Sustaining open source +Our Service Level Objectives define our **goals in running the service** for each partner community. +This includes goals like service uptime and support responsiveness. -Everything that 2i2c deploys is open source and community-driven projects. -We prioritize using multi-stakeholder projects that are well-supported by a diverse community of contributors. -The resources that we receive to run 2i2c JupyterHubs thus **also go towards making open-source improvements** in these communities so that others may benefit from them. -We see this as an opportunity to solve two problems with one stream of funding: support research and education, and [support open source communities](https://2i2c.org/about#values) in the Jupyter ecosystem and beyond. +```{toctree} +:maxdepth: 2 +service-objectives.md +``` + +## Cost model + +There are two types of costs associated with our service. +We treat each of these separately in order to be transparent about where community costs are coming from. +They will be covered as either two separate invoices, or two different line items on the same invoice. + +**Staff costs** cover all of the human time that goes into managing, supporting, developing, and improving our hub service. +See [](service-offerings) for details, and {external:tc:ref}`the Cost Model section in our Team Compass ` for our staffing cost model. + +**Cloud costs** cover the cost we pay a cloud provider for the infrastructure that powers your service. 
+This is either on a dedicated cloud cluster, or on a cluster that you share with other communities. + See [](costs:cloud) for a guide on how to estimate your community's cloud costs. + +## Comparison to similar services + +A comparison with similar kinds of services, to help you understand your options and the considerations you may want to take. + +```{toctree} +comparison +``` diff --git a/about/service/options.md b/about/service/options.md index 0ca264e..20abc85 100644 --- a/about/service/options.md +++ b/about/service/options.md @@ -1,14 +1,15 @@ -# Services options and cost +(service-offerings)= +# Usecases and prices -2i2c pools resources from communities in order to sustain and grow our team. -We do this by charging fees for our services, and supplementing these fees with grants and donations. -These sections are living documents, and we update them regularly as we learn more. +Our Hub Service is an open, scalable, sustainable cloud service for interactive computing environments. +We offer cloud infrastructure hubs that are designed for use-cases in research and education, and flexible enough to be tailored to the needs of each community. -(service-offerings)= -## Our service offerings and pricing +They run entirely on community-driven and open-source infrastructure, +follow a [community-centric collaborative service model](./index.md), and give you [the right to replicate your infrastructure](https://2i2c.org/right-to-replicate). -A matrix of our services and their prices are at the link below. -It is a living document, and we will continue to update it as we learn more. +A table summarizing our services and their prices is at the link below. +The rest of the pages in this section describe the cloud services that we offer and the use-cases they are designed for. +See [](./index.md) for more about our collaborative service model. 
```{button-link} https://docs.google.com/document/d/1FNiDyKNDoe_TgU2WxuNZ5CayYD56tlNJpImQsAIGOmg/edit?usp=sharing :color: primary @@ -16,100 +17,27 @@ It is a living document, and we will continue to update it as we learn more. Our service offerings and prices ``` -## Types of costs - -There are two types of costs associated with our service: **human costs** and **cloud costs**. -We treat each of these separately in order to be transparent about where community costs are coming from. - -- **Staff costs** cover all of the human time that goes into managing, supporting, developing, and improving our hub service. - See [](service-offerings) for details, and {external:tc:ref}`the Cost Model section in our Team Compass ` for our staffing cost model. -- **Cloud costs** cover the cost we pay a cloud provider for the infrastructure that powers your service. - This is either on a dedicated cloud cluster, or on cluster that you share with other communities. - See [](costs:cloud) for more information. - -(costs:cloud)= -## Estimating cloud costs - -We pass through cloud costs directly to our communities in a transparent manner. -This encourages us to continually reduce the cloud costs for our communities, and helps them understand how their decisions affect their cloud bill. - -### What components make up my cloud bill - -There are a few kinds of infrastructure that make up your cloud bill. -Here is a short summary: - -- **Nodes for user sessions**: A "node" is kind-of like a virtual machine or a dedicated computer. It is reserved cloud infrastructure that you can use as you wish. Nodes have resources allocated to them (e.g., `100GB` of RAM). JupyterHub uses dedicated nodes for user sessions, so more users == more nodes. You generally pay cloud providers by the minute for each node used. -- **Storage costs**: In order for users to persist their work over time, we must pay for filesystem storage. This is used to store user notebooks and content, data, etc. 
You generally pay cloud providers by the `GB` over time. -- **Nodes for hub infrastructure**: In addition to the cloud nodes for user sessions, there are also nodes to run the JupyterHub and supporting infrastructure to manage user log-ins, do monitoring and reporting of activity, etc. -- **Nodes for specialized computing**: For hubs that have scalable computing resources like a Dask Gateway, these generally request special nodes _on the fly_. When a scalable computation is executed, the cloud quickly requests many new nodes to complete the computation, and then removes them when it is done. You pay for the time used for each node during this computation. - -There are some other components that go into your cloud bill (e.g., "networking costs") but these are the major pieces. - -### User actions that impact cloud costs - -Cloud costs depend on a few key factors that you and your community has control over. -Here we list some major considerations (in decreasing order of importance): - -- **Base user resources needed**: The power and complexity of the user environment is the biggest driver of "base cost per user". This is largely driven by the amount of memory (RAM) each user needs. See below for a more in-depth explanation. -- **Community usage over time**: Resources are requested from the cloud "on-demand", meaning that your cloud costs will scale up and down with number of active users at any given moment. -- **User storage over time**: User storage is different from on-demand resources, because it's "always being used" even when you're not logged-in. We recommend storing large datasets and such in cloud object storage, which is much cheaper. -- **Dedicated vs. shared infrastructure**: If your community requires their own dedicated cloud infrastructure (for example, a dedicated Kubernetes cluster) then this will boost your cloud costs because you will not be sharing this cost with other communities. 
-- **Cloud optimizations**: There are many ways to make cloud infrastructure more efficient and scalable, and the 2i2c engineering team is constantly experimenting with ways to lower costs for communities. For many non-2i2c hubs, inefficiency is a large source of cloud cost, though the 2i2c hubs are already well-optimized. +## An overview of our infrastructure -### Estimate my cloud costs +The section below provides an overview of our infrastructure and the technical features that are available in any of our hubs. -The following is a very rough guideline to follow in order to understand and estimate what your cloud costs might be. -These are similar whether you're using 2i2c to manage your hub, or running it yourself. - -Generally speaking, **the biggest technical driver of cloud costs is user memory (RAM)**. -This is because RAM must be "reserved" on a node, and each node has a finite amount of memory available to it. - -Let's say a user node costs `$100.00` an hour, and comes with `100GB` total RAM. -If each user is guaranteed `1GB` of RAM, then the node can theoretically fit `100` users at a time. -`100` simultaneous users will cost `$100.00` an hour, or roughly `$1 / user / hour`. - -If we double the guaranteed RAM available to users, then the node can now fit `50` users at once (`100 GB / 2 GB per user = 50 users total`). -We now need twice the number of nodes to handle the same number of users. -`100` simultaneous users will now cost `$200.00` an hour, or roughly `$2 / user / hour`. - -In practice, the cost per node depends heavily on the cloud provider, and is constantly in-flux. -**To estimate your own cloud costs**, follow these steps: - -1. **Estimate memory available to each user**. The amount of RAM needed for each user is often the biggest driver of cloud cost. Decide the "maximum" amount of RAM that a user % will generally need, and multiply that by 1.5x. -2. **Determine how many average simultaneous users you'd like a hub to support**. 
This isn't necessarily the total size of your community, but how many people you think will be % using the hub *at the same time*. -3. **Look up the monthly price for an `n1-highmem-4` node**. This is a basic node type that serves most use-cases and can be used as a benchmark for comparison. - 1. [Go to the Google Cloud pricing page](https://cloud.google.com/compute/vm-instance-pricing). This lists prices for many kinds of nodes with Google Cloud Platform. - 2. Go to the `N1 high-memory machine types` section. This contains prices for all `N1` node types with high memory. - 3. Look at the hourly price for `n1-highmem-4`. - 4. Divide this amount by `n_simultaneous_users_per_hour * GB_per_user`. - 5. This is your estimated extra cost per hour per user. -4. **Estimate storage costs**. Estimate your storage costs based on the expected storage each user will take up. 2i2c's hubs use a standard NFS File Storage for most hubs, which has very fast latency for interactive computing. [Here are Google's file storage prices](https://cloud.google.com/storage/pricing#price-tables), for example. You can estimate these costs based on the expected storage used across all of your users. - -:::{seealso} -We recommend checking out the following resources to learn more about cloud costs. -None of these are guarantees about costs, but should give you a general idea. - -- For general information and explanation, see [the Zero to JupyterHub cost projection documentation](z2jh:cost). -- For educational or "lightweight resources" hubs, see [this rough cost analysis notebook from the UC Berkeley DataHub](https://nbviewer.jupyter.org/github/berkeley-dsep-infra/datahub-usage-analysis/blob/master/notebooks/03-visualize-cost-and-usage.ipynb). -- For data- and compute-intensive hubs, see the Pangeo two-part series on their Kubernetes costs. 
([part 1 link](https://medium.com/pangeo/pangeo-cloud-costs-part1-f89842da411d), [part 2 link](https://medium.com/pangeo/pangeo-cloud-cluster-design-9d58a1bf1ad3)) -::: - -### How we estimate cloud costs for communities - -The previous sections give a high-level overview of how to think about cloud costs and how they'll reflect your community's usage. -This section describes how the 2i2c team calculates cloud costs and passes this on to communities. +```{toctree} +../distributions/index.md +``` -Over time, we will refine this process to make it more precise and (as much as possible) directly tied to the usage a community incurs. +## Education use-cases -#### Shared kubernetes clusters +JupyterHub is excellent for educational use-cases, such as providing a cloud-based learning environment for large-scale data science teaching or domain-specific cloud-enabled science. -For hubs that run on **shared Kubernetes clusters**, we estimate their cloud costs via the following process: +```{toctree} +../distributions/education +``` -1. Calculate the monthly cloud bill for this cluster. -2. Calculate the % usage for a specific community, based on the % of RAM requested throughout the month. -3. Estimate a community's cloud costs for that month by calculating `(monthly_cloud_bill_for_cluster * %_usage_for_this_community)`. +## Research use-cases -#### Dedicated kubernetes clusters +JupyterHub is an excellent gateway to cloud-based resources and data analytics environments. +It can be used as a part of distributed scientific collaborations, scientific communities with cloud-based workflows, and scalable analytics environments for research teams. -For hubs that run on a **dedicated Kubernetes cluster**, a cloud bill will be generated by the cloud provider, 2i2c will pay it in advance, and we will include this cost in the next month's invoice. -This will exactly reflect the cloud charges incurred by the hub in that time.
+```{toctree} +../distributions/research +``` diff --git a/about/service/shared-responsibility.md b/about/service/shared-responsibility.md index 74aa5b8..b0c71c5 100644 --- a/about/service/shared-responsibility.md +++ b/about/service/shared-responsibility.md @@ -1,80 +1,146 @@ -# Shared Responsibility Model +```{team} Service Team +``` +# Shared responsibility model 2i2c **shares responsibility for each hub** with the communities we serve.[^similar-models] This aligns with our mission of promoting collaborative and open workflows in research and education. -It also leads to a more effective, more sustainable, and more transparent service[^ironies-automation]. It also helps ensure that the community has the [Right to Replicate](https://2i2c.org/right-to-replicate) their infrastructure. +It also leads to a more effective, more sustainable, and more transparent service[^ironies-automation], and ensures that the community has the [Right to Replicate](https://2i2c.org/right-to-replicate) their infrastructure. Here's how it works: + +1. Define the major responsibilities needed to run a hub service with a community, and categorize them broadly by skillset. +2. Assign responsibilities that are well-suited to the skills and the interests of each group. +3. Choose an operational and cost model for the group so that each actor is empowered to carry out their responsibilities. + +Below is a high-level summary of the major areas of responsibility in this service and how they work together. + +% This figure is not stored with the repository, it is downloaded at build time +% Diagram here: https://drive.google.com/uc?export=download&id=16r5xE7SguunLfMh5LhSynSUfjb7IXs_n +```{figure} /images/shared_responsibility_diagram.png +:figwidth: 80% +An overview of the major teams that collaborate around the cloud service in order to serve a community of users. 
+``` + +Below we describe these areas in more detail, and define the roles that 2i2c and our partner communities take on in the service. + +:::{contents} +:local: +:depth: 1 +::: + +## Site Reliability Engineering -We define and divide responsibilities via the following process: +**Key goal**: Ensure that the cloud infrastructure is reliable, robust, and scalable. -- Define the major responsibilities needed to run a hub service with a community, and categorize them broadly by skillset. -- Assign responsibilities that are well-suited to the skills and the interests of each group. -- Choose a cost recovery model according to the responsibilities that 2i2c is taking on. -- Choose an operational model for the group so that each actor is empowered to carry out their responsibilities. +### Responsibilities -This section describes the default model that we use with most communities. +% NOTE: Goal is to have max 5 responsibilities per category, to avoid overwhelming people. +1. **Monitor infrastructure for errors**. Continuously monitor cloud infrastructure to identify usability problems before they affect users. +2. **Respond to incidents**. When incidents are identified or reported, carry out an incident response process to diagnose and resolve the incident. +3. **Deploy and configure the cloud environment**. Make the necessary service connections and technical changes to set up the community's environment (e.g., authentication, connecting with a database, or defining RAM per user). +4. **Enhance and develop cloud infrastructure**. Continuously develop and deploy software improvements with the goal of boosting service reliability and scalability. +5. **Operate a Kubernetes cluster**. This is the cloud platform that manages all of a community's infrastructure, and may be shared between many communities.
-## Engineering responsibilities +```{role} Site Reliability Engineer +``` -Engineering responsibilities are technical changes needed to configure the hub for a community and to keep it running over time. -Below are a range of Engineering responsibilities. +```{admonition} Role: Site Reliability Engineer +A team of engineers with expertise in cloud infrastructure and open source tools that we use as part of our services. This group of people oversees the cloud infrastructure that a community uses. They perform new development and upgrades, make changes per the request of {team}`Community Representatives`, and coordinate with the {team}`Community Support Team` during incidents and outages. +This is roughly equivalent to a [Site Reliability Engineering Team](https://en.wikipedia.org/wiki/Site_reliability_engineering). -```{figure} https://drive.google.com/uc?export=download&id=1SIhHrzPXSFBZ0yyVpxHm0WYs63k0SBRQ -:width: 80% +See [our Infrastructure documentation](https://infrastructure.2i2c.org/en/latest/) for more information. -An overview of some categories of shared responsibility between the {term}`Cloud Engineering Team` and the {term}`Community Leadership Team`. +**Usually filled by 2i2c team members.** Though we are experimenting with ways to involve community members in our cloud operations. ``` -::::{grid} +### Responsibility breakdown + +Usually, 2i2c assumes responsibility for all of the above, though we are experimenting with ways to involve community members in our cloud operations. + +## Service applications support -:::{grid-item} -:columns: 2 +**Key goal**: Ensure that communities have the skills and understanding needed to use the cloud infrastructure to have an impact. -**Less technical** +### Responsibilities -```{div} mt-auto -**More technical** +1. **Create documentation and training material**. Write and improve content that helps users learn cloud-native workflows and use the infrastructure effectively. +2. 
**Provide support to community leaders**. Follow our service {external:tc:ref}`support:guide` for community leaders. +3. **Assist with user environment creation**. Provide domain expertise about how to configure and set up the proper environment using [Binder-style repositories](../../admin/howto/environment/index.md). +4. **Create and manage data in the cloud**. If your community requires access to a cloud-native dataset, format it properly and put it in a place that the hub can connect to. +5. **Run workshops and training**. Training workshops are geared towards community leaders, with the goal of helping them share knowledge with others in their community. + +```{role} Community Guide ``` +```{admonition} Role: Community Guide -::: +An expert practitioner with familiarity in user workflows as well as the technical use-cases that 2i2c's cloud services enable. +Acts as a bridge between the communities we work with and our {role}`Site Reliability Engineer`s. Facilitates information transfer, signal-boosts community needs and requests, and guides communities in utilizing the infrastructure more effectively. -:::{grid-item} -:columns: 10 - -1. **Use** the interactive computing sessions to accomplish the goals of the community. -2. **Advocate and onboard** new users to the hub to grow its user community. -3. **General user support** for generic questions about interactive computing. -4. **Provide user access** via the JupterHub admin panel to create new usernames and administrative users. -5. **Develop domain-specific software** that is relevant to your community members for their specific use-cases. -6. **Define the basic environment** via a Binder-style repository. -7. **Manage data** that is accessed by users. -8. **Guide 2i2c** with requests and feedback for changes to infrastructure -9. **Escalate to 2i2c** when something is wrong. -10. **Complex environment changes** that require more expertise in packaging and environment design. -11.
**Develop software for interactive computing** to improve the underlying infrastructure that provides user sessions (e.g., JupyterHub, JupyterLab, etc). -12. **Support open source communities** so that the service infrastructure has a solid and reliable foundation of tools on which it runs, and so that the communities that produce those tools are healthy. -13. **Communicate with cloud provider** for issues related to infrastructure (e.g., requesting resource limit increases). -14. **Manage Kubernetes configuration** to perform updates to a hub or cluster (e.g., changing RAM available). -15. **Deploy and configure hubs** including configuration, guidance for setting up environment, some connections to cloud resources, etc. -16. **Monitor for incidents** to identify usability problems before they affect users. -17. **Develop software for cloud infrastructure** to improve the performance, features, and robustness of Kubernetes and other cloud infrastructure. -18. **Configure Kubernetes** upgrades and improvements for cloud infrastructure. -19. **Respond to incidents** when a more complex or cloud-related change is needed. -20. **Operate a Kubernetes cluster** that is configured to manage one or more JupyterHubs. +See the {ref}`Support Team Documentation ` for more information. -::: +**Usually filled by 2i2c team members.** Though communities with "Power Users" or those with exceptional engineering and computational skills may serve in this role as well. +``` + +### Responsibility breakdown + +Generally shared between 2i2c and the community partners it works with. +We focus our efforts on general use-case training for community leaders, as well as documentation. +However, our base service model does not allow us to spend extensive time managing complex environments or cloud-native datasets on behalf of communities. 
-:::: +## Community leadership and management +**Key goal**: Ensure that a hub's community has the structure, support, and leadership to make the most of the hub. -## Community guidance +### Responsibilities -```{figure} https://drive.google.com/uc?export=download&id=1S6Y9TQcXXLkrGrhgXQc7kLzq7dxcuw9a -:width: 80% +A team of leaders *within the community that we work with* who act as {team}`Community Representatives` on behalf of their community. This team coordinates more closely with our {team}`Community Support Team`, facilitates the transfer of knowledge between 2i2c teams and communities of users, and helps manage the structure and dynamics of these communities. They also define the strategic mission and goals of each user community, and help us define "success" for the hub service. -An overview of some categories of shared responsibility between the {term}`Community Support Team` and the {term}`Community Leadership Team`. +1. **Define success for the hub's community**. Community leaders understand the goals of a community's users, and define whether the hub is meeting their needs. +2. **Oversee user access policy**. Decide who has access to the hub, and what permissions they have. Generally done via the JupyterHub admin panel. +3. **Manage and cultivate a community around the hub**. Define the community events, processes, structure, and communication channels that are best for a hub's community. +4. **Represent community in decisions and feedback**. Serve as a point of contact for {role}`Site Reliability Engineer`s, make requests for changes to the hub, and surface incidents or problems if they arise. +5. **Make financial decisions about the hub**. Have decision authority for changes that have a financial impact on the infrastructure, and serve as a point of contact for billing matters.
+ +```{role} Hub Administrator +``` +```{admonition} Role: Hub Administrator + +Trusted community members that perform common administrative operations on a hub that do not require intervention from a Hub Engineer. +{team}`Community Representatives` are the first Hub Administrators, and they may add new Hub Administrators via the JupyterHub interface. +They are able to add users, start/stop servers, and generally have more control over operations on the hub. + +**Filled by a community member**. +``` + +```{role} Community Representative +``` +```{admonition} Role: Community Representative + +Acts as the primary point of contact for a community, and ensures that the interests of the {team}`Hub Community` are represented in the infrastructure, and that the hub serves their needs. + +They have the authority to speak on behalf of the community, and make decisions about the infrastructure that the community uses. + +**Filled by one or two community leaders**. ``` +### Responsibility breakdown + +Community management and leadership is generally the responsibility of the community. + +## Software engineering + +**Key goal**: Improve and maintain open source tools to support community workflows. + +### Responsibilities + +1. **Develop domain-specific software** that is relevant to your community members for their specific use-cases. +2. **Develop software for interactive computing** to improve the underlying infrastructure that provides user sessions (e.g., JupyterHub, JupyterLab, etc). +3. **Support open source communities** so that the service infrastructure has a solid and reliable foundation of tools on which it runs, and so that the communities that produce those tools are healthy. + +**There are no formal roles for this area**. 2i2c does not currently have the capacity for dedicated software development, though it hopes to grow this capacity in the future. + +### Responsibility breakdown + +Software development is performed by community members or their partners. 
[^ironies-automation]: Even when collaborating with engineering expertise in other organizations, we describe our service model in terms of areas of responsibility, rather than "tiers" of service that provide "burst capacity" or support only on an as-needed basis. This is because service "tiers" often leads to anti-patterns where support is needed from a person that is not empowered to be efforted in their duties (e.g., if they have been away from infrastructure for many months, and only after a series of escalations are needed to debug something). For more information on this, see [the Ironies of Automation](https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf) as well as [this post](https://blog.acolyer.org/2020/01/08/ironies-of-automation/) and [this post](https://www.thinkautomation.com/automation-advice/the-ironies-of-automation-explored/) explaining its relevance to technology and service delivery. -[^similar-models]: This is inspired by the **Shared Responsibility Model** that is often used to describe cloud services. For example, see [the AWS Shared Responsibility model for compliance](https://aws.amazon.com/compliance/shared-responsibility-model/) and for [Best Practices](https://aws.amazon.com/blogs/industries/applying-the-aws-shared-responsibility-model-to-your-gxp-solution/), the [GxP whitepaper from Google Cloud](https://cloud.google.com/security/compliance/cloud-gxp-whitepaper), and the [Azure Shared Responsibility Model](https://docs.microsoft.com/en-us/azure/security/fundamentals/shared-responsibility). \ No newline at end of file +[^similar-models]: This is inspired by the **Shared Responsibility Model** that is often used to describe cloud services. 
For example, see [the AWS Shared Responsibility model for compliance](https://aws.amazon.com/compliance/shared-responsibility-model/) and for [Best Practices](https://aws.amazon.com/blogs/industries/applying-the-aws-shared-responsibility-model-to-your-gxp-solution/), the [GxP whitepaper from Google Cloud](https://cloud.google.com/security/compliance/cloud-gxp-whitepaper), and the [Azure Shared Responsibility Model](https://docs.microsoft.com/en-us/azure/security/fundamentals/shared-responsibility). diff --git a/about/service/team.md b/about/service/team.md deleted file mode 100644 index 7fdab74..0000000 --- a/about/service/team.md +++ /dev/null @@ -1,97 +0,0 @@ -(about/roles-for-service)= -# Team structure and roles - -The Managed JupyterHub Service is a **collaborative cloud service** run in partnership with the communities that we serve. -This page describes the major teams and roles that are involved in running this service. - -## Teams and key stakeholders - -The Managed JupyterHub Service is composed of a {term}`Service Team` along with three sub-teams. - -```{figure} https://drive.google.com/uc?export=download&id=16r5xE7SguunLfMh5LhSynSUfjb7IXs_n -An overview of the major teams that collaborate around the cloud service in order to serve a community of users. There are three main teams, and this diagram shows the major traits of each team, as well as a few ways in which they interact with one another. -``` - -### Service team structure - -```{glossary} -Managed JupyterHub Service Team -Service Team - The group of people that collaborate together to run a collaborative cloud service. It is comprised of three major sub-teams: - - 1. The {term}`Cloud Engineering Team` - 2. The {term}`Community Support Team` - 3. The {term}`Partnerships Team` - 4. The {term}`Community Leadership Team` - -Cloud Engineering Team -Engineering Team - A team of engineers with expertise in cloud infrastructure and open source tools that we use as part of our services. 
This group of people oversees the cloud infrastructure that a community uses. They perform new development and upgrades, make changes per the request of {term}`Community Representatives`, and coordinate with the {term}`Community Support Team` during incidents and outages. - This is roughly equivalent to a [Site Reliability Engineering Team](https://en.wikipedia.org/wiki/Site_reliability_engineering). - - See [our Infrastructure documentation](https://infrastructure.2i2c.org/en/latest/) for more information. - -Community Support Team -Support Team - A team of expert practitioners with familiarity in user workflows as well as the technical use-cases that 2i2c's cloud services enable. This group acts as a bridge between the communities we work with and our {term}`Cloud Engineering Team`, facilitating information transfer, signal-boosting community needs and requests, and guiding communities in utilizing the infrastructure more effectively. - - See the {ref}`Support Team Documentation ` for more information. - -Partnerships Team - A team of experts in building cross-organizational collaborations, contracts, and grants. This team is tasked with forging new partnerships with communities and their organizations, identifying the resources needed to make these partnerships sustainable, and leading the contracting and invoicing process (when needed) to recover our costs. They are the primary interface with our {term}`tc:Fiscal Sponsor`, {term}`Code for Science and Society`. - -Community Leadership Team -Community Team - A team of leaders *within the community that we work with* who act as {term}`Community Representatives` on behalf of their community. This team coordinates more closely with our {term}`Community Support Team`, facilitates the transfer of knowledge between 2i2c teams and communities of users, and helps manage the structure and dynamics of these communities. 
They also define the strategic mission and goals of each user community, and help us define the definition of "success" for the hub service. -``` - -### Key stakeholders - -In addition, there are two groups of stakeholders that are not directly involved in running the service, but that are important to consider to ensure that each service has the impact that we wish to achieve. -Our {term}`Service Team` spends extra effort interacting with and getting feedback from these stakeholder communities. - -```{glossary} -User Community -User Communities - Anybody that uses the infrastructure on a given hub. These tend to be students, researchers, collaborators, or workshop attendees. They come from a variety of backgrounds and skillsets, but they are all considered to be a part of the community that a hub serves (even if only for a short time). This community is important to our services because the impact of the service is ultimately driven by the work that this community does. - -Open Source Communities -Open Source Community - The distributed communities that lead, develop, and support the open source infrastructure that is used in our collaborative cloud service. Members of the {term}`Service Team` are often also members of these open source communities, and act as liasons to help upstream improvements and lead discussions that are surfaced as part of running our cloud service together. This community is important to our services because part of 2i2c's mission involves using its resources and experience to support and improve the open source communities that underlie our service. -``` - -## Community roles - -The following roles are overseen by one or more members of the user community. -They help direct the infrastructure and service in order to help the community accomplish its goal, and act as leaders to empower the community in using the infrastructure. 
- -```{glossary} -Community Representative -Community Representatives - Acts as the primary point of contact for a community, and ensures that the interests of the {term}`Hub Community` are represented in the infrastructure, and that the hub serves their needs. - They have the authority to speak on behalf of the community, and make decisions about the infrastructure that the community uses. - - There must be **one or two community representatives for a given community**. - This role is usually filled by someone that is a member of the hub's community of practice. - - Their main responsibilities include: - - - The main point of contact between the hub engineer and the {term}`Hub Community`. - - Collect feedback and questions from users on a hub. - - Surface questions and requests to Hub Engineers via support tickets. - - Oversee the {term}`Hub Administrators`. - -Hub Administrator -Hub Administrators - Trusted community members that perform common administrative operations on a hub that do not require intervention from a Hub Engineer. - {term}`Community Representatives` are the first Hub Administrators, and they may add new Hub Administrators via the JupyterHub interface. - They are able to add users, start/stop servers, and generally have more control over operations on the hub. - - Their responsibilities include: - - - Provide support to users of a hub for common problems that don't require a Hub Engineer to resolve. - - Add new users to a hub, including administrative users. - - Surface major issues or requests to the Community Representative(s). -``` - -Roles that are specific to 2i2c are defined [in the 2i2c Team Compass](https://team-compass.2i2c.org). 
diff --git a/conf.py b/conf.py index 61031b9..cf6818f 100644 --- a/conf.py +++ b/conf.py @@ -75,11 +75,30 @@ def setup(app): app.add_css_file("custom.css") + app.add_crossref_type("team", "team") + app.add_crossref_type("role", "role") +# -- Custom scripts ------------------------------------------------- -# Scripts to run +# Generate the feature table import subprocess from pathlib import Path build_assets = Path("build_assets") build_assets.mkdir(exist_ok=True) subprocess.run(["python", "feature-table.py"], cwd="scripts") + +# Download figures we keep in Google Drive +from requests import get +figures = { + "https://drive.google.com/uc?export=download&id=1Mr51-s3D_KHPsAuTXbczaQ7mlPZUs9gm": "collaborative_learning_hub.png", + "https://drive.google.com/uc?export=download&id=16r5xE7SguunLfMh5LhSynSUfjb7IXs_n": "shared_responsibility_diagram.png", + "https://drive.google.com/uc?export=download&id=1gWAIQVKcB-uxuJsBHqlDlRTq88oki1zn": "scalable_research_hub.png", +} +for url, filename in figures.items(): + path_image = Path(__file__).parent / "images" / filename + if not path_image.exists(): + print(f"Downloading {filename}...") + resp = get(url) + path_image.write_bytes(resp.content) + else: + print(f"Diagram image exists, delete this file to re-download: {path_image}") diff --git a/index.md b/index.md index 367feb4..1a690a5 100644 --- a/index.md +++ b/index.md @@ -6,8 +6,8 @@ It is divided into a number of **roles and personas** with relevant topics for e :::{seealso} Here are a few other locations with relevant information about 2i2c's services. -- [`team-compass.2i2c.org/managed-hubs/index`](https://team-compass.2i2c.org/en/latest/projects/managed-hubs/index.html): Documentation about {term}`Service Team` processes that are primarily relevant to 2i2c team members. We put this documentation here to prevent [`docs.2i2c.org`](https://docs.2i2c.org) from getting too cluttered. 
-- [`infrastructure.2i2c.org`](https://infrastructure.2i2c.org): Our {term}`Cloud Engineering Team` and cloud infrastructure documentation. +- [`team-compass.2i2c.org/managed-hubs/index`](https://team-compass.2i2c.org/en/latest/projects/managed-hubs/index.html): Documentation about {team}`Service Team` processes that are primarily relevant to 2i2c team members. We put this documentation here to prevent [`docs.2i2c.org`](https://docs.2i2c.org) from getting too cluttered. +- [`infrastructure.2i2c.org`](https://infrastructure.2i2c.org): Our {team}`Cloud Engineering Team` and cloud infrastructure documentation. ::: This documentation is structured into sections that are meant for various **roles and personas**. @@ -23,7 +23,6 @@ They are meant for individuals who wish to learn about the service for their own :caption: About the service about/service/options about/service/index -about/infrastructure/index ``` ## Use the hub @@ -71,7 +70,7 @@ community/strategy.md ## Community representatives Documentation for those serving as _Community Representatives_. -These tend to cover technical, administrative, and collaborative processes for interacting with 2i2c's team on behalf of your community. +These tend to cover technical, administrative, invoicing, and collaborative processes for interacting with 2i2c's team on behalf of your community. ```{toctree} :caption: Community representatives @@ -80,6 +79,7 @@ These tend to cover technical, administrative, and collaborative processes for i admin/howto/new-hub admin/howto/replicate admin/howto/create-billing-account +topic/cloud-costs ``` ## Reference material @@ -91,4 +91,4 @@ Lists and programmatically-generated content to serve as a quick reference. 
:maxdepth: 2 about/terminology -``` \ No newline at end of file +``` diff --git a/noxfile.py b/noxfile.py index 31a61f3..f7f0e1c 100644 --- a/noxfile.py +++ b/noxfile.py @@ -11,6 +11,7 @@ def docs(session): AUTOBUILD_IGNORE = [ "_build", "build_assets", + "images/shared_responsibility_diagram.png", ] cmd = ["sphinx-autobuild"] for folder in AUTOBUILD_IGNORE: diff --git a/topic/cloud-costs.md b/topic/cloud-costs.md new file mode 100644 index 0000000..692d324 --- /dev/null +++ b/topic/cloud-costs.md @@ -0,0 +1,87 @@ + +(costs:cloud)= +# Estimate cloud costs + +We pass through cloud costs directly to our communities in a transparent manner. +This encourages us to continually reduce the cloud costs for our communities, and helps them understand how their decisions affect their cloud bill. + +## What components make up my cloud bill + +There are a few kinds of infrastructure that make up your cloud bill. +Here is a short summary: + +- **Nodes for user sessions**: A "node" is roughly like a virtual machine or a dedicated computer. It is reserved cloud infrastructure that you can use as you wish. Nodes have resources allocated to them (e.g., `100GB` of RAM). JupyterHub uses dedicated nodes for user sessions, so more users == more nodes. You generally pay cloud providers by the minute for each node used. +- **Storage costs**: In order for users to persist their work over time, we must pay for filesystem storage. This is used to store user notebooks and content, data, etc. You generally pay cloud providers by the `GB` over time. +- **Nodes for hub infrastructure**: In addition to the cloud nodes for user sessions, there are also nodes to run the JupyterHub and supporting infrastructure to manage user log-ins, do monitoring and reporting of activity, etc. +- **Nodes for specialized computing**: For hubs that have scalable computing resources like a Dask Gateway, these generally request special nodes _on the fly_.
When a scalable computation is executed, the cloud quickly requests many new nodes to complete the computation, and then removes them when it is done. You pay for the time used for each node during this computation. + +There are some other components that go into your cloud bill (e.g., "networking costs") but these are the major pieces. + +## User actions that impact cloud costs + +Cloud costs depend on a few key factors that you and your community have control over. +Here we list some major considerations (in decreasing order of importance): + +- **Base user resources needed**: The power and complexity of the user environment is the biggest driver of "base cost per user". This is largely driven by the amount of memory (RAM) each user needs. See below for a more in-depth explanation. +- **Community usage over time**: Resources are requested from the cloud "on-demand", meaning that your cloud costs will scale up and down with the number of active users at any given moment. +- **User storage over time**: User storage is different from on-demand resources, because it's "always being used" even when you're not logged in. We recommend storing large datasets in cloud object storage, which is much cheaper. +- **Dedicated vs. shared infrastructure**: If your community requires its own dedicated cloud infrastructure (for example, a dedicated Kubernetes cluster) then this will increase your cloud costs because you will not be sharing this cost with other communities. +- **Cloud optimizations**: There are many ways to make cloud infrastructure more efficient and scalable, and the 2i2c engineering team is constantly experimenting with ways to lower costs for communities. For many non-2i2c hubs, inefficiency is a large source of cloud cost, though the 2i2c hubs are already well-optimized. + +## Estimate my cloud costs + +The following is a very rough guideline to follow in order to understand and estimate what your cloud costs might be.
+These are similar whether you're using 2i2c to manage your hub, or running it yourself. + +Generally speaking, **the biggest technical driver of cloud costs is user memory (RAM)**. +This is because RAM must be "reserved" on a node, and each node has a finite amount of memory available to it. + +Let's say a user node costs `$100.00` an hour, and comes with `100GB` total RAM. +If each user is guaranteed `1GB` of RAM, then the node can theoretically fit `100` users at a time. +`100` simultaneous users will cost `$100.00` an hour, or roughly `$1 / user / hour`. + +If we double the guaranteed RAM available to users, then the node can now fit `50` users at once (`100 GB / 2 GB per user = 50 users total`). +We now need twice the number of nodes to handle the same number of users. +`100` simultaneous users will now cost `$200.00` an hour, or roughly `$2 / user / hour`. + +In practice, the cost per node depends heavily on the cloud provider, and is constantly in flux. +**To estimate your own cloud costs**, follow these steps: + +1. **Estimate memory available to each user**. The amount of RAM needed for each user is often the biggest driver of cloud cost. Decide the "maximum" amount of RAM that a user will generally need, and multiply that by 1.5x. +2. **Determine how many average simultaneous users you'd like a hub to support**. This isn't necessarily the total size of your community, but how many people you think will be using the hub *at the same time*. +3. **Look up the hourly price for an `n1-highmem-4` node**. This is a basic node type that serves most use-cases and can be used as a benchmark for comparison. + 1. [Go to the Google Cloud pricing page](https://cloud.google.com/compute/vm-instance-pricing). This lists prices for many kinds of nodes with Google Cloud Platform. + 2. Go to the `N1 high-memory machine types` section. This contains prices for all `N1` node types with high memory. + 3. Look at the hourly price for `n1-highmem-4`. + 4.
Divide this amount by `n_simultaneous_users_per_hour * GB_per_user`. + 5. This is your estimated extra cost per hour per user. +4. **Estimate storage costs**. Estimate your storage costs based on the expected storage each user will take up. Most 2i2c hubs use standard NFS file storage, which has low latency for interactive computing. [Here are Google's file storage prices](https://cloud.google.com/storage/pricing#price-tables), for example. You can estimate these costs based on the expected storage used across all of your users. + +:::{seealso} +We recommend checking out the following resources to learn more about cloud costs. +None of these are guarantees about costs, but they should give you a general idea. + +- For general information and explanation, see [the Zero to JupyterHub cost projection documentation](z2jh:cost). +- For educational or "lightweight resources" hubs, see [this rough cost analysis notebook from the UC Berkeley DataHub](https://nbviewer.jupyter.org/github/berkeley-dsep-infra/datahub-usage-analysis/blob/master/notebooks/03-visualize-cost-and-usage.ipynb). +- For data- and compute-intensive hubs, see the Pangeo two-part series on their Kubernetes costs. ([part 1 link](https://medium.com/pangeo/pangeo-cloud-costs-part1-f89842da411d), [part 2 link](https://medium.com/pangeo/pangeo-cloud-cluster-design-9d58a1bf1ad3)) +::: + +## How we estimate cloud costs for communities + +The previous sections give a high-level overview of how to think about cloud costs and how they'll reflect your community's usage. +This section describes how the 2i2c team calculates cloud costs and passes them on to communities. + +Over time, we will refine this process to make it more precise and (as much as possible) directly tied to the usage a community incurs. + +### Shared kubernetes clusters + +For hubs that run on **shared Kubernetes clusters**, we estimate their cloud costs via the following process: + +1.
Calculate the monthly cloud bill for this cluster. +2. Calculate the % usage for a specific community, based on the % of RAM requested throughout the month. +3. Estimate a community's cloud costs for that month by calculating `(monthly_cloud_bill_for_cluster * %_usage_for_this_community)`. + +### Dedicated Kubernetes clusters + +For hubs that run on a **dedicated Kubernetes cluster**, the cloud provider will generate a cloud bill, 2i2c will pay it in advance, and we will include this cost in the next month's invoice. +This will exactly reflect the cloud charges incurred by the hub in that time.
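
The shared-cluster estimate described above can be sketched in a few lines of Python. This is only an illustration, not 2i2c's actual billing code: the function name `estimate_community_cost` and its parameters are hypothetical, and it assumes that "% of RAM requested throughout the month" is measured in GB-hours of requested RAM.

```python
def estimate_community_cost(monthly_cluster_bill, community_ram_gb_hours, total_ram_gb_hours):
    """Estimate one community's share of a shared cluster's monthly bill.

    Hypothetical helper: RAM usage is assumed to be tracked as GB-hours of
    requested memory, summed over the month.
    """
    if total_ram_gb_hours <= 0:
        raise ValueError("total_ram_gb_hours must be positive")
    # Step 2: the community's fraction of all RAM requested on the cluster.
    usage_fraction = community_ram_gb_hours / total_ram_gb_hours
    # Step 3: monthly_cloud_bill_for_cluster * %_usage_for_this_community
    return monthly_cluster_bill * usage_fraction


# Made-up numbers: a $2,000 monthly bill, and a community that requested
# 250 of the 1,000 GB-hours of RAM requested across the whole cluster.
example_cost = estimate_community_cost(2000.0, 250.0, 1000.0)
print(f"Estimated community cost: ${example_cost:.2f}")
```

With these made-up numbers the community accounts for a quarter of the requested RAM, so it is attributed a quarter of the cluster bill.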