[831] Remove references to GOV.UK PaaS #190

Merged
merged 1 commit into from
Dec 11, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -8,6 +8,9 @@
# Ignore bundler config
/.bundle

# bundled gems
/vendor

# Ignore cache
/.sass-cache
/.cache
14 changes: 1 addition & 13 deletions README.md
@@ -4,22 +4,11 @@ How we build and operate products at the Department for Education. This repo
is inspired by (and steals shamelessly from) the [GDS Way](https://gds-way.cloudapps.digital) and the
[Ministry of Justice Technical Guidance](https://ministryofjustice.github.io/technical-guidance/#moj-technical-guidance).

It's built using the GOV.UK [tech-docs-template](https://github.com/alphagov/tech-docs-template), and hosted on [GOV.UK PaaS][].
It's built using the GOV.UK [tech-docs-template](https://github.com/alphagov/tech-docs-template).

## Add a new guidance
See the [Adding a new guidance][/guides/adding-new-guidance] section.

## GOV.UK PaaS set-up
The application is called `dfe-technical-guidance` and is supported by the [Staticfile buildpack][] . It is deployed in the space
`technical-architecture`, in the `dfe` organisation.

The custom domain, SSL certificate and CDN are provided by the `technical-guidance` [cdn-route][] service.

The deploy workflow connects to paas using service account [email protected] (a Google group).

The review apps are deployed to the `technical-architecture-dev` space and their name is suffixed by the PR number. There is no
cdn-route service, we simply use the default `.london.cloudapps.digital` domain.

## Licence

The documentation is [© Crown copyright][copyright] and available under the terms of the [Open Government 3.0][ogl] licence.
@@ -29,7 +18,6 @@ The documentation is [© Crown copyright][copyright] and available under the ter
[mit]: LICENCE
[copyright]: http://www.nationalarchives.gov.uk/information-management/re-using-public-sector-information/uk-government-licensing-framework/crown-copyright/
[ogl]: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
[GOV.UK PaaS]: https://www.cloud.service.gov.uk/
[Staticfile buildpack]: https://docs.cloudfoundry.org/buildpacks/staticfile/index.html
[cdn-route]: https://docs.cloud.service.gov.uk/deploying_services/use_a_custom_domain/#managing-custom-domains-using-the-cdn-route-service
[govuk-tech-docs gem]: https://github.com/alphagov/tech-docs-gem
2 changes: 0 additions & 2 deletions data/site.json
@@ -1,6 +1,4 @@
{
"digital_tools": "/infrastructure/support/#digital-tools-support",
"gov_uk_paas": "/infrastructure/hosting/govuk-paas/",
"service_portal": "/infrastructure/support/#access-to-service-portal",
"gov_uk_paas_decommission": "GOV.UK PaaS will be decommissioned at the end of 2023, no new service should be built using it. It is still available for running existing services and creating prototypes."
}
6 changes: 2 additions & 4 deletions source/guides/adding-new-guidance/index.html.md.erb
@@ -16,11 +16,11 @@ Note that review apps only worked for pull requests raised in the original repos
Make sure to make changes in a branch. Every change should be reviewed in a pull request, no matter how minor, and we've enabled
[branch protection](https://help.github.com/articles/about-protected-branches/) to enforce this.

Once the pull request is merged, the deploy Github action workflow runs the build and pushes the static site to GOV.UK PaaS.
Once the pull request is merged, the deploy Github action workflow runs the build and pushes the static site to AKS.

## Review apps
Every pull request builds a separate _review app_. It is a unique version of the documentation implementing the changes from
the pull request and pushed to GOV.UK PaaS with a unique URL so it can be shared and peer reviewed. The URL is posted in a
the pull request and pushed to AKS with a unique URL so it can be shared and peer reviewed. The URL is posted in a
comment on the pull request.

Any change to the branch is automatically pushed to the review app after a few minutes.
@@ -31,8 +31,6 @@ When the pull request is closed or merged, the review app is deleted.
In the [DfE Digital technology guidance repository](https://github.com/DFE-Digital/technology-guidance), in the `source` directory, create a
subdirectory representing the name of the guidance.

For example: `source/guides/govuk-paas`.

Inside the new directory, create an `index.html.md.erb` file following this pattern:

```markdown
2 changes: 1 addition & 1 deletion source/guides/continuous-delivery/index.html.md.erb
@@ -42,7 +42,7 @@ The live application or website should always be up and running and should never
1. **Unit and linting tests** are usually run by the developers manually and by the automated workflow when they push a branch. They allow testing a new feature in isolation, but also in integration with other features or different dataset.
1. **Security tests** check there are no known vulnerabilities in the produced code or image
1. **Smoke tests**, **integration tests**, **acceptance tests** validate the application in a production-like environment to iron out issues with the environment, data or dependencies. A staging or pre-production environment with the same configuration as production may be used. The data may refreshed daily with sanitised production data.
1. When a new version of the code is deployed to production, there should be **zero downtime** and it should be transparent to end users. Blue-green, rolling or canary deployments may be used. We typically use the blue-green deployment on GOV.UK PaaS and deployment slot swap on Azure.
1. When a new version of the code is deployed to production, there should be **zero downtime** and it should be transparent to end users. Blue-green, rolling or canary deployments may be used. We typically use the rolling deployment on kubernetes and deployment slot swap on Azure.
1. **Monitoring** should run continuously to check the production application health. [StatusCake](https://www.statuscake.com/) or [Azure ping tests](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) are used for simple and fast pings. Smoke tests may also validate the business logic, as implemented in Teaching vacancies via Github actions.

## Repeatability
10 changes: 0 additions & 10 deletions source/guides/default-technology-stack/index.html.md.erb
@@ -43,16 +43,6 @@ For more information about CIP and the onboarding process of services and users

Community support for Azure use in general can also be gained from the community in the [#cloud-platform Slack channel](https://ukgovernmentdfe.slack.com/app_redirect?channel=C7L4D0LM9).

### GOV.UK Platform as a Service

<%= warning_text(data.site.gov_uk_paas_decommission) %>

We also offer _GOV.UK Platform as a Service_ (GOV.UK PaaS) for applications requiring less customisation than provided by a full
infrastructure-as-a-service platform such as Azure. It is suitable for web services following typical GOV.UK patterns and allows for
rapid deployment without requiring technical expertise.

For more details please refer to the [GOV.UK Platform as a Service](<%= data.site.gov_uk_paas%>)

### Infrastructure as code

DfE uses [Terraform](https://www.terraform.io/) and [Azure Resource Manager (ARM)](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-authoring-templates) templates for automating and scripting Azure infrastructure creation and changes.
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@ For further guidance on the C4 model, visit [here](https://c4model.com/).

## Adding C4 generation to the Tech Docs template

**Prerequisites:** you have some C4 diagrams-as-code ready to publish, and your technical documentation site is already using GitHub Actions to publish to GOV.UK PaaS.
**Prerequisites:** you have some C4 diagrams-as-code ready to publish, and your technical documentation site is already using GitHub Actions to publish to AKS.

The [structurizr.com](https://structurizr.com/) website provides a SaaS offering to generate diagrams from this code. There is a Free Tier Cloud offering available which allows for one workspace to be created ([feature comparison](https://structurizr.com/products)).

5 changes: 0 additions & 5 deletions source/infrastructure/dev-tools/index.html.md.erb
@@ -14,11 +14,6 @@ The Azure Command-Line Interface (CLI) is a cross-platform command-line tool to

[How to Install the Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)

## Cloud Foundry CLI
Cloud Foundry is an open source cloud computing platform. It is the platform the [UK Gov PaaS](https://www.cloud.service.gov.uk/) is delivered on.

[GOV.UK PaaS Getting Started (Cloud Foundry)](https://docs.cloud.service.gov.uk/get_started.html)

## GIT
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

36 changes: 13 additions & 23 deletions source/infrastructure/disaster-recovery/index.html.md.erb
@@ -26,21 +26,21 @@ The application may crash because of a bug, memory leak, high utilisation…
|-|-|
|Impact|It may or may not impact end users as a service may deploy multiple application instances.|
|Prevention|Crashes may happen because of high memory, CPU or disk usage. These metrics should be monitored and notify in advance to avoid the crash entirely.|
|Detection|Endpoint monitoring like `StatusCake` would notify of a total outage impacting users, if the whole application crashes. An application _instance_ crash may be reported by monitoring.<br/>In the case of GOV.UK PaaS, the Prometheus metric `crash` increases if an instance crashes.|
|Remediation|The quickest action is to roll back the problematic change or roll forward with a fix. Ideally the platform detects a failing application and restarts it.<br/>For example GOV.UK PaaS detects the failure by running frequent healthchecks. Then it deploys a new container and kills the failed one.<br/>If there is no such feature, the application may be restarted manually. If the restart doesn't work, the application and infrastructure must be investigated manually.|
|Detection|Endpoint monitoring like `StatusCake` would notify of a total outage impacting users, if the whole application crashes. An application _instance_ crash may be reported by monitoring.|
|Remediation|The quickest action is to roll back the problematic change or roll forward with a fix. Ideally the platform detects a failing application and restarts it.<br/>For example kubernetes detects the failure by running frequent healthchecks. Then it deploys a new container and kills the failed one.<br/>If there is no such feature, the application may be restarted manually. If the restart doesn't work, the application and infrastructure must be investigated manually.|

## Data corruption
The data in the database is corrupted because of a bug, human error, malicious activity… and cannot be recovered.

|||
|-|-|
|Impact|Some data may be lost, updated with incorrect value or may be presented to the wrong users.|
|Prevention|GOV.UK PaaS keeps backups of the database and transaction logs. We can recreate the database with daily or point-in-time (1s resolution) backup|
|Prevention|Azure postgres keeps backups of the database and transaction logs. We can recreate the database with daily or point-in-time (1s resolution) backup|
|Detection|Smoke tests may detect corruptions in some critical data.|
|Remediation|Access to the service should be stopped immediately.<br/>The data may be fixed manually if the change is simple. If the change is complex or if we don't know the extent of the issue, it may be necessary to recover the database from a backup whether daily, hourly or point-in-time using transaction logs.<br/>[Restore database](https://docs.cloud.service.gov.uk/deploying_services/postgresql/#postgresql-service-backup) with latest snapshot or point in time|

## Loss of database instance
It is possible to lose the database instance and the associated backups. For example, if a database service is deleted from GOV.UK PaaS, in case of human or automation error, the whole instance is deleted, including its backups.
It is possible to lose the database instance and the associated backups. For example, if the database server is deleted from Azure, in case of human or automation error, the whole instance is deleted, including its backups.

|||
|-|-|
@@ -50,12 +50,12 @@ It is possible to lose the database instance and the associated backups. For exa
|Remediation|Restore database from external daily or most recent backup|

## Loss of Azure/AWS availability zone
We deploy to PaaS London region which has 3 separate availability zones (AZ). It may happen that one of them is unavailable: either network, compute or storage services are affected.
We deploy to the UK South or West Europe regions which have 3 separate availability zones (AZ). It may happen that one of them is unavailable: either network, compute or storage services are affected.

|||
|-|-|
|Impact|Applications may be slow or unavailable|
|Prevention|Applications should be built with failure in mind: deploy multiple application instances and deploy databases in cluster mode. Spread them across multiple AZs for high availability.<br/>GOV.UK PaaS is PaaS is spread across 3 AZs. Scale applications to more than 1 instance and choose `HA` database plans.|
|Prevention|Applications should be built with failure in mind: deploy multiple application instances and deploy databases in cluster mode. Spread them across multiple AZs for high availability.<br/>Our AKS clusters are spread across 3 AZs. Scale applications to more than 1 replicas and enable zone redundancy.|
|Detection|Endpoint monitoring checking for uptime and response time|
|Remediation|If not handled automatically by the platform, redeploy applications and fail over clusters|

@@ -69,17 +69,7 @@ In some rare cases, an entire region might become unavailable.
|Detection|Endpoint monitoring checking for uptime|
|Remediation|Start services in backup region, trigger DNS failover|

## GOV.UK PaaS unavailable
When our services are on GOV.UK PaaS, any problem with platform may impact us. See [GOV.UK PaaS Support](<%= data.site.gov_uk_paas%>overview/#platform-support).

|||
|-|-|
|Impact|Services may be slow or unavailable. Or the service may be available but operations and deployments are broken.|
|Prevention|For critical applications, it is possible to deploy to 2 different regions (London and Ireland), synchronise the data, configure a DNS based failover or GSLB. We don’t usually protect against this risk as it is not worth the complexity of the required set-up.|
|Detection|Endpoint monitoring checking for uptime|
|Remediation|Start services in backup region, trigger DNS failover|

## Azure issues impacting GOV.UK PaaS
## Azure issues impacting delivery
We often rely on Azure for:

- Terraform state in Azure Storage
@@ -99,7 +89,7 @@ An attacker may send a high number of requests to overload the service and make
|||
|-|-|
|Impact|The service is unavailable or slow for users|
|Prevention|Every property in Azure is protected by [Azure's infrastructure DDoS (Basic) Protection](https://docs.microsoft.com/en-us/azure/ddos-protection/ddos-protection-overview).<br/>All apps on GOV.UK PaaS using a [custom domain](https://docs.cloud.service.gov.uk/deploying_services/use_a_custom_domain/) are protected by [AWS Shield Standard](https://aws.amazon.com/shield/features/)<br/>Depending on the criticality of the service, it is possible to use Azure DDoS Protection Standard instead.|
|Prevention|Every resource in Azure is protected by [Azure's infrastructure DDoS (Basic) Protection](https://docs.microsoft.com/en-us/azure/ddos-protection/ddos-protection-overview)<br/>Depending on the criticality of the service, it is possible to use Azure DDoS Protection Standard instead.|
|Detection|Endpoint monitoring checking for uptime and response time|
|Remediation|Protection measures are triggered automatically. It is also possible to analyse the traffic pattern and change the application accordingly.|

@@ -110,13 +100,13 @@ A malicious actor steals credentials or an ex employee still has working credent
|-|-|
|Impact|They may break the app, read or change confidential data|
|Prevention|Separate production environment and tighten security. Non production environments should only hold test or anonymised data.<br/>Revoke access every day or use [Azure PIM](https://docs.microsoft.com/en-us/azure/active-directory/privileged-identity-management/pim-configure) to give users temporary access. Make sure the offboarding process is followed. Use single-sign-on and 2FA when possible.<br/>Do not give databases a public IP.|
|Detection|Azure audit logs, GOV.UK PaaS audit log|
|Detection|Azure audit logs|
|Remediation|Revoke access of the suspicious user, investigate their actions<br/>Rotate secrets they may know and possibly restore the database to a known good state.|

## Disclosure of secrets
Different kind of sensitive information may be posted online accidentally by a developer. On a website like [pastebin](https://pastebin.com/) or committed to a GitHub public repository. Examples:

- _Deployment secrets_ like GOV.UK PaaS credentials, AWS API key
- _Deployment secrets_ like AWS API key
- _Application secrets_ like Google API key
- _Application data_ like a database dump

@@ -134,7 +124,7 @@ Each service must have a valid SSL certificate otherwise clients cannot connect.
|||
|-|-|
|Impact|Users can't access the website. Or they may ignore browser warnings and could then be tricked into a malicious website.|
|Prevention|Set up auto renewal of certificates stored in [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/certificates/tutorial-rotate-certificates). Services on PaaS are configured with a custom domain which generates a certificate and renews it automatically. If not auto renewed, set up monitoring of expiry date. Certficates created on DfE's Globalsign are monitored by Operations and owners receive notifications.|
|Prevention|Set up auto renewal of certificates stored in [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/certificates/tutorial-rotate-certificates). Services using Azure front door are configured with a custom domain which generates a certificate and renews it automatically. If not auto renewed, set up monitoring of expiry date. Certficates created on DfE's Globalsign are monitored by Operations and owners receive notifications.|
|Detection|Email from Operations or notification from monitoring|
|Remediation|If not auto renewed, issue a new DigiCert certificate and install it on the website|

@@ -144,7 +134,7 @@ A sudden spike in user traffic due to an announcement, a product launch or a coi
|||
|-|-|
|Impact|The system is slow or unresponsive|
|Prevention|Set up response time monitoring.<br/>Run load testing to determine bottlenecks and know how to scale up.<br/>Use CDN for web page caching and internal caching like Redis or Memcached. On GOV.UK PaaS, serve web assets (javascript, CSS...) without [forwarding any header](https://docs.cloud.service.gov.uk/deploying_services/use_a_custom_domain/#forwarding-headers) to optimise caching.|
|Prevention|Set up response time monitoring.<br/>Run load testing to determine bottlenecks and know how to scale up.<br/>Use CDN for web page caching and internal caching like Redis or Memcached.|
|Detection|Alert from response time monitoring, high CPU or memory usage, instances crashing|
|Remediation|Scale applications and services horizontally and vertically<br/>Disable expensive features|

@@ -165,7 +155,7 @@ A sudden spike in user traffic due to an announcement, a product launch or a coi
|-|-|
|Impact|Users are not impacted, but we would not be able to deploy via automation|
|Prevention|Plan to be able to deploy manually. Have DockerHub or Azure container registry ready as backup registry.|
|Detection|[GitHub status page](https://www.githubstatus.com/). Updates are posted to the #govuk-paas Slack channel.<br/>Notification of pipeline failures|
|Detection|[GitHub status page](https://www.githubstatus.com/)|
|Remediation|Build and deploy manually|

## DockerHub
2 changes: 0 additions & 2 deletions source/infrastructure/hosting/azure-cip/index.html.md.erb
@@ -10,8 +10,6 @@ It provides preconfigured [Azure](https://azure.microsoft.com/) subscriptions in

It provides access to most Azure resources including App Services, Container Instances, Virtual Networks, managed databases, Front Door, Key Vault, storage accounts, etc.

It can be used for any workload, especially ones which don’t fit on [GOV.UK PaaS](<%= data.site.gov_uk_paas%>). It can also be used alongside GOV.UK PaaS to complement it. For example: to store secrets, Terraform state, backups, etc.

Portal: [https://portal.azure.com/](https://portal.azure.com/)

## Platform documentation