Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor how data is persisted in Azure #2153

Closed
10 tasks done
thekaveman opened this issue Jun 10, 2024 · 5 comments · Fixed by #2166
Closed
10 tasks done

Refactor how data is persisted in Azure #2153

thekaveman opened this issue Jun 10, 2024 · 5 comments · Fixed by #2166
Assignees
Labels
bug Something isn't working docker Application container, devcontainer, Compose, etc. infrastructure Terraform, Azure, etc.

Comments

@thekaveman
Copy link
Member

thekaveman commented Jun 10, 2024

Recently, we have been seeing errors in dev related to attempting sign in to the Admin and the database being in a locked state:

OperationalError: database is locked

While we have only seen this in dev so far, it seems inevitable to come up in test and/or prod, where it will really be a problem if we can't access the Admin or if the app is locked for user requests.

The root of the issue is due to the changes from #1773 to persist the (Sqlite) database file in a mounted Azure file share. This mostly works for reads; but now that we have the potential for more writes via the Admin, it is starting to break down (as expected, see e.g. this SO post).

The idea is to reuse as much of the existing infrastructure and process as possible, with two changes:

  • Azure Web App storage is turned on, to enable Azure to persist the /home directory
  • The database is moved under /home to a non-file share directory

This should solve the Sqlite-over-a-network issue, while allowing us to maintain data configured via the Admin across restarts/deployments.

Acceptance Criteria

  • The setting WEBSITES_ENABLE_APP_SERVICE_STORAGE=true
  • The database file sits within the app container in the /home directory, not on a network file share

Plan

Additional context

This FAQ article about Azure Web App Containers has been very informative: https://techcommunity.microsoft.com/t5/apps-on-azure-blog/things-you-should-know-web-apps-and-linux/ba-p/392472

We can also think about dumping fixture data on some frequency into the network file share for a backup, following the commands from the notes in #1856 (comment)

One idea for implementing this is by using Azure WebJobs, which is now in Preview for Linux Containers and can run simple bash .sh scripts.

@thekaveman thekaveman added bug Something isn't working docker Application container, devcontainer, Compose, etc. infrastructure Terraform, Azure, etc. labels Jun 10, 2024
@thekaveman thekaveman moved this from Todo to In review in Digital Services Jun 10, 2024
@thekaveman thekaveman moved this from In review to In progress in Digital Services Jun 10, 2024
@thekaveman thekaveman self-assigned this Jun 10, 2024
@angela-tran
Copy link
Member

One idea for implementing the frequent fixture dumping is using Azure WebJobs, which is now in Preview for Linux Containers and can run simple bash .sh scripts.

A small wrinkle is that this feature requires the setting WEBSITES_ENABLE_APP_SERVICE_STORAGE=true

Another idea could be to install cron in the Docker container and set up a cron job to run whatever script / commands we need to. Maybe this would get around needing to move all our files.

@thekaveman
Copy link
Member Author

thekaveman commented Jun 11, 2024

Another idea could be to install cron in the Docker container and set up a cron job to run whatever script / commands we need to. Maybe this would get around needing to move all our files.

This is definitely another option. There are a couple issues to work through, most notably cron wants to run as root but we use a non-root user inside the container. We're also already running 2 processes inside the container (nginx and gunicorn) which is against how Docker is "supposed" to be used (1 process per container). cron is finicky to get output from and will often silently fail, we'd need to redirect the output to the Docker logs to be able to see anything (not impossible, but tricky I think).

With a WebJob, we would get some type of logging in the Azure UI and we don't have to worry about the underlying runtime. I'm open to other ideas though if it seems like too much of a change to uproot our source tree.

@angela-tran
Copy link
Member

This is definitely another option. There are a couple issues to work through, most notably cron wants to run as root but we use a non-root user inside the container. We're also already running 2 processes inside the container (nginx and gunicorn) which is against how Docker is "supposed" to be used (1 process per container). cron is finicky to get output from and will often silently fail, we'd need to redirect the output to the Docker logs to be able to see anything (not impossible, but tricky I think).

With a WebJob, we would get some type of logging in the Azure UI and we don't have to worry about the underlying runtime. I'm open to other ideas though if it seems like too much of a change to uproot our source tree.

I agree with the lack of visibility into failures if we just run cron in the container, and I hadn't thought about the user and running processes. Azure WebJobs seems like a good direction to continue with.

@thekaveman
Copy link
Member Author

thekaveman commented Jun 11, 2024

Perhaps another option (this seems difficult to get working correctly as well): avoid cron and WebJobs all together, use GitHub Actions to initiate the dumpdata command via a Remote SSH Connection to the running container.

We would need to modify the container to enable SSH.

This also seems hard to get the SSH server running in the container as a non-root user: https://www.golinuxcloud.com/run-sshd-as-non-root-user-without-sudo/

@thekaveman
Copy link
Member Author

I rewrote this ticket to focus on the just the updates for setting WEBSITES_ENABLE_APP_SERVICE_STORAGE=true and seeing how far that gets us.

The automated fixture dump/restore could be a follow-up if these changes end up working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working docker Application container, devcontainer, Compose, etc. infrastructure Terraform, Azure, etc.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants