Earthquake detection use case tutorial #2647

Draft: wants to merge 4 commits into base: main

DarthReca
Contributor

This is the draft tutorial for QuakeSet. I have provided two different pipelines:

  • Full training using TorchGeo
  • RandomForest with TorchGeo model embeddings

I am open to suggestions before I finish adding the explanations.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 15, 2025
@adamjstewart adamjstewart added this to the 0.7.0 milestone Mar 15, 2025
Member

calebrob6 commented Mar 19, 2025

Thanks @DarthReca!!

Quick link to the notebook

@calebrob6
Member

Notebook ran perfectly for me! The test fails because it tries to run the notebook end-to-end on a small test-runner VM without a GPU (which takes forever) and then times out.

@adamjstewart -- do you recommend basically skipping all the logic for test purposes?

Collaborator

adamjstewart commented Mar 19, 2025

I don't skip any logic, I just make it faster. Specifically, use smaller dataset subsets, or train for a single step instead of tens of epochs. We use nbmake for notebook testing, and nbmake specifically added support for variable mocking just for TorchGeo: https://github.com/treebeardtech/nbmake?tab=readme-ov-file#mock-out-variables-to-simplify-testing

We use this in many of our other tests; see the Trainer tutorial for an example (search for fast_dev_run). You have to edit the .ipynb directly using something like vim, though. I don't know how to do this in Jupyter itself.
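For anyone who would rather not hand-edit the JSON in vim, here is a minimal sketch of setting the mock metadata from Python. It assumes the layout described in the nbmake README linked above, where a cell's `metadata.nbmake.mock` dictionary overrides the named global variables after that cell runs. The helper `add_nbmake_mock`, the variable name `max_epochs`, and the stand-in notebook are hypothetical and not taken from the QuakeSet tutorial.

```python
import json


def add_nbmake_mock(notebook: dict, cell_index: int, mocks: dict) -> dict:
    """Attach nbmake mock values to one cell's metadata (modifies in place).

    Hypothetical helper: assumes nbmake reads cell["metadata"]["nbmake"]["mock"].
    """
    cell = notebook["cells"][cell_index]
    nbmake_meta = cell.setdefault("metadata", {}).setdefault("nbmake", {})
    nbmake_meta["mock"] = dict(mocks)
    return notebook


# A tiny stand-in notebook; a real one would be loaded with json.load().
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            # Values used for a full run; the names are illustrative only.
            "source": ["max_epochs = 50\n", "fast_dev_run = False\n"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Under test, the mocked values replace the globals defined in this cell.
add_nbmake_mock(notebook, 0, {"max_epochs": 1, "fast_dev_run": True})

# Serialized form that would be written back to the .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

The cell source keeps the full-size values, so a reader running the notebook normally still gets the real training run; only the test run sees the mocked ones.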

P.S. I realize that this is annoying and makes contributing tutorials prohibitively difficult, but in my experience, any code or documentation that isn't actively tested becomes broken in a matter of months, not years.

Contributor Author

DarthReca commented Mar 19, 2025

Thanks for the suggestions. If I read the logs correctly, the system fails at inference rather than training.

A subset will probably be enough to pass the tests. I can go for a customizable islice, if you think that is fine.
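A rough sketch of what a customizable islice could look like, assuming inference iterates over batches from some iterable. The names `limit_batches` and `max_batches`, and the fake batches, are hypothetical stand-ins rather than code from the notebook.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limit_batches(batches: Iterable[T], max_batches: Optional[int] = None) -> Iterator[T]:
    """Yield at most max_batches items; None means no limit."""
    # islice accepts None as the stop value and then yields everything.
    return islice(batches, max_batches)


# Stand-in for a dataloader; real batches would be tensors or dicts.
fake_batches = range(10)

# Hypothetical knob: left as None for a full run, set small for CI.
max_batches = 3

# Placeholder for the per-batch inference step.
processed = [batch * 2 for batch in limit_batches(fake_batches, max_batches)]
```

Keeping the limit in a single top-level variable means the default run is unchanged, and a test harness only has to override that one value.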

@adamjstewart
Collaborator

Another CI limitation is that these runners have very limited storage space, so cutting down the area used for inference could help a lot there too. Also, starting with a pretrained model lets you train much more quickly on a much smaller dataset.

@DarthReca
Contributor Author

Thanks for all the info

@adamjstewart adamjstewart removed this from the 0.7.0 milestone Mar 23, 2025
Labels
documentation Improvements or additions to documentation
3 participants