Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/stateful workflow #29

Merged
merged 39 commits into from
Feb 4, 2025
Merged

Feat/stateful workflow #29

merged 39 commits into from
Feb 4, 2025

Conversation

rmm-ch
Copy link
Collaborator

@rmm-ch rmm-ch commented Jan 31, 2025

  1. Added a state machine to control progression through the user workflow. The state is used to determine what to present, and is stable even when user interactions (e.g. tabs are switched) cause app reruns. Resolves bug: elements shown are not persistent if tabs are switched #23
  2. Input handling now incorporates coordinate extraction when present. Resolves feat: implement coordinate extraction from metadata #19
  3. InputObservation class got an overhaul as part of the stateful workflow. Resolves chore: cleanup InputObservation class #27
  4. We can also close bug: time_input failing with parsed dates #21 , the present implementation correctly populates the date if the file exifdata has it.

rmm-ch added 29 commits January 25, 2025 14:43
- classifier_image gets split into multiple functions: run inference;
  manual review/validation (with display of results); just display of
results
- workflow_state implements a FSM using `transitions`. It is a bit too
  simple in the end, as the idea was to have a trigger that allows
  moving to the next state without needing to know what that state is
  called (otherwise the specification of pathways by data structs
  doesn't simplify it). But it is too buggy this way, you can advance
  in places that you shouldn't.  So will refactor this.
- in main
    - we convert some more session_state to dicts, to handle image batches
    - add a simple widget to show progress through the workflow
    - the main effort to stop losing progress is on the tab_inference.
      lots of testing state to check what action to take. you can see
      several attempts, to clean up now I understand a big bug was in
      gating everything by the inference button.
This reverts commit 80b4be6.

- I learned what I needed to but I don't like the FSM implementation,
  and I created plenty of mess in main that doesn't need to remain.

  --> reverting.
- fsm implementation uses the `transitions` package.
- added unique keys to the input forms, so can check when all are filled
- included a basic viz/feedback on the state
they were singular and just overwritten on each image.
- dropped the "ML running" phase for now as we don't do it async
- main bug was that every interaction with the UI led to the
  file_uploader being re-instantiated, and then all the inputs
  got re-parsed, the hashes recalculated, and the data lost.

- solution is via callback, and using the session state to implicitly
  store the file_uploader return value (not well documented)

  - on change of the file_uploader state, we dynamically generate
    the input elements to supply the metadata. And process them inline.
  - TODO: the data is stable in the session_state, but the UI loses the
    elements for the list -- because the list hasn't changed! the
    callback doesn't get triggered.
    - Good: we don't overwrite our loaded data, and the ML/presentation
      can continue,  but...
    - Bad: we don't redraw the elements. -> more caching I suppose.
- testing for the presence of the key in session_state is no longer
  sufficient with current implementation (we explicitly set the keys
  so we can handle >1 file)
- can't generate widgets within callbacks, they are not stable
- flow instead is:
  1. normal flow: add file_uploader with callback
  2. buffer files in the callback (st.session_state)
  3. normal flow: add UI elements to get metadata, for each file in
     buffer
see issue #27.
- added image_md5 as input (1)
- removed duplicate methods (3)
- removed unused hash func (5)
- checked by inspection the attribute coverage (4) - still to test
- the timestamp is taken from the clock at the moment, infeasible to
  be the same so would always update (overwrite) all observations on
  reruns.
- also added a difference highlighter method for inspection
- hash is passed to Observation constructor
see issue #27.
- added typehints (6) for all arguments, except (date, time) which
  always seem to be none, and need investigation to resolve (see 2)
- updated the attribute uploaded_filename to uploaded_file, since it
  is *not* a filename, but a BytesIO-like object, `UploadedFile`.
- also stopped passing the viewcontainer arg to setup_input, since it
  uses the `with st.sidebar` context, which is simpler.
- still some ambiguity with the names "date_option" and "time_option"
  but the present change is more involved, while renaming those two
  can happen after if valid
- renamed the session_state variable from singular to plural
- was initialised as a dict, but then overwritten each time, meaning
  access was order-dependent and lossy
- current implementation is to open the HF handle once, then prepare and
  push each observation individually. Could check docs about pushing
  multiple observations in one transaction.

- At present the `api.upload_file` call is commented out, just get
  log/visual info about the actions
- no more "options", too ambiguous. The goal of this implementation is
  to have one date and one time for the observation, which could have
  been corrected manually.

- see beca8fa where I did the first update
- basically all phases seem ok, almost ready for validation
- when manual validation is performed (dropdown selection among
  species), it is written to the observations (And not the
  dynamically-created dicts).

- TODO: decide if we need to retain public_observations in
  session_state, or just generate the dict each time it is needed.
@rmm-ch rmm-ch requested a review from vancauwe January 31, 2025 16:03
@rmm-ch
Copy link
Collaborator Author

rmm-ch commented Jan 31, 2025

Maybe important to note:

  • the point I branched from had the huggingface api.upload_file commented out.
  • I made it more explicit by adding a boolean to the new function handling this: push_all_observations(enable_push=False).
  • I did not try pushing new observations to huggingface, only generating JSON output that would be pushed.

Copy link
Contributor

@vancauwe vancauwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested:

  • add some documentation to some of the functions
  • ensure variable names are explicit and easily understandable
  • make the main file as small as possible. Anything that can be outsourced elsewhere should be

src/main.py Outdated Show resolved Hide resolved
src/main.py Outdated Show resolved Hide resolved
src/main.py Outdated Show resolved Hide resolved
src/utils/workflow_state.py Show resolved Hide resolved
src/utils/workflow_ui.py Outdated Show resolved Hide resolved
@vancauwe
Copy link
Contributor

Screenshot 2025-01-31 at 19 35 16 From a user perspective, I cannot launch the code as is. Can you please review your type definitions for the get_image_datetime?

- basic markdown colouring doesn't support using :primary[<text>] in our
  version, so the code gets messy for an 'icing' feature.
st_logs provide the logging functionality, and the current method
for the few places used was not persistent to tab-switches.
removed unused code, primarily commented out / older versions
@rmm-ch
Copy link
Collaborator Author

rmm-ch commented Feb 1, 2025

From a user perspective, I cannot launch the code as is. Can you please review your type definitions for the get_image_datetime?

I'm developing in a venv with python 3.10 as per the deployed version. I can't actually quickly test locally as the oldest I have is 3.10 (10-2021). Step 1: can you verify it is ok now? step 2: can you upgrade your dev environment to match the deployment? (Unless we need to support older versions?)

@rmm-ch
Copy link
Collaborator Author

rmm-ch commented Feb 1, 2025

Changes requested:

  • add some documentation to some of the functions
    done, I think all functions have docstrings now
  • ensure variable names are explicit and easily understandable
    done
  • make the main file as small as possible. Anything that can be outsourced elsewhere should be

I did a first round of main file cleaning (mentioned above re: session_state; and also obselete code removed).
I'd rather not refactor the state machine code until we have validated the basic implementation here (and tested on the dev delopyment too).

Copy link
Contributor

github-actions bot commented Feb 3, 2025

Coverage

Coverage Report
FileStmtsMissCoverMissing
src
   hf_push_observations.py42420%1–90
   main.py1411410%1–318
   whale_gallery.py33330%1–105
   whale_viewer.py27389%142, 146, 150
src/input
   input_validator.py52590%25–28, 82–83
TOTAL30622427% 

Tests Skipped Failures Errors Time
29 0 💤 0 ❌ 0 🔥 1.348s ⏱️

Copy link
Contributor

@vancauwe vancauwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first implementation looks very nice. good to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants