Feat/stateful workflow #29

rmm-ch · 2025-01-31T16:03:12Z

Added a state machine to control progression through the user workflow. The state determines what to present, and it stays stable even when user interactions (e.g. switching tabs) cause app reruns. Resolves #23 (bug: elements shown are not persistent if tabs are switched).
Input handling now incorporates coordinate extraction when coordinates are present in the metadata. Resolves #19 (feat: implement coordinate extraction from metadata).
The InputObservation class got an overhaul as part of the stateful workflow. Resolves #27 (chore: cleanup InputObservation class).
We can also close #21 (bug: time_input failing with parsed dates): the present implementation correctly populates the date if the file's EXIF data contains it.

- classifier_image is split into multiple functions: run inference; manual review/validation (with display of results); display of results only.
- workflow_state implements an FSM using `transitions`. It ended up a bit too simple: the idea was to have a trigger that moves to the next state without needing to know what that state is called (otherwise, specifying the pathways via data structures doesn't simplify anything). But it is too buggy this way, as you can advance in places where you shouldn't, so I will refactor this.
- In main, we convert some more session_state entries to dicts, to handle image batches.
- Added a simple widget to show progress through the workflow.
- The main effort to stop losing progress is in tab_inference: lots of state testing to decide what action to take. You can see several attempts, to be cleaned up now that I understand a big bug was gating everything behind the inference button.
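A minimal, stdlib-only sketch of that idea: a single generic `advance` trigger that moves to whichever state comes next, plus a per-state guard so that you cannot advance out of a phase that is not complete. This is not the `transitions`-based code in workflow_state; the class name `OrderedWorkflow` and the phase names are hypothetical.

```python
from typing import Callable, Dict, List


class OrderedWorkflow:
    """Hypothetical ordered FSM: one generic `advance` trigger, gated per state."""

    def __init__(self, states: List[str]) -> None:
        self.states = states
        self.index = 0
        # Guards keyed by the *current* state; a guard must return True to leave it.
        self.guards: Dict[str, Callable[[], bool]] = {}

    @property
    def state(self) -> str:
        return self.states[self.index]

    def add_guard(self, state: str, guard: Callable[[], bool]) -> None:
        self.guards[state] = guard

    def advance(self) -> bool:
        """Move to the next state if allowed; return whether the move happened."""
        if self.index >= len(self.states) - 1:
            return False  # already in the final state
        guard = self.guards.get(self.state)
        if guard is not None and not guard():
            return False  # blocked: the current phase is not complete
        self.index += 1
        return True


# Hypothetical phase names, loosely following the PR description.
flow = OrderedWorkflow(["data_entry", "ml_classification", "manual_validation", "upload"])
inputs_filled = {"done": False}
flow.add_guard("data_entry", lambda: inputs_filled["done"])
```

In Streamlit, the current index would need to live in `st.session_state` so that it survives reruns; that wiring is omitted here.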

- The main bug was that every interaction with the UI led to the file_uploader being re-instantiated; then all the inputs were re-parsed, the hashes recalculated, and the data lost.
- The solution is via a callback, using the session state to implicitly store the file_uploader return value (not well documented).
- On change of the file_uploader state, we dynamically generate the input elements to supply the metadata, and process them inline.
- TODO: the data is stable in session_state, but the UI loses the elements for the list, because the list hasn't changed, so the callback doesn't get triggered.
  - Good: we don't overwrite our loaded data, and the ML/presentation can continue.
  - Bad: we don't redraw the elements. -> More caching, I suppose.

- Widgets can't be generated within callbacks; they are not stable.
- The flow instead is:
  1. Normal flow: add the file_uploader with a callback.
  2. Buffer the files in the callback (st.session_state).
  3. Normal flow: add UI elements to get the metadata for each file in the buffer.
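The three-step flow can be modelled outside Streamlit with a plain dict standing in for `st.session_state`, as a sketch under that assumption. The session-state key names (`file_uploader_data`, `files`) and the per-file widget keys are hypothetical; keying files by the MD5 of their bytes is one way to keep each buffered file mapped to the same entry across reruns.

```python
import hashlib
import io
from typing import Dict, List


def buffer_files(session_state: dict, uploader_key: str = "file_uploader_data") -> None:
    """Callback body (step 2): copy the uploader's current value into a stable buffer.

    `session_state` stands in for st.session_state; the key names are hypothetical.
    """
    uploaded = session_state.get(uploader_key) or []
    buffer: Dict[str, io.BytesIO] = {}
    for f in uploaded:
        # Key each file by the MD5 of its bytes, so reruns map to the same entry.
        md5 = hashlib.md5(f.getvalue()).hexdigest()
        buffer[md5] = f
    session_state["files"] = buffer


def metadata_widget_keys(session_state: dict) -> List[str]:
    """Step 3, normal flow (outside the callback): one set of input keys per buffered file."""
    keys = []
    for md5 in session_state.get("files", {}):
        keys.extend([f"input_latitude_{md5}", f"input_longitude_{md5}", f"input_date_{md5}"])
    return keys
```

In the app, step 1 would look roughly like `st.file_uploader("Upload images", accept_multiple_files=True, key="file_uploader_data", on_change=buffer_files, args=(st.session_state,))`, with the step 3 widgets created in the normal script flow from the keys returned by `metadata_widget_keys`.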

See issue #27.
- Added type hints (6) for all arguments, except (date, time), which always seem to be None and need investigation to resolve (see 2).
- Renamed the attribute uploaded_filename to uploaded_file, since it is *not* a filename but a BytesIO-like object, `UploadedFile`.

- The current implementation opens the HF handle once, then prepares and pushes each observation individually. I could check the docs about pushing multiple observations in one transaction.
- At present the `api.upload_file` call is commented out, so we just get log/visual info about the actions.

- When manual validation is performed (dropdown selection among species), it is written to the observations (and not to the dynamically created dicts).
- TODO: decide if we need to retain public_observations in session_state, or just generate the dict each time it is needed.
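The second option in the TODO (generate the public dict each time it is needed) can be sketched as follows, assuming an observation object that stores the top predictions and the manually selected class. The `Observation` class, its field names and the species labels are hypothetical; the point is that manual validation mutates the observation, and the public dict is derived from it on demand, so there is nothing extra to keep in sync in session_state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical stand-in for the PR's observation object."""
    image_md5: str
    top_predictions: List[str] = field(default_factory=list)
    selected_class: Optional[str] = None  # set by manual validation

    def set_selected_class(self, species: str) -> None:
        # Manual validation writes to the observation itself, not to a derived dict.
        self.selected_class = species

    def to_public_dict(self) -> Dict[str, Optional[str]]:
        # Generated on demand, so it cannot go stale relative to the observation.
        predicted = self.top_predictions[0] if self.top_predictions else None
        return {
            "image_md5": self.image_md5,
            "predicted_class": predicted,
            "selected_class": self.selected_class or predicted,
        }


def public_observations(observations: Dict[str, Observation]) -> Dict[str, dict]:
    """Rebuild the public view for every observation, keyed by image hash."""
    return {md5: obs.to_public_dict() for md5, obs in observations.items()}
```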

rmm-ch · 2025-01-31T16:15:40Z

Maybe important to note:

The point I branched from had the Hugging Face `api.upload_file` call commented out.
I made this more explicit by adding a boolean to the new function that handles it: push_all_observations(enable_push=False).
I did not try pushing new observations to Hugging Face, only generating the JSON output that would be pushed.
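A dry-run-by-default push can be sketched as below. Only the function name `push_all_observations` and the `enable_push=False` default come from this comment; the `uploader` hook and the `<md5>.json` path scheme are hypothetical. In real use the hook might wrap `huggingface_hub.HfApi.upload_file` (which takes `path_or_fileobj`, `path_in_repo` and `repo_id`), in which case the JSON would be encoded to bytes first.

```python
import json
from typing import Callable, Dict, List, Optional


def push_all_observations(
    observations: Dict[str, dict],
    enable_push: bool = False,
    uploader: Optional[Callable[[str, str], None]] = None,
) -> List[str]:
    """Serialise each observation to JSON; only call `uploader` if enable_push is True.

    `uploader(path_in_repo, payload)` is a hypothetical hook standing in for the
    Hugging Face upload call, so the dry run needs no network access.
    """
    payloads = []
    for md5, obs in observations.items():
        payload = json.dumps(obs, sort_keys=True)
        payloads.append(payload)
        if enable_push and uploader is not None:
            uploader(f"{md5}.json", payload)  # one push per observation
        else:
            print(f"[dry run] would push {md5}.json ({len(payload)} bytes)")
    return payloads
```

Defaulting to a dry run keeps the behaviour described above (JSON generated, nothing pushed) unless a caller opts in explicitly.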

vancauwe

Changes requested:

add some documentation to some of the functions
ensure variable names are explicit and easily understandable
make the main file as small as possible. Anything that can be outsourced elsewhere should be

src/main.py

src/utils/workflow_state.py

src/utils/workflow_ui.py

vancauwe · 2025-01-31T18:38:56Z

From a user perspective, I cannot launch the code as is. Can you please review your type definitions for the get_image_datetime?

rmm-ch · 2025-02-01T10:43:54Z

> From a user perspective, I cannot launch the code as is. Can you please review your type definitions for the get_image_datetime?

I'm developing in a venv with Python 3.10, as per the deployed version. I can't quickly test an older version locally, as the oldest I have is 3.10 (released October 2021). Step 1: can you verify it is OK now? Step 2: can you upgrade your dev environment to match the deployment? (Unless we need to support older versions?)
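For context on the Python 3.9 issue, a sketch: PEP 604 unions such as `datetime | None` are only valid at runtime from Python 3.10, so on 3.9 an annotation written that way raises a `TypeError` when the function is defined, unless the module uses `from __future__ import annotations`. `typing.Optional` works on both. The signature and the EXIF-dict argument below are hypothetical; only the name `get_image_datetime` comes from the review.

```python
from datetime import datetime
from typing import Optional

# On Python 3.9, `-> datetime | None` fails when the def is evaluated
# (TypeError), unless the module starts with `from __future__ import annotations`.
# typing.Optional works on 3.9 and later.


def get_image_datetime(exif: dict) -> Optional[datetime]:
    """Hypothetical signature: parse an EXIF-style timestamp if one is present."""
    raw = exif.get("DateTimeOriginal")
    if not raw:
        return None
    # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
    return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
```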

rmm-ch · 2025-02-01T10:48:12Z

> Changes requested:
>
> add some documentation to some of the functions

Done, I think all functions have docstrings now.

> ensure variable names are explicit and easily understandable

Done.

> make the main file as small as possible. Anything that can be outsourced elsewhere should be

I did a first round of cleaning the main file (mentioned above re: session_state; obsolete code was also removed).
I'd rather not refactor the state machine code until we have validated the basic implementation here (and tested it on the dev deployment too).

github-actions · 2025-02-03T16:28:57Z

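Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| **src** | | | | |
| hf_push_observations.py | 42 | 42 | 0% | 1–90 |
| main.py | 141 | 141 | 0% | 1–318 |
| whale_gallery.py | 33 | 33 | 0% | 1–105 |
| whale_viewer.py | 27 | 3 | 89% | 142, 146, 150 |
| **src/input** | | | | |
| input_validator.py | 52 | 5 | 90% | 25–28, 82–83 |
| **TOTAL** | **306** | **224** | **27%** | |

| Tests | Skipped | Failures | Errors | Time |
| ---: | ---: | ---: | ---: | ---: |
| 29 | 0 💤 | 0 ❌ | 0 🔥 | 1.348s ⏱️ |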

vancauwe

First implementation looks very nice. Good to merge!

rmm-ch added 29 commits January 25, 2025 14:43

feat: added member var for top predictions in the observation class

f824145

Revert "feat: first implementation of an FSM to keep track of phase"

5a21040

This reverts commit 80b4be6.
- I learned what I needed to, but I don't like the FSM implementation, and I created plenty of mess in main that doesn't need to remain. --> Reverting.

Merge remote-tracking branch 'origin/dev' into feat/stateful-workflow

b384db4

feat: implementation of FSM, and invocation for first phases

00bdefd

- FSM implementation uses the `transitions` package.
- Added unique keys to the input forms, so we can check when all are filled.
- Included a basic viz/feedback on the state.

fix: classification_done and prediction1 now ok for image batches

7a5f0ca

They were singular and just overwritten for each image.

feat: using FSM for full workflow, with some steps mocked

4854d2c

- Dropped the "ML running" phase for now, as we don't run it asynchronously.

feat: separate functions for ML inference, manual validation, display

d4ec4a0

feat: using separate functions for ML flow (WIP)

4d0f7fd

fix: checking input requires testing for empty strings/lists

3eaf0a5

- Testing for the presence of the key in session_state is no longer sufficient with the current implementation (we explicitly set the keys so that we can handle >1 file).
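A hedged sketch of the stricter check: a key being present in session state is not enough, the value must also be non-empty. The function names and the keys in the example are hypothetical, and a plain mapping stands in for `st.session_state`.

```python
from typing import Any, Iterable, Mapping


def is_filled(value: Any) -> bool:
    """A present key is not enough: None, blank strings and empty containers count as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True  # numbers (including 0.0), dates, etc. count as filled


def all_inputs_filled(state: Mapping[str, Any], keys: Iterable[str]) -> bool:
    # `state` stands in for st.session_state; the keys are whatever the forms set.
    return all(is_filled(state.get(k)) for k in keys)
```

Treating `0.0` as filled matters for coordinates: a latitude of zero is a valid value, not a missing one.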

chore: cleaning up InputObservation

fb505f3

See issue #27.
- Added image_md5 as input (1).
- Removed duplicate methods (3).
- Removed unused hash func (5).
- Checked the attribute coverage by inspection (4).
- Still to test.

fix: InputObservation comparison ok for images, and skipping time

d6d4e4e

- The timestamp is currently taken from the clock, so two observations of the same image can never be equal, and all observations would always be updated (overwritten) on reruns.
- Also added a difference highlighter method for inspection.
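One way to exclude a clock-based timestamp from equality, swapped in here for illustration, is `dataclasses.field(compare=False)`; the PR's class may instead implement `__eq__` by hand. The class name, its fields and the `show_diff` method (standing in for the difference highlighter) are hypothetical.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class InputObservationSketch:
    """Hypothetical subset of InputObservation fields."""
    image_md5: str
    latitude: float
    longitude: float
    # compare=False: a clock-based timestamp never matches across reruns,
    # so it must not take part in equality.
    created_at: datetime = field(default_factory=datetime.now, compare=False)

    def show_diff(self, other: "InputObservationSketch") -> Dict[str, Tuple[object, object]]:
        """Return {field_name: (self_value, other_value)} for compared fields that differ."""
        return {
            f.name: (getattr(self, f.name), getattr(other, f.name))
            for f in fields(self)
            if f.compare and getattr(self, f.name) != getattr(other, f.name)
        }
```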

feat: InputObservations are compared and only updated if new

18e57c7

- The hash is passed to the Observation constructor.

chore: cleanup input_handling - removing unused functions

a491624

- also stopped passing the viewcontainer arg to setup_input, since it uses the `with st.sidebar` context, which is simpler.

chore: reorganise code out of main, and docstrings in input_handling

1311e0c

chore: reorganise code out of main

01fa6a9

feat: get coordinates from file, populate input boxes

e90cc61

- resolves #19
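EXIF stores GPS positions as degree/minute/second values (the `GPSLatitude` and `GPSLongitude` tags) with a separate hemisphere reference (`GPSLatitudeRef`, `GPSLongitudeRef`). Assuming the coordinates are read in that form (the PR does not say how), converting them to decimal degrees for the input boxes uses decimal = d + m/60 + s/3600, negated for S or W. The helper name is hypothetical.

```python
from typing import Sequence


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to decimal degrees."""
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West hemispheres are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value
```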

chore: removing duplicate file (right one is in classifier subdir)

c3cf604

fix: removed time from InputObservation, renamed date for clarity

beca8fa

- There is still some ambiguity with the names "date_option" and "time_option", but the present change is already more involved; renaming those two can happen afterwards if valid.

fix: public_observations now a dict, for multi-image handling

0e02e00

- Renamed the session_state variable from singular to plural.
- It was initialised as a dict, but then overwritten each time, meaning access was order-dependent and lossy.
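The bug and the fix can be contrasted in a few lines, with a plain dict standing in for `st.session_state` (an assumption); the function names and the singular key are hypothetical.

```python
from typing import Dict


def record_buggy(state: dict, md5: str, observation: dict) -> None:
    # Bug pattern: initialised as a dict elsewhere, but replaced wholesale here,
    # so only the last image of a batch survives.
    state["public_observation"] = observation


def record_fixed(state: dict, md5: str, observation: dict) -> None:
    # Fix: a plural, keyed container; each image gets its own entry.
    observations: Dict[str, dict] = state.setdefault("public_observations", {})
    observations[md5] = observation
```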

fix: instantiating InputObservation with new call signature

3e2cb2f

(see beca8fa)

fix: InputObservation uses date, time, and raw_image_datetime

1ba3d0b

- No more "options", too ambiguous. The goal of this implementation is to have one date and one time for the observation, which could have been corrected manually.
- See beca8fa, where I did the first update.

chore: tidy up of workflow and debug clutter

5823912

- basically all phases seem ok, almost ready for validation

test: updated test to match interface; solved other xfail tests

a068827

rmm-ch requested a review from vancauwe January 31, 2025 16:03

vancauwe requested changes Jan 31, 2025

src/main.py (outdated, resolved)

src/main.py (outdated, resolved)

src/main.py (outdated, resolved)

src/utils/workflow_state.py (resolved)

src/utils/workflow_ui.py (outdated, resolved)

rmm-ch added 9 commits February 1, 2025 10:30

chore: clean up function naming

1021b6c

chore: moved debug function to relevant module, and docstring

4d650d5

chore: cleanup variable names and add docstrings

a5ced3e

fix: remove highlighting for completion, not valid before v1.41.0

18a41f5

- Basic markdown colouring doesn't support :primary[<text>] in our version, so the code gets messy for an 'icing' feature.

doc: added docstrings to new functions

60a7864

fix: typehints compatible with py3.9

be5e8dd

chore: moved all session state init to relevant modules

94698a8

fix: retired use of tab_log in session state

d0c3bfa

st_logs provides the logging functionality, and the current method, used in a few places, did not persist across tab switches.

chore: tidy up main

d0e6c24

removed unused code, primarily commented out / older versions

feat: extra CI action that adds coverage reports to PR conversation

8e4ef44

vancauwe approved these changes Feb 4, 2025

vancauwe merged commit 8ccb11f into dev Feb 4, 2025
2 checks passed

This was referenced Feb 6, 2025

feat: implement coordinate extraction from metadata #19

Closed

chore: cleanup InputObservation class #27

Closed

bug: elements shown are not persistent if tabs are switched #23

Closed
