Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup branch: PMON #37

Open
wants to merge 24 commits into
base: develop
Choose a base branch
from
Open

Cleanup branch: PMON #37

wants to merge 24 commits into from

Conversation

mslaffin
Copy link
Collaborator

This PR is for misc cleanup and changes related to the ProcessMonitorSubsystem

@mslaffin mslaffin self-assigned this Dec 15, 2024
@mslaffin
Copy link
Collaborator Author

This was tested and confirmed to be working on 12/16.

Video recording:
Recording 2024-12-16 174706.zip

Log file:
log_2024-12-16_17-32-58.txt

There's still a bug where one or more units briefly fail to respond in time, despite increasing the fault tolerance at various levels (_poll_single_unit, poll_all_units and the subsytem's update_temperatures). This presents itself as a brief flash of the grey "Disconnected" error bar before recovering back to the nominal colored bar

I do not think this is a pressing problem though, based on how quickly recovers from the lack of response. The problem doesn't seem to be isolated to any particular unit. My hunch is that we're pushing an aggressive polling cycle that involves multiple reads (status, process_value) every time and occasionally the the DP16 units are polled in an unprepared state, but like I said, I'm not too concerned about this as an immediate issue.

@mslaffin
Copy link
Collaborator Author

I mentioned this in our meeting on 12/20, but I think there's a problem with how the modbus_lock is implemented inside poll_all_units that could be leading to slower/less predictable update cycles.

The temperatures are updating slower than they should. This behavior can be seen in the video I posted in PR#34.

The modbus_lock is held for ALL units and is probably an inefficient way to do this. Currently looks like:

with self.modbus_lock:  # Lock held for all units
    for unit in self.unit_numbers:
        self._poll_single_unit(unit)  # Each unit can take up to 0.5s before timing out
time.sleep(0.1) # single sleep at end

The get_all_temperatures in the subsystem class returns a copy of the cached readings under response_lock, so we're always updating the GUI every 500ms, but not necessarily with new values until all units have been polled or timed out.

I'm going to pursue some changes to acquire and release modbus_lock between each unit, and try to track the actual time spent polling so that we're not making unnecessarily long time.sleep() calls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants