AttributeError: 'NoneType' object has no attribute 'energy' #2114

Open
kinow opened this issue Feb 7, 2025 · 9 comments

@kinow
Member

kinow commented Feb 7, 2025

Found in t02z. This morning we were looking at the logs of t02z (@ainagaya & @ialsina were both helping me), when I noticed I hadn't scrolled up enough in the log to see an error that occurred due to MAX_WALLCLOCK.

But looking at the rest of the logs, I also found this exception:

2025-02-06 19:44:08,681 Historical Database error: 'NoneType' object has no attribute 'energy'
Traceback (most recent call last):
  File "/appl/AS/4.1.12-dev-e326af7/lib64/python3.9/site-packages/autosubmit/history/experiment_history.py", line 203, in _verify_slurm_monitor
    if not slurm_monitor.steps_plus_extern_approximate_header_energy():
  File "/appl/AS/4.1.12-dev-e326af7/lib64/python3.9/site-packages/autosubmit/history/platform_monitor/slurm_monitor.py", line 76, in steps_plus_extern_approximate_header_energy
    return abs(self.steps_energy + self.extern.energy - self.header.energy) <= 0.01*self.header.energy
AttributeError: 'NoneType' object has no attribute 'energy'

2025-02-06 19:44:09,161 t02z_19900101_fc0_1_10_DN_STAT file have been transferred
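
The traceback shows that either `self.extern` or `self.header` was `None` when the energy comparison ran. A minimal sketch of a guarded version of that check follows. It assumes a simplified stand-in for the monitor: the `EnergyItem` and `MonitorSketch` names are hypothetical and are not the Autosubmit classes; only the comparison formula is taken from the traceback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnergyItem:
    """Hypothetical stand-in for a parsed header/extern entry."""
    energy: float


@dataclass
class MonitorSketch:
    """Hypothetical stand-in for the Slurm monitor, not the real class."""
    steps_energy: float
    header: Optional[EnergyItem] = None
    extern: Optional[EnergyItem] = None

    def steps_plus_extern_approximate_header_energy(self) -> bool:
        # Guard: if either entry was never parsed, the comparison cannot be made.
        if self.header is None or self.extern is None:
            return False
        # Same comparison as in the traceback: within 1% of the header energy.
        total = self.steps_energy + self.extern.energy
        return abs(total - self.header.energy) <= 0.01 * self.header.energy
```

Returning `False` for missing entries is only one possible choice; it avoids the exception but does not explain why the entry was `None` in the first place.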

Creating this issue now so we don't forget it... and will assign 4.1.12 in case we have time to work on this (models are late; we are still working with Feb 14th as deadline... but who knows 😬 ).

@kinow kinow added the bug Something isn't working label Feb 7, 2025
@kinow kinow added this to the 4.1.12 milestone Feb 7, 2025
@kinow
Member Author

kinow commented Feb 7, 2025

Also, this appears in the _run.log but not in the run_err, which only contains:

[ERROR] 2025-02-06 20:06:30,626 check_job() The job id (9437551) status is -2.
[ERROR] 2025-02-06 20:06:45,646 Job t02z_19900101_fc0_1_SIM is UNKNOWN. Checking completed files to confirm the failure...[eCode=6009]

I thought that exception had to be in the err, or in both.

@dbeltrankyl
Contributor

@kinow

I think that validating that MAX_WALLCLOCK is higher than every jobs.job wallclock (which can be done in the autosubmit config parser) should be enough, no?
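
A minimal sketch of what such a validation could look like, assuming wallclocks are given as `HH:MM` strings and that the job wallclocks are available as a plain name-to-value mapping. Both function names and the mapping layout are hypothetical and are not the config parser's API.

```python
def wallclock_to_minutes(value: str) -> int:
    """Convert an 'HH:MM' wallclock string into minutes."""
    hours, minutes = value.strip().split(":")
    return int(hours) * 60 + int(minutes)


def jobs_over_max_wallclock(max_wallclock: str, job_wallclocks: dict) -> list:
    """Return the sorted names of jobs whose wallclock exceeds the maximum."""
    limit = wallclock_to_minutes(max_wallclock)
    return sorted(
        name for name, wallclock in job_wallclocks.items()
        if wallclock_to_minutes(wallclock) > limit
    )


# Hypothetical job names and values, for illustration only.
offending = jobs_over_max_wallclock("02:00", {"SIM": "03:30", "DN": "00:30"})
```

In this sketch a job whose wallclock equals the maximum passes; whether the real check should be strict is a separate decision.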

@kinow
Member Author

kinow commented Feb 7, 2025

> @kinow
>
> I think that validating that MAX_WALLCLOCK is higher than every jobs.job wallclock (which can be done in the autosubmit config parser) should be enough, no?

I think so, but my comment was more about how I missed seeing the MAX_WALLCLOCK message in the logs. This issue is about the exception that appeared in the logs about the None object without energy attribute.

But I do agree that we should be able to validate MAX_WALLCLOCK beforehand 👍

@dbeltrankyl
Contributor

dbeltrankyl commented Feb 10, 2025

I thought that the message was raised due to the max_wallclock issue, as it was the same expid. Is this not the case, then?

@kinow
Member Author

kinow commented Feb 10, 2025

I do not think so, @dbeltrankyl. I just happened to see the exception while searching for the wallclock error. But I don't see a relation between the two errors (at least not without looking at the code more closely).

@dbeltrankyl
Contributor

Ok, I'll take a look at this and other issues like the stop... and put the srun wrappers on hold for a bit (those are only pending tests).

@dbeltrankyl dbeltrankyl self-assigned this Feb 10, 2025
@dbeltrankyl dbeltrankyl added the working on Someone is working on it label Feb 10, 2025
@dbeltrankyl
Contributor

dbeltrankyl commented Feb 10, 2025

I've made a branch to normalize and validate the wallclock.

Tomorrow, I'll open the PR for that (in the config parser), as I have not tested it yet (and it is already past work hours).

After that, my idea is to look into why the energy check received a None object.

@kinow
Member Author

kinow commented Feb 17, 2025

AS config parser PR approved! I hope it helps with understanding the energy and None bug. Thanks Dani!

@dbeltrankyl
Contributor

dbeltrankyl commented Feb 20, 2025

@kinow

Just had this issue

[ERROR] 2025-02-06 20:06:30,626 check_job() The job id (9437551) status is -2.
[ERROR] 2025-02-06 20:06:45,646 Job t02z_19900101_fc0_1_SIM is UNKNOWN. Checking completed files to confirm the failure...[eCode=6009]

I'm 60% sure that it is this function:

    # TODO Duplicated for wrappers and jobs to fix in 4.1.X but in wrappers is called _is_over_wallclock for unknown reasons
    def is_over_wallclock(self):
        """
        Check if the job is over the wallclock time, it is an alternative method to avoid platform issues
        :return:
        """
        elapsed = datetime.datetime.now() - self.start_time
        if int(elapsed.total_seconds()) > self.wallclock_in_seconds:
            Log.warning(f"Job {self.name} is over wallclock time, Autosubmit will check if it is completed")
            return True
        return False

It happens when you resume a stopped experiment and the job is running. When I have time, I'll apply a fix.
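
A minimal sketch for exercising the elapsed-time check outside Autosubmit, assuming it depends only on the recorded start time and the wallclock limit. The free function and its `now` parameter are hypothetical additions for testing, and whether a start time that is wrong after a resume is what actually triggers the message is not confirmed here.

```python
import datetime


def is_over_wallclock(start_time, wallclock_in_seconds, now=None):
    """Standalone version of the elapsed-time check with an injectable clock."""
    if now is None:
        now = datetime.datetime.now()
    elapsed = now - start_time
    return int(elapsed.total_seconds()) > wallclock_in_seconds


# Hypothetical values: a job with a two-hour wallclock limit.
start = datetime.datetime(2025, 2, 6, 18, 0, 0)
within = is_over_wallclock(start, 7200, now=start + datetime.timedelta(hours=1))
beyond = is_over_wallclock(start, 7200, now=start + datetime.timedelta(hours=3))
```

With a fixed `now`, the same job can be checked against different recorded start times to see which values make it look over the limit.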

2 participants