
Detector cold start #332

Closed
1 of 2 tasks
mexanick opened this issue Apr 2, 2020 · 4 comments

@mexanick
Contributor

mexanick commented Apr 2, 2020

Brief summary of issue

At the moment there is no mechanism to cold-start the front-end after a power cycle. If the requested OH is not programmed, the system goes into an error state, with no recovery action possible inside cmsgemos. We need to provide a cold-start mechanism during either the "Initialize" or the "Configure" state transition.

Types of issue

  • Bug report (report an issue with the code)
  • Feature request (request for change which adds functionality)

Expected Behavior

A freshly power-cycled front-end should be initialized and configured correctly.

Current Behavior

The system goes into an error state during the "Initialize" FSM transition.

Steps to Reproduce (for bugs)

Power-cycle the OH, start the cmsgemos gemsupervisor application, and press the "Initialize" button.

Context

Recovering the front-end to an operational state requires manual intervention outside of cmsgemos. This significantly complicates debugging with the templated RPC modules, as no tool for automatic front-end recovery is available at the moment.

Your Environment

  • Version used: valid for all stable and dev releases
  • Shell used:
@lpetre-ulb
Contributor

While this is a really needed feature, what would be a realistic timeline to implement it correctly? To do everything right, it may require the configuration tree, the DB, the config blaster, ... Of course some of those prerequisites can be worked around, but it may be more effective to tackle some of them first.

First, when do we need to access the front-end? IMHO, we should try not to access it at all during the initialization stage. At the configuration stage, the configuration is retrieved from the DB and pushed to the front-end, establishing communication.

The following steps, at configure time, should get us started (a rough sketch follows the list):

  1. Get the hardware layout tree. It is more or less implemented but needs refactoring.
  2. Configure the GBTx:
  • Get the configuration from file. It should be possible to use the same configuration files for all chambers of the same type, if not all chambers.
  • Or, get the GBTx configuration from a local DB.
  3. Configure the OptoHybrid:
  • Use only RPC methods: an RPC method exists for the GBTx configuration, one method needs to be written to reset the SCA, and the RPC method to program the OptoHybrid was never accepted.
  • Or, use the configuration blaster (it can only be used for the GBTx configuration).
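
For concreteness, here is a dry-run sketch of steps 2 and 3 for a single OptoHybrid, assuming hypothetical wrapper functions. Every function below is a stand-in that only prints what the real RPC call would do; none of the names are the actual cmsgemos/xhal API.

```cpp
// Dry-run sketch of steps 2 and 3 above for a single OptoHybrid.
// All functions are hypothetical placeholders, NOT the real cmsgemos/xhal API.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical: load a GBTx register dump from a per-chamber-type file.
std::vector<std::uint8_t> loadGBTConfigFromFile(const std::string &path)
{
    std::cout << "  would read GBTx config from " << path << "\n";
    return std::vector<std::uint8_t>(366, 0);     // a GBTx dump holds 366 register values
}

// Hypothetical RPC wrappers, one per step of the list above.
void writeGBTConfig(unsigned ohN, unsigned gbtN, const std::vector<std::uint8_t> &)
{
    std::cout << "  would write GBTx config to OH" << ohN << " GBT" << gbtN << "\n";
}
void resetSCA(unsigned ohN)          { std::cout << "  would reset the SCA on OH" << ohN << "\n"; }
void programOptoHybrid(unsigned ohN) { std::cout << "  would program the OH" << ohN << " FPGA\n"; }

void coldStartOH(unsigned ohN, const std::string &gbtCfgDir)
{
    for (unsigned gbtN = 0; gbtN < 3; ++gbtN) {   // 3 GBTx per GE1/1 OptoHybrid
        const auto cfg = loadGBTConfigFromFile(
            gbtCfgDir + "/GBTX_OH" + std::to_string(ohN) + "_GBT" + std::to_string(gbtN) + ".cfg");
        writeGBTConfig(ohN, gbtN, cfg);
    }
    resetSCA(ohN);                                // step 3: reset the SCA...
    programOptoHybrid(ohN);                       // ...then program the FPGA
}

int main()
{
    coldStartOH(2, "/path/to/gbt/configs");       // dry run for OH2
    return 0;
}
```

The real implementation would replace the stubs with the existing GBTx-configuration RPC method, the SCA-reset method that still needs to be written, and the OptoHybrid-programming method mentioned above.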

What do you think? Of course, this means we cannot get the feature right now, but reasonably soon.

@mexanick
Contributor Author

mexanick commented Apr 6, 2020

I agree with the way you're thinking. First of all we need a simplified cold-start model, so there's no immediate need for the configuration tree, the DB, or the blaster (well, the blaster will be used anyway).
Indeed, we need to configure the GBTs and the FPGA. For the GBTs, we always use the same config, so I don't see a problem in continuing to use the file-based configs until we get the DB-based configs working. For the FPGA it is even simpler: we do not really need to configure anything; at this level we only need to program it, i.e. send a hard reset.
I think the above is doable within a day. As for accessing the front-end at init: yes, this will be stripped out and moved to the configure step. But the related point, that we don't have any good mechanism for front-end recovery with the templated RPC modules, still holds, so that's the reason I assigned "High" priority.
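
A rough sketch of how that simplified model could hook into the "Configure" transition; isOHProgrammed, coldStartOH and configureOH are hypothetical placeholders, not the real gemsupervisor code:

```cpp
// Sketch: if an OH is not programmed at configure time, recover it instead of
// going to Error. All helpers are hypothetical stand-ins.
#include <iostream>

// Hypothetical status check, e.g. reading an FPGA "done"/link-up register.
bool isOHProgrammed(unsigned ohN) { return ohN % 2 == 0; /* fake answer for the demo */ }

// Hypothetical recovery and configuration steps (coldStartOH as sketched earlier).
void coldStartOH(unsigned ohN)  { std::cout << "  cold start OH" << ohN << " (GBT configs + hard reset)\n"; }
void configureOH(unsigned ohN)  { std::cout << "  configure OH" << ohN << "\n"; }

void onConfigure(unsigned ohMask)
{
    for (unsigned ohN = 0; ohN < 12; ++ohN) {     // up to 12 OHs per CTP7 in GE1/1
        if (!(ohMask & (1u << ohN)))
            continue;                             // OH not enabled in this run
        if (!isOHProgrammed(ohN))                 // freshly power-cycled front-end
            coldStartOH(ohN);                     // recover instead of going to Error
        configureOH(ohN);
    }
}

int main()
{
    onConfigure(0x7);                             // OH0..OH2 enabled in this example
    return 0;
}
```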

P.S. Finally, another thing to think about is... recovery of the CTP7 in case something happened...

@lpetre-ulb
Contributor

Indeed, we need to configure the GBTs and the FPGA. For the GBTs, we always use the same config, so I don't see a problem in continuing to use the file-based configs until we get the DB-based configs working.

Yes, we always use the same GBT configurations... which don't have the right RX phases. That forces us to perform the GBT phase scan. One may want to produce new GBT configurations based on the known LUT.
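
As an illustration of that last point, a small sketch that patches a base GBTx register dump with per-e-link RX phases taken from a LUT. The phaseRegisterFor() mapping and the file names are placeholders; the real phase-select register addresses have to come from the GBTx manual:

```cpp
// Sketch: produce a GBTx configuration file whose RX e-link phases come from
// a known LUT, so no phase scan is needed. Register mapping is hypothetical.
#include <cstddef>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical e-link -> phase-select register mapping; NOT the real addresses.
std::size_t phaseRegisterFor(unsigned elink) { return 100 + elink; }

std::vector<unsigned> readBaseConfig(const std::string &path)
{
    std::vector<unsigned> regs;
    std::ifstream in(path);
    for (unsigned value; in >> value; )
        regs.push_back(value);                    // one register value per line
    return regs;
}

void writeConfig(const std::string &path, const std::vector<unsigned> &regs)
{
    std::ofstream out(path);
    for (const auto value : regs)
        out << value << "\n";
}

int main()
{
    // Example LUT: e-link -> known-good RX phase, e.g. from an earlier phase scan.
    const std::map<unsigned, unsigned> phaseLUT = {{0, 7}, {1, 8}, {2, 7}};

    auto regs = readBaseConfig("GBTX_base.cfg");  // placeholder input file name
    if (regs.size() != 366) {                     // a GBTx dump holds 366 registers
        std::cerr << "unexpected GBTx dump size: " << regs.size() << "\n";
        return 1;
    }
    for (const auto &entry : phaseLUT)
        regs.at(phaseRegisterFor(entry.first)) = entry.second;  // patch only the phases

    writeConfig("GBTX_with_phases.cfg", regs);    // placeholder output file name
    return 0;
}
```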

But the related point, that we don't have any good mechanism for front-end recovery with the templated RPC modules, still holds, so that's the reason I assigned "High" priority.

I see. It is not very difficult to swap the RPC modules, but it is far from practical. Still, I would try to get the hardware layout tree (currently gem::onlinedb::SystemTopology) reworked first, since it is going to be needed by all applications very soon. That may be a starting point for simplified XML configuration files, if the hardware layout were provided globally on the DAQ machine.
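
Just to fix ideas, a minimal sketch of what such a globally provided hardware layout could look like; plain structs for illustration only, not the gem::onlinedb::SystemTopology API:

```cpp
// Illustrative hardware layout tree: crate -> AMC -> OptoHybrid -> VFATs.
// This is NOT the gem::onlinedb::SystemTopology API.
#include <iostream>
#include <string>
#include <vector>

struct OptoHybridNode { unsigned number; std::vector<unsigned> vfats; };
struct AMCNode        { unsigned slot; std::vector<OptoHybridNode> optohybrids; };
struct CrateNode      { std::string name; std::vector<AMCNode> amcs; };

int main()
{
    // Hypothetical example: one crate, one AMC in slot 2 serving two OptoHybrids.
    const CrateNode crate{
        "gem-shelf01",
        {AMCNode{2, {OptoHybridNode{0, {0, 1, 2, 3}},
                     OptoHybridNode{1, {0, 1, 2, 3}}}}},
    };

    for (const auto &amc : crate.amcs)
        for (const auto &oh : amc.optohybrids)
            std::cout << crate.name << " AMC" << amc.slot << " OH" << oh.number
                      << ": " << oh.vfats.size() << " VFATs\n";
    return 0;
}
```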

Anyway, if you want to make it work with the current xDAQ-based XML configuration files and you have time, feel free to work on the issue. ;)

P.S. Finally, another thing to think about is... recovery of the CTP7 in case something happened...

That's a good point, which we have only discussed very little until now. It could be done at the initialization step while pushing the artifacts. The problem is: in which application? If multiple xDAQ applications are connected to the same CTP7, which one would be responsible for that operation? How do we avoid race conditions/conflicts? A supervisor/bookkeeper could maybe do the job.
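
One simple way to serialize such a recovery operation, sketched with a POSIX advisory file lock on the DAQ machine; this only illustrates the supervisor/bookkeeper idea and is not a proposal for the actual implementation:

```cpp
// Sketch: avoid two xDAQ applications recovering the same CTP7 at once by
// serializing the operation behind an advisory file lock. Purely illustrative.
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <iostream>

bool withCTP7RecoveryLock(const char *lockPath)
{
    const int fd = ::open(lockPath, O_CREAT | O_RDWR, 0666);
    if (fd < 0)
        return false;

    if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {    // someone else is already recovering
        std::cout << "recovery already in progress, skipping\n";
        ::close(fd);
        return false;
    }

    std::cout << "performing CTP7 recovery (placeholder)\n";
    // ... push the artifacts, reload firmware, etc. ...

    ::flock(fd, LOCK_UN);
    ::close(fd);
    return true;
}

int main()
{
    withCTP7RecoveryLock("/tmp/ctp7-recovery.lock");  // hypothetical lock file path
    return 0;
}
```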

@lpetre-ulb
Contributor

Issue moved to the new GitLab project.
