-
As mentioned at the LiteBIRD Simulation Telecon held on December 12th, 2023, if we implement flags as bitmasks, it would be handy to have a module that flags samples whenever a bright source crosses the main beam of a detector.
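A minimal sketch of what such a module could do, assuming a hypothetical bit assignment (`FLAG_BRIGHT_SOURCE`) and pointing information expressed as colatitude/longitude in radians; none of these names come from the framework:

```python
import numpy as np

# Hypothetical bit marking "bright source within the main beam"
FLAG_BRIGHT_SOURCE = 0b0000_0100

def flag_source_crossings(theta_pnt, phi_pnt, theta_src, phi_src,
                          flags, radius_rad):
    """Set FLAG_BRIGHT_SOURCE on every sample whose beam axis lies
    closer than `radius_rad` to the source (angles in radians,
    theta = colatitude)."""
    # Angular separation via the spherical law of cosines
    cos_sep = (np.cos(theta_pnt) * np.cos(theta_src)
               + np.sin(theta_pnt) * np.sin(theta_src)
               * np.cos(phi_pnt - phi_src))
    sep = np.arccos(np.clip(cos_sep, -1.0, 1.0))
    flags[sep < radius_rad] |= FLAG_BRIGHT_SOURCE
    return flags
```

The flag array is updated in place with a bitwise OR, so any bits already set by other modules are preserved.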
-
I am opening this issue to discuss how to implement proper flagging in the timelines.
The main purpose of flagging is to tell if a sample in a timeline is suitable for being used as input in an analysis. There are several reasons why a sample should be avoided:
The instrument was not working properly. For instance: the detector was saturating, the star tracker was not properly reporting the orientation of the telescope, a cosmic ray hit the detector while it was acquiring the sample, temperature fluctuations on the focal plane were too large, there were weird pop-corn noise jumps, etc.
The kind of data is not suitable for the analysis. For instance, the code used to calibrate the beam using bright celestial sources should only use those samples in the timeline acquired when the source was closer than some angle θ to the main beam axis.
How to encode flags
The easiest way to use flags is to employ NumPy masked arrays, which are handy and straightforward to use. However, they are quite limited, as they can only encode Boolean flags (yes/no). This is quite a common case in simulations, but it's not the way real data are flagged.
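For reference, this is how the yes/no approach looks with NumPy masked arrays; note that a single Boolean mask cannot record *why* a sample was discarded:

```python
import numpy as np
import numpy.ma as ma

tod = np.array([1.0, 2.0, 3.0, 4.0])
# True means "this sample is bad and must be skipped"
bad = np.array([False, True, False, False])
masked_tod = ma.masked_array(tod, mask=bad)

# Reductions automatically skip the masked samples:
mean = masked_tod.mean()  # averages only samples 0, 2, and 3
```

This is convenient for simulations, but it loses the distinction between, say, a cosmic-ray hit and a pointing glitch.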
In Planck, flags were encoded using bit masks, i.e., through unsigned integers. For each scientific timeline, there is another timeline of integers with some specified width (16 bits? 32 bits?), where each flag is in a one-to-one relationship with a scientific sample. (Thus, the array of flags has the same number of elements as the TOD: they can be quite large!) Each number codifies the occurrence of any of the conditions listed above, built as the `|` operation (bitwise OR) applied to one or more of the following values:

00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
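In Python, one way to give names to such bit values is `enum.IntFlag`; the specific meanings below are purely illustrative, since (as discussed later) the actual assignment could be left to the user or to the IMo:

```python
from enum import IntFlag

class SampleFlag(IntFlag):
    # Hypothetical assignment of the bits listed above
    DETECTOR_SATURATED = 0b0000_0001
    BAD_POINTING       = 0b0000_0010
    COSMIC_RAY         = 0b0000_0100
    THERMAL_SPIKE      = 0b0000_1000
    POPCORN_NOISE      = 0b0001_0000
    # remaining bits left free for other conditions

# A sample hit by a cosmic ray while the pointing was unreliable:
flag = SampleFlag.COSMIC_RAY | SampleFlag.BAD_POINTING
```

`IntFlag` members behave like plain integers, so they can be stored directly in a NumPy array of unsigned integers.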
Planck implemented the concept of “local flags” and “global flags”, in an attempt to save memory. The point is that some flags are specific for one detector (e.g., cosmic rays, pop-corn noise…), but other ones are common to all the detectors (unreliable pointings, etc.) and thus it would be a waste of memory to duplicate their value across all the detectors. Therefore, two flag arrays were kept:
global flags, stored once per Observation, e.g., obs.global_flags (a 1D array); and detector-specific flags, e.g., obs.det_flags, which should be a 2D matrix whose shape matches obs.tod.

How to handle integer flags
If we decide to use integer flags, we can use bitmask operations to quickly decide whether a sample should be used or not. (NumPy provides vectorized bitwise operations, so it is fast to apply them to all the flags.) For instance, to signal that a cosmic ray hit the sample with index i while the detector signal was drifting because of its time constant, one would set both corresponding bits with a bitwise OR. The advantage of using bitmasks is that they are memory-efficient (only one bit per flag) and quick to check.
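A sketch of both operations, assuming hypothetical bit assignments and a per-detector flag array `det_flags`:

```python
import numpy as np

# Hypothetical bit assignments
COSMIC_RAY    = 0b0000_0001
TIME_CONSTANT = 0b0000_0010

det_flags = np.zeros(1000, dtype=np.uint8)
i = 123

# Cosmic ray hit sample i while the signal was drifting
# because of the detector's time constant:
det_flags[i] |= COSMIC_RAY | TIME_CONSTANT

# Vectorized check over the whole timeline: a sample is "good"
# if none of the two bits is set in its flag
good = (det_flags & (COSMIC_RAY | TIME_CONSTANT)) == 0
```

Both the update and the check are single vectorized NumPy operations, so they scale well even for long timelines.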
Questions
There are a number of questions that we should address before implementing this functionality in the framework:
Should we use masked arrays, or should we go with integer flags like in Planck? My feeling is that masked arrays are much handier and will cover 99% of the needs of our simulation framework. On the other hand, writing code that handles integer flags is more similar to what we will do in data reduction pipelines, and it might be needed for some kinds of simulations.
If we go with integer flags, should we decide which flags to use, or should we just ask the caller to provide their own bitmasks? I don't think that we can reasonably figure out all the possible flags that could be useful for simulations, so my impression is that it's better to let users of our framework decide the meaning of the flags. (Ideally, one might want the flags to be part of the IMo…)
We might provide the required flexibility by adding a new parameter bitmask to the analysis modules. We should clearly specify the meaning of this bitmask in the documentation, as such a call can be ambiguous to read: it is not obvious whether the bits select the samples to keep or the samples to discard.
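A toy stand-in for an analysis module illustrates the point; the function name and the "discard if any bit matches" convention are assumptions for the sake of the example:

```python
import numpy as np

def make_binned_map(tod, flags, bitmask=0b0000_0011):
    """Toy analysis module.

    Convention (which must be spelled out in the docs): a sample is
    *discarded* whenever any of the bits in `bitmask` is set in its
    flag. Without documentation, the call below could equally be read
    as "keep only samples carrying these bits".
    """
    good = (flags & bitmask) == 0
    return tod[good].mean()

tod = np.array([1.0, 2.0, 100.0, 4.0])
flags = np.array([0, 0, 0b0000_0001, 0], dtype=np.uint8)

# Ambiguous at the call site: accepted bits, or rejected bits?
result = make_binned_map(tod, flags)
```

Here the flagged outlier (sample 2) is excluded from the mean, but a reader of the call site alone cannot tell which convention applies.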
In my opinion, the last solution is the best one. In the case of Planck, it happened that new flags had to be defined while data were being acquired during the mission. Telling the code which bits are acceptable, while preventing it from accepting other flags, is more future-proof.
Again, if we go with integer flags but do not fix the meaning of each bit (leaving that burden to the user), should we fix one size for the flags, or let the user pick the size depending on the simulation? (The latter solution matches the ability to specify whether float32 or float64 is to be used when a Simulation object is instantiated.) I would go for the latter: flags can take up a lot of space, and it would be pointless to use a 64-bit mask if the user is only going to use a few bits. (I reckon that most simulations will use just one or two bits…)
Should we allocate all the flags once create_observations is called, so that they are always available, or should we allocate them only if the user passes a flag like allocate_flags=True? There are pros and cons to the two solutions: always allocating them is simpler, while allocating them on demand saves memory at the price of `if`s scattered here and there.
Should we exploit global/local flags, or just provide one solution for simplicity? (This is probably relevant only if we go with integer flags.)
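The user-chosen flag width could look like the following sketch; `allocate_flags` and its `flags_dtype` parameter are hypothetical names, mirroring the float32/float64 choice made when a Simulation object is created:

```python
import numpy as np

def allocate_flags(num_of_samples, flags_dtype=np.uint8):
    """Hypothetical helper: allocate a flag timeline with a
    user-chosen integer width. A uint8 mask (8 flags per sample)
    uses 8x less memory than a uint64 one (64 flags per sample)."""
    return np.zeros(num_of_samples, dtype=flags_dtype)

flags8 = allocate_flags(1_000_000)              # ~1 MB
flags64 = allocate_flags(1_000_000, np.uint64)  # ~8 MB
```

Since most simulations would likely use only one or two bits, defaulting to the narrowest type keeps the memory overhead small.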
Which modules should be modified to take flags into account? The most obvious place is the destriper, but it's not trivial to determine how to handle gaps. What are other places?
Comments are welcome!