Overhaul_of_Silo_Differencing
miller86 edited this page Mar 24, 2019 · 1 revision
Silo’s differencing feature is currently embedded in Silo’s interactive browser tool. Diffing should really be part of the Silo library itself so that both browser and silex, and indeed any Silo application, could compute differences. There are some other features we’d like to achieve with an overhaul of Silo’s differencing:
- Browser logic was designed for interactive use. It may not have the desired performance characteristics.
- Add support for diffing using parallelism/threads, and maybe GPU-enabled code.
- Output the differences as a separate Silo object/file.
  Presently, one can only see differences as interactive textual output from browser.
- Allow the user to program the particular difference algorithm.
  Over the years, some users have submitted a variety of difference algorithms. This could be made programmable so that users can pick from among those that come pre-packaged with Silo as well as define their own.
- In Silex, annotate the difference result in the GUI with some visual cue showing where differences are, so that users can descend into those parts of the file that differ.
- In Silex, provide multicolumn tables for differences when displaying object details
- Add functions in the Silo library such as
  DBDiffUcdmesh(DBucdmesh *leftOp, DBucdmesh *rightOp, DBucdmesh *result, DBoptlist *opts);
  DBDiffUcdvar(DBucdvar *leftOp, DBucdvar *rightOp, DBucdvar *result, DBoptlist *opts);
  The above would allow callers to diff individual objects easily. However, diffing whole files would still require more complex recursive code to follow the directory hierarchy. Again, all the smarts for that are currently embedded in browser.
- Interpolation is useful too, and may be very useful for certain use cases involving diffing data.
  - This was inspired by an email conversation with Matt O’Brien. Suppose a code has been using Silo for baselining results for years. As the code evolves, things occasionally change such that the baseline results occur at slightly different time steps; maybe the code’s convergence has been improved. In any case, the situation where Silo files need to be diff’d but come from slightly different time steps occurs routinely. There is a good way to support this if the code dumps results at high time resolution around the current baseline time(s). For example, suppose baseline results are known at times 0.5, 2.33 and 3.7, but the new code dumps equivalent results at 0.492, 2.315 and 3.679. If the new code could dump results around 0.5, 2.33 and 3.7 at the highest time resolution of the code, then linearly interpolating those results would yield usable data at the desired times of 0.5, 2.33 and 3.7 for a better comparison. So, the diffing algorithm in this case involves:
    - Silo baseline file(s) and their associated (target) times.
    - For each baseline, 2 or more Silo files surrounding the target time and as close together in time as the code that generates the data allows (e.g. the highest time resolution of the code).
    - A new Silo difference tool that takes the above as input, linearly interpolates the new data to produce data at the target times, and then differences that data with the baseline data.
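
The interpolate-then-difference step described above can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: the function names `interp_to_target` and `max_abs_diff` are hypothetical and not part of Silo, the data are assumed to have already been read from the Silo files into flat arrays of doubles, and the bracketing dumps are assumed to have the same size and ordering as the baseline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Silo function): linearly interpolate n values
 * to a target time. v0 holds the data dumped at time t0 and v1 the data
 * dumped at time t1, with t0 <= target <= t1 and t0 < t1. */
void interp_to_target(const double *v0, double t0,
                      const double *v1, double t1,
                      double target, double *out, size_t n)
{
    assert(t1 > t0 && target >= t0 && target <= t1);
    double w = (target - t0) / (t1 - t0);   /* weight of the later dump */
    for (size_t i = 0; i < n; i++)
        out[i] = (1.0 - w) * v0[i] + w * v1[i];
}

/* Hypothetical helper (not a Silo function): largest absolute difference
 * between the interpolated data and the baseline data. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A caller would run `interp_to_target` once per variable with the two dumps that bracket a baseline time such as 0.5, then pass the result and the baseline array to `max_abs_diff` (or to whatever difference algorithm is selected).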
- If any wrapper scripts (like the current silodiff shell script) are used, code them in platform-independent Python.
- Support a silodiff configuration file that allows for variable-by-variable difference thresholds and algorithms, as well as the ability to add comments that might, for example, be used by developers to store information about each variable’s range. Enabling Silo’s browser to quickly return per-variable statistics such as range (e.g. min/max/average) would also be helpful because it can facilitate choosing the thresholds a developer might select for a given variable (Tom Brunner).
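
As one possible shape for such a configuration file, the sketch below parses lines of the form `<variable> <abs|rel> <tolerance>  # comment` and maps each to a selectable difference algorithm. Everything here is an assumption made for illustration: the file format, the names `var_rule`, `parse_rule`, `diff_absolute` and `diff_relative`, and the two algorithms themselves are hypothetical and do not describe an existing silodiff or Silo feature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A difference algorithm: returns nonzero when a and b differ by more than tol. */
typedef int (*diff_fn)(double a, double b, double tol);

/* Absolute difference test: |a - b| > tol. */
int diff_absolute(double a, double b, double tol)
{
    double d = a - b;
    if (d < 0.0)
        d = -d;
    return d > tol;
}

/* Relative difference test: |a - b| divided by the mean magnitude > tol. */
int diff_relative(double a, double b, double tol)
{
    double d = a - b;
    double scale = (a < 0.0 ? -a : a) + (b < 0.0 ? -b : b);
    if (d < 0.0)
        d = -d;
    if (scale == 0.0)
        return 0;                    /* both values are exactly zero */
    return d / (scale / 2.0) > tol;
}

/* One per-variable rule from the (hypothetical) configuration file. */
typedef struct {
    char name[64];
    diff_fn algorithm;
    double tolerance;
} var_rule;

/* Parse one configuration line. Text after '#' is treated as a comment.
 * Returns 1 if a rule was read, 0 for a blank or comment-only line,
 * and -1 if the line is malformed. */
int parse_rule(const char *line, var_rule *rule)
{
    char buf[256], alg[16];
    assert(line != NULL && rule != NULL);

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    char *hash = strchr(buf, '#');
    if (hash)
        *hash = '\0';                /* strip the trailing comment */

    int n = sscanf(buf, "%63s %15s %lf", rule->name, alg, &rule->tolerance);
    if (n <= 0)
        return 0;                    /* nothing but whitespace or a comment */
    if (n != 3)
        return -1;

    if (strcmp(alg, "abs") == 0)
        rule->algorithm = diff_absolute;
    else if (strcmp(alg, "rel") == 0)
        rule->algorithm = diff_relative;
    else
        return -1;                   /* unknown algorithm name */
    return 1;
}
```

A line such as `pressure rel 1e-6  # range roughly 0..1e5` would then yield a rule whose `algorithm` pointer can be called on each pair of values for that variable. Storing the algorithm as a function pointer leaves room for user-defined algorithms to be registered alongside the pre-packaged ones.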