
Allow non-active scalars to be modified in python operators #2061

Open
wants to merge 14 commits into master from the dataset-set-scalars branch

Conversation

@psavery (Contributor) commented Mar 5, 2020

This primarily adds a new function, dataset.set_scalars(), which allows the user to modify non-active scalars in the python operators. It can also be used to add new scalars.

If other scalars are already present in the dataset when dataset.set_scalars() is called, and their dimensions do not match those of the new scalars, the other scalars are discarded, because all scalars must have the same dimensions.
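As a rough sketch of how an operator might use the new API (the entry-point name and the scalars() accessor below are illustrative assumptions; only set_scalars() is the function added by this PR, and the attached examples further down show real usage):

```python
import numpy as np

def transform(dataset):
    # 'Raw' is a hypothetical scalar name; scalars() is an assumed accessor
    # that returns the named array as a NumPy array.
    raw = dataset.scalars('Raw')

    # Modify an existing (possibly non-active) scalar array in place.
    dataset.set_scalars('Raw', np.sqrt(raw))

    # Add a brand new scalar array. Its dimensions must match the existing
    # scalars, or the mismatched scalars will be discarded.
    dataset.set_scalars('Mask', (raw > 0).astype(np.uint8))
```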

Some other changes in this PR:

  • Remove the deep copy of the parent in the external dataset.create_child_dataset(), since it isn't consistent with the internal dataset.
  • Add a new option to dataset.create_child_dataset() to specify whether the child should be a volume (default is True) or not. Child datasets that aren't volumes will retain their tilt angles (see the sketch after this list).
  • Enforce a spacing of 1.0 for the tilt axis in a few places, so that the data does not change spacing in certain operation paths.
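A minimal sketch of the new create_child_dataset() option (the keyword name volume and the result-dict key are assumptions for illustration; only the default-is-volume behavior comes from the list above):

```python
def transform(dataset):
    # Ask for a child that is not a volume, so it stays a tilt series and
    # retains the parent's tilt angles ('volume' keyword name is assumed).
    child = dataset.create_child_dataset(volume=False)

    # ... fill in the child's data here, e.g. a binned tilt series ...

    return {'binned_tilt_series': child}  # result key name is illustrative
```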

Some examples of operations involving multi-scalar data are attached here:
multiscalars_operators.tar.gz
They include:

  • Make 3 channels from one channel
  • Invert all scalars
  • Bin 2x Tilt Images on multi-scalar data
  • SIRT Recon on multi-scalar data

Some caveats:

  1. For the internal pipeline, a description.json file must be present that specifies what the result will be named in the results dict; otherwise the output will not be read. This is not a requirement for the external pipeline.
  2. Live reconstruction (setting self.progress.data in the python operators) previously did not work for multiple scalars and caused a crash. The crash is now fixed, but it appears that only the active scalars get updated in the render window during live updates; that still needs to be fixed (see the sketch below).
  3. In the internal pipeline only, if a live update of the data is performed in an operator without a description.json file, tomviz crashes.

Update: Caveat 1 is part of the master branch (it is not new with this PR), and it is now documented in #2075.

Update: Caveat 3 is also a part of the master branch (it is not new with this PR), but it is not documented anywhere. It is related, however, to an issue mentioned in #2075, where a child data source does not appear to get created properly if there is no description.json file for the internal pipeline.
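For reference on caveat 2, the live-update path in question is the class-based operator pattern of assigning to self.progress.data. A minimal sketch (the class, base class, and method names follow the usual tomviz operator layout and are only illustrative; self.progress.data is the attribute discussed above):

```python
import tomviz.operators

class LiveMultiScalarRecon(tomviz.operators.CancelableOperator):
    def transform_scalars(self, dataset):
        for iteration in range(10):
            # ... update the reconstruction for this iteration ...

            # Push intermediate data for a live render-window update.
            # Caveat 2: with multiple scalars, only the active scalars
            # currently refresh in the render window.
            self.progress.data = dataset
```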

@psavery (Contributor, Author) commented Mar 7, 2020

I should also mention that when I run the multi-scalar operations from above in an external pipeline (Multiscalar Bin2x Tilt Series -> Multiscalar Recon), it does not appear to produce reconstruction output; instead, the output is identical to the input. It seems to be caused by a bug that only shows up under specific circumstances. I'll look into it.

Update: this problem is part of the master branch, and it is not new here. It is now documented as part of #2075.

@psavery psavery force-pushed the dataset-set-scalars branch 2 times, most recently from 6018bca to fe7fbfd Compare March 10, 2020 13:55
@psavery psavery force-pushed the dataset-set-scalars branch 2 times, most recently from ce045ee to 514cc36 Compare April 7, 2020 09:28
@psavery (Contributor, Author) commented Apr 8, 2020

This PR is now functional. It includes some larger changes to the pipeline in 514cc36, which were needed primarily to make the behavior of child-producing operators consistent (#2075); this matters because multi-scalar operators rely on child-producing operators. These changes are also a step toward saving intermediate data sources.

The pipeline changes leave the snapshot operator non-functional for now, but the addition of intermediate data sources should bring it back (since the snapshot operator essentially creates an intermediate data source).

@psavery psavery marked this pull request as ready for review April 8, 2020 14:52
@psavery psavery force-pushed the dataset-set-scalars branch 2 times, most recently from 643374c to ea9ce44 Compare April 9, 2020 19:08
psavery added 13 commits April 9, 2020 16:38
Previously, the child dataset for the external pipeline would be created
as a deep copy of the parent dataset. This was not done in the internal
dataset: there, the child dataset would be empty, with only spacing and
dimensions matching those of the parent (except when the parent was a
tilt series, in which case the spacing of the child's tilt axis would be
altered).

Additionally, in the internal dataset, if the active_scalars were set
on the empty child dataset, a new scalar array named 'Scalars' would be
created.

The changes in this commit make the external dataset match the behavior
of the internal dataset: the child dataset is an empty dataset with the
same spacing as the parent (except for the tilt series parent case), and
if the active scalars get set on an empty child dataset, a new scalar
array named 'Scalars' is created.

Signed-off-by: Patrick Avery <[email protected]>
Previously, child data sources from python operators could only be
volumes. However, new changes coming soon will allow child data sources
to be tilt series as well.

Add in a check so that a child data source is appropriately marked as
a tilt series if it has tilt angles.

Signed-off-by: Patrick Avery <[email protected]>
In the python operator "Dataset" classes, allow the non-active
scalars to be modified via "set_scalars()". If there are scalars
already present in the dataset, and the newly set scalars have different
dimensions compared to the other scalars, the other scalars are
discarded.

Additionally, this allows a child dataset to be either a volume
or another tilt series, so that the user can indicate whether the tilt
angles and spacing should be preserved.

Signed-off-by: Patrick Avery <[email protected]>
In the EMD format, the spacing of the tilt axis was previously
calculated from the distance between the first and second angles.
Other formats, such as the GenericHDF5 format, do not give the tilt
axis a spacing.

For consistency, and so that the data does not change spacing when
a GenericHDF5 file is loaded and used in an external pipeline, use
a spacing of 1.0 both in the EMD format, and in the pipeline
executor.

Signed-off-by: Patrick Avery <[email protected]>
For the internal pipeline, try to set the child data sources'
active scalars to match that of the parent if possible. This
prevents the active scalars changing due to operations that
produce child data sources.

Signed-off-by: Patrick Avery <[email protected]>
When live updates are performed on data that contains multiple
scalars, the number of scalars can change in between updates.

Add in a safety check in the DataPropertiesPanel to account for
this. This prevents a crash.

Signed-off-by: Patrick Avery <[email protected]>
This is a bit more general-purpose than the previous
"Pipeline::moveModulesDown()" was.

Signed-off-by: Patrick Avery <[email protected]>
Signed-off-by: Patrick Avery <[email protected]>
Previously, if other operators were performed after a child-producing
operator (such as reconstruction), the behavior would be somewhat
unpredictable, depending on several factors, including whether a
description.json file was present, and whether the pipeline was
internal or external.

This fixes the behavior so that operators can be performed after
child-producing operators. This was done by making sure that the output
from child-producing operators gets passed down as input to the next
operator. This needed to be done in both the external and internal
pipelines. It also required that modules be passed down to the output
of the next operator.

This commit additionally allows child-producing operators to work in the
internal pipeline without the description.json file present, which was
not possible before (it required that the child be named explicitly).

The external pipeline should not signal that the pipeline is finished
when a child data source is encountered, so that behavior was removed.

The snapshot operator does not have any useful behavior currently, but
it should work well with intermediate data sources, since it will
essentially create an intermediate data source.

Signed-off-by: Patrick Avery <[email protected]>
ssize_t is producing an error on Windows.

Signed-off-by: Patrick Avery <[email protected]>
This ensures the modules get cleaned up.

Signed-off-by: Patrick Avery <[email protected]>
@psavery (Contributor, Author) commented Apr 9, 2020

To provide an update on this branch: most things seem to be working very well. You can perform multi-scalar operations (which produce children) as well as non-child-producing operators, and both can be used as part of the pipeline, internal or external. You can also add and remove operators; the pipeline appears to be flexible.

There may be other bugs we haven't found yet, but the one big bug holding up this PR is that if you have operators in your pipeline and you close the application, it crashes. Unfortunately, the backtrace from gdb varies from run to run, and it is not clear what the problem is. Most likely it is some kind of memory issue (deleting an object twice, accessing an array out of bounds, etc.). Once we figure out what the issue is, this will hopefully be ready to merge.

There is a new data source created every time now, so this is no longer
necessary. Additionally, it was causing crashes when closing the
application while operators were present, and occasionally when deleting
data sources that have operators.

Signed-off-by: Patrick Avery <[email protected]>
@psavery psavery force-pushed the dataset-set-scalars branch from ea9ce44 to b4616b7 Compare April 9, 2020 21:24
@psavery (Contributor, Author) commented Apr 9, 2020

The bug seems to be fixed. This PR now seems fairly robust - I have a hard time making it crash. I did encounter one crash, though, that is not fixed:

If you have two operators, and a volume module on the output, and you delete the first operator, it crashes. This doesn't occur for me with the slice/contour modules, nor with only one operator present, nor when deleting the second operator. It seems to be a very specific crash.

[screenshot: crash]

Backtrace shows it has something to do with the histogram:

Thread 1 "tomviz" received signal SIGSEGV, Segmentation fault.
0x00005555559db872 in QScopedPointer<tomviz::DataSource::DSInternals, QScopedPointerDeleter<tomviz::DataSource::DSInternals> >::operator-> (this=0x10) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qscopedpointer.h:118
118	        return d;
(gdb) where
#0  0x00005555559db872 in QScopedPointer<tomviz::DataSource::DSInternals, QScopedPointerDeleter<tomviz::DataSource::DSInternals> >::operator->() const (this=0x10) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qscopedpointer.h:118
#1  0x00005555559d6cfa in tomviz::DataSource::proxy() const (this=0x0) at ../tomviz/DataSource.cxx:625
#2  0x00005555559d9694 in tomviz::DataSource::algorithm() const (this=0x0) at ../tomviz/DataSource.cxx:1250
#3  0x00005555559d96be in tomviz::DataSource::dataObject() const (this=0x0) at ../tomviz/DataSource.cxx:1255
#4  0x0000555555b67148 in tomviz::ModuleVolume::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)>::operator()(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>) const (__closure=0x55555c2085b0, image=..., histogram2D=...)
    at ../tomviz/modules/ModuleVolume.cxx:43
#5  0x0000555555b69fc3 in QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1>, QtPrivate::List<vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData> >, void, tomviz::ModuleVolume::ModuleVolume(QObject*)::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)> >::call(tomviz::ModuleVolume::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)> &, void **) (f=..., arg=0x7fffffffd4a0) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:130
#6  0x0000555555b69f2b in QtPrivate::Functor<tomviz::ModuleVolume::ModuleVolume(QObject*)::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)>, 2>::call<QtPrivate::List<vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData> >, void>(tomviz::ModuleVolume::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)> &, void *, void **) (f=..., arg=0x7fffffffd4a0) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:240
#7  0x0000555555b69e73 in QtPrivate::QFunctorSlotObject<tomviz::ModuleVolume::ModuleVolume(QObject*)::<lambda(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>)>, 2, QtPrivate::List<vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData> >, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=1, this_=0x55555c2085a0, r=0x55555c0af410, a=0x7fffffffd4a0, ret=0x0) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobject_impl.h:168
#8  0x00007fffed89366f in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x00005555559767ba in tomviz::HistogramManager::histogram2DReady(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>) (this=0x5555560c69c0 <tomviz::HistogramManager::instance()::theInstance>, _t1=..., _t2=...)
    at tomviz/tomvizlib_autogen/EWIEGA46WW/moc_HistogramManager.cpp:164
#10 0x0000555555a155e1 in tomviz::HistogramManager::histogram2DReadyInternal(vtkSmartPointer<vtkImageData>, vtkSmartPointer<vtkImageData>) (this=0x5555560c69c0 <tomviz::HistogramManager::instance()::theInstance>, image=..., histogram=...)
    at ../tomviz/HistogramManager.cxx:316
#11 0x00005555559763ea in tomviz::HistogramManager::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0x5555560c69c0 <tomviz::HistogramManager::instance()::theInstance>, _c=QMetaObject::InvokeMetaMethod, _id=3, _a=0x7fffac106de0)
    at tomviz/tomvizlib_autogen/EWIEGA46WW/moc_HistogramManager.cpp:95
#12 0x00007fffed8940c2 in QObject::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#13 0x00007ffff5d7983c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#14 0x00007ffff5d81104 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#15 0x00007fffed8648d8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007fffed86704d in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
    at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007fffed8be263 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007fffde996417 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007fffde996650 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x00007fffde9966dc in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007fffed8bd88f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
    at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007fffed86290a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007fffed86b9b4 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#24 0x0000555555951deb in main(int, char**) (argc=1, argv=0x7fffffffde18) at ../tomviz/main.cxx:60

@psavery (Contributor, Author) commented Apr 10, 2020

Here are a few things to think about for the design of this branch when we return to it:

  1. In both master and this branch, python operators create a new child data source when:
    a) There is a progress update with data
    b) There is a result dict returned from the operator

If a child data source was not created, it is assumed that the operator's input vtkDataObject was modified in place. That input vtkDataObject is what gets passed along to the next operator.

In the master branch, if a child data source was created, the input vtkDataObject is simply ignored and the child data source is used as the output. As a result, future operators do not see the changes made by child-producing operators. In this branch, however, the data inside the child data source is set on the input vtkDataObject, and since that gets passed along to the next operator, the next operator can see the changes. In this way, I think this branch moves us closer to a proper pipeline approach, where the modified input data object is the output (see the sketch at the end of this comment).

  2. In both master and this branch, there is always a child data source at the end of a chain of operators that is considered the final output. In master, this data source gets re-used and moved up/down when operators are removed/added. In this branch, a new data source is created each time operators are added or removed. This can change; I think we could move this branch back to re-using data sources if that is desired.

When we introduce intermediate/persistent data sources into the pipeline chain, however, the intermediate data sources will not be moved up/down when operators are removed/added; a new one will need to be created each time. We will also need to consider moving modules to/from the intermediate data sources when they are created/destroyed.
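To summarize the two paths in point 1, here is a sketch (active_scalars is the attribute referenced in the commit messages above; everything else is illustrative):

```python
import numpy as np

# In-place path: no child data source is created, so the operator's input
# vtkDataObject is assumed to have been modified and is passed along.
def transform_in_place(dataset):
    dataset.active_scalars = np.clip(dataset.active_scalars, 0, None)

# Child-producing path: returning a result dict creates a child data source.
# On this branch the child's data is also set back on the input vtkDataObject,
# so the next operator sees the change (on master it would be ignored).
def transform_with_child(dataset):
    child = dataset.create_child_dataset()
    child.active_scalars = dataset.active_scalars * 2.0  # placeholder work
    return {'output': child}  # key name is illustrative
```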

@cjh1 (Member) commented Dec 1, 2020

@psavery We should revisit this and get it merged.
