SCons build with ROOT 6 causes interpreter segfaults #141
Not sure that I will be able to solve this quickly, or at all. But I will try to look into it further at least.
Quick question: what does one need to install (from EPEL) for root on RHEL7 systems?
yum install root
installs a lot of things, but I don’t see a “thisroot.sh” for setup, for example.
Best,
E.
On Sep 20, 2017, at 3:17 PM, Ole Hansen wrote:
This is a nasty one, I think.
If, and apparently only if, I build the analyzer with scons (v2.5.1 from EPEL) and then issue some C++11 commands from the interpreter, I frequently (but not always!) get segfaults. Example:
************************************************
* *
* W E L C O M E to the *
* H A L L A C++ A N A L Y Z E R *
* *
* Release 1.6.0-beta3 Sep 20 2017 *
* Based on ROOT 6.10/04 Jul 28 2017 *
* *
* For information visit *
* http://hallaweb.jlab.org/podd/ *
* *
************************************************
analyzer [0] vector<int> vi { 1,2,4,5,6,9,-10,-20 }
(std::vector<int> &) { 1, 2, 4, 5, 6, 9, -10, -20 }
analyzer [1] for( auto& i : vi ) cout << i << endl;
*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007fbf19abddbc in __libc_waitpid (pid=11594, stat_loc=stat_loc@entry=0x7fff734c7f60, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1 0x00007fbf19a40cc2 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2 0x00007fbf1d7a47df in TUnixSystem::StackTrace (this=0x6298e0) at /opt/ROOT/root-6.10.04/core/unix/src/TUnixSystem.cxx:2412
#3 0x00007fbf1d7a6f2c in TUnixSystem::DispatchSignals (this=0x6298e0, sig=kSigSegmentationViolation) at /opt/ROOT/root-6.10.04/core/unix/src/TUnixSystem.cxx:3643
#4 <signal handler called>
#5 0x00007fbf1a58d183 in std::ostream::operator<< (this=0x7fbf1a7fb700 <std::cout>, __n=1) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:110
#6 0x00007fbf1e4f60a7 in ?? ()
#7 0x00007fff734ca6e8 in ?? ()
#8 0x0000000001b12f60 in ?? ()
#9 0x0000000001b12f80 in ?? ()
#10 0x00007fff734caab0 in ?? ()
#11 0x0000000000000000 in ?? ()
===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x00007fbf1a58d183 in std::ostream::operator<< (this=0x7fbf1a7fb700 <std::cout>, __n=1) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:110
#6 0x00007fbf1e4f60a7 in ?? ()
#7 0x00007fff734ca6e8 in ?? ()
#8 0x0000000001b12f60 in ?? ()
#9 0x0000000001b12f80 in ?? ()
#10 0x00007fff734caab0 in ?? ()
#11 0x0000000000000000 in ?? ()
===========================================================
Root >
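For reference, the two interpreter commands above are ordinary C++11. A minimal standalone sketch of the same loop follows; the helper names print_all and format_all are hypothetical, and output goes to a caller-supplied stream so it can be checked. It builds and runs normally with g++ -std=c++11, which is consistent with the finding in this thread that the crash depends on how the executable was built, not on the code being typed.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Same loop as `for( auto& i : vi ) cout << i << endl;` in the session above,
// but writing to any std::ostream instead of hard-wiring std::cout.
void print_all(const std::vector<int>& vi, std::ostream& out) {
    for (const auto& i : vi)  // C++11 range-based for
        out << i << std::endl;
}

// Hypothetical helper: captures print_all's output as a string.
std::string format_all(const std::vector<int>& vi) {
    std::ostringstream os;
    print_all(vi, os);
    return os.str();
}
```

For example, print_all({1, 2, 4}, std::cout) prints 1, 2 and 4 on separate lines.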
Here's where it gets nasty:
It isn't 100% reproducible. You may have to try several times (start analyzer, issue interactive commands, exit and restart if it doesn't crash).
I am unable to reproduce this crash with the scons build when running under gdb. Under the debugger, it just seems to work.
I have never been able to trigger this crash with a make build of the analyzer.
hcana's SCons build seems unaffected as well.
The crash does not occur on macOS when building with either scons or make. So far, I have only seen it on RHEL7 and CentOS7. I have tried both the ROOT version from EPEL (currently 6.10/02) and a self-built ROOT 6.10/04 installation. I am using the standard compiler there: g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16). It happens on several different machines, including the VirtualBox image we made for the analysis workshop this summer.
I have already tried a number of variations on the compiler flags used by SCons, but so far nothing has made a difference. In particular, I have prevented -rdynamic from being parsed into the CXXFLAGS and used it only as a link flag, as the make build does. I've also reordered the linker flags and manually re-linked libHall.so, libdc.so and the main executable. At this point, I'm stumped.
This problem was already present in June before the analysis workshop, so it is not due to a recent change.
Dr. Edward J. Brash
Professor of Physics - Christopher Newport University
Staff Scientist - Thomas Jefferson National Accelerator Facility
Honorary Senior Research Fellow - University of Glasgow
Office: 757-594-7451
Mobile: 757-753-2831
FAX: 757-594-7919
|
Hi Ed,
I think "yum install root" should be all. No need to run thisroot.sh
because the EPEL version of ROOT is installed in system directories like
/usr/lib64, /usr/include/root etc. which are already in the various
PATHs. There is no top-level ROOTSYS directory in that case. The output
of "root-config" reflects that.
BTW, the problem also occurs with a self-compiled version of ROOT that
IS installed under a top-level ROOTSYS. It looks like this is not a
problem specific to the ROOT installation. So if you already have a
non-EPEL version of ROOT set up, you could use that.
Ole
|
Well, I am now able to reproduce the segfault … both with the EPEL version of ROOT, and with a previous local ROOT installation (6.06/08).
Sigh … I also see the same sort of very intermittent behavior … sometimes it works fine, and sometimes it segfaults.
Best,
E.
|
Well, that's good news. Being able to reproduce a bug is like the
alcoholic admitting that he/she's got a problem ... the first step
towards recovery ;)
Hopefully you'll be able to track it down. I've been pulling my hair out
over this.
Ole
|
Hi Ole,
I had noticed this as well. This may be related to the fact that in the Makefile, the ROOT libraries that are linked in are defined with 'root-config --glibs', whereas the SConstruct uses 'root-config --libs'. As a result, the make build also links the analyzer against libGui.so (-lGui). I updated the SConstruct so that the ROOT library list is now the same as for the Makefile. Unfortunately, that did not fix the problem. But it is still good to find and fix these differences anyway.
I also found another “bug” in the SCons configure scripts. I was apparently confused about what the flags NDEBUG and WITH_DEBUG actually mean. I had thought (as the names might suggest) that one would pass one of these when compiling in debug mode, and the other when not. But now that I look into it, that is not what they mean. I see that in the Makefile, the standard is to pass both of these. I have updated the SCons configure scripts to do things as the Makefile does in this respect.
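The point about NDEBUG and WITH_DEBUG is that they are independent switches, not opposites. NDEBUG is the standard C/C++ macro that turns assert() into a no-op; WITH_DEBUG is the analyzer's own macro, which this sketch assumes simply compiles in optional debug code. A minimal sketch of code testing each one separately (both function names are hypothetical); built with -DNDEBUG -DWITH_DEBUG, asserts are disabled while the debug code is still compiled in.

```cpp
#include <cassert>  // assert() expands to nothing when NDEBUG is defined

// Standard switch: reports whether assert() checks are active in this build.
bool asserts_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Project switch (assumption: WITH_DEBUG only gates optional debug code).
// It is tested on its own, independently of NDEBUG.
bool debug_code_compiled_in() {
#ifdef WITH_DEBUG
    return true;
#else
    return false;
#endif
}
```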
The effect of this change is that now the *.o (which make creates) and *.os files (which SCons creates) in the src/ directory are literally all identical to one another. For the hana_decode directory, this is not quite true, because make actually goes into that directory to do the compilation of the source files, whereas SCons does it from the main directory. This causes the *.os and *.o object files to be different from one another in a binary sense. But, I have verified by using the command line that the make and SCons compilation commands do produce identical object files to one another for the hana_decode directory as well.
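Checking that two builds produced identical object files amounts to a byte-for-byte comparison; the standard cmp utility does exactly this from the shell. A minimal C++ sketch of the same check, where files_identical is a hypothetical helper that would be given, say, a make-built .o and the corresponding SCons-built .os file:

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Returns true only if both files can be opened and have identical contents.
bool files_identical(const std::string& a, const std::string& b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb) return false;
    std::istreambuf_iterator<char> ia(fa), ib(fb), end;
    // Walk both files in lockstep and stop at the first differing byte.
    while (ia != end && ib != end) {
        if (*ia != *ib) return false;
        ++ia;
        ++ib;
    }
    // Identical only if both files ended at the same position.
    return ia == end && ib == end;
}
```

It returns false if either file cannot be opened, so a mistyped path is not mistaken for a match.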
With that said, this change did not fix the problem either. Sigh …
I’m continuing to look at it … just going through things systematically and eliminating possibilities at this point.
Cheers,
E.
On Oct 1, 2017, at 1:59 PM, Ole Hansen wrote:
I am noticing another, possibly related difference between the Make-compiled and the SCons-compiled versions of the analyzer. If I log into a machine without X forwarding and then start the analyzer version compiled with make, I always get the familiar warning about DISPLAY not set:
***@***.*** analyzer]$ ./analyzer -v
Warning in <UnknownClass::SetDisplay>: DISPLAY not set, setting it to 192.168.88.2:0.0
Podd 1.6.0-beta3 Linux-4.12.13-1-ARCH-x86_64 git @1bc2030 ROOT 6.10/04
Perhaps worth noting, I even get this warning when running with the (new) -v flag, which does not even create a THaInterface, but just runs a few cout commands in main() before exiting.
Now, if I do the same with the version compiled with SCons, the DISPLAY warning is never shown, even when starting a session where it normally would appear. No error appears when trying to open windows from such a session, for example
analyzer [0] auto b = new TBrowser
and no window appears anywhere.
It seems like the make-compiled version initializes some ROOT component that includes DISPLAY handling, while the SCons-compiled version doesn't. I am not sure if this is related to the interactive interpreter crashes, but it's certainly another indication of a significant difference between the build systems, and fixing one could fix the other as well.
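When comparing the two builds in a session without X forwarding, it can help to confirm which DISPLAY value the process itself sees before any ROOT code runs, e.g. by printing it at the very top of main() in a test build. A minimal sketch using only std::getenv; current_display and report_display are hypothetical helpers, not part of Podd or ROOT.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Returns DISPLAY as seen by this process, or "" if it is not set.
std::string current_display() {
    const char* d = std::getenv("DISPLAY");
    return d ? std::string(d) : std::string();
}

// Hypothetical diagnostic to call first thing in main().
void report_display(std::ostream& out) {
    const std::string d = current_display();
    out << "DISPLAY=" << (d.empty() ? std::string("<not set>") : d) << '\n';
}
```

If both builds see the same (empty) value here, the difference in the warning must come from what the process does afterwards, not from the environment it was started in.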
|
I think that I have found and fixed the problem!
In the SCons build, the -fPIC flag was NOT being included when compiling src/main.C, whereas in the make build it was (and is). The -fPIC flag is set in different ways in the two build systems, and the way it was being done in SCons resulted in this inconsistency. The fix was pretty simple (just a small change to the configuration files).
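One way to keep a missing -fPIC on main.C from slipping back in unnoticed is a compile-time check in the source file itself. A minimal sketch relying on GCC's predefined __PIC__ macro; pic_level is a hypothetical helper, and REQUIRE_PIC is a hypothetical macro that the build would pass only when compiling main.C.

```cpp
// GCC predefines __PIC__ when compiling position-independent code:
// 1 for -fpic, 2 for -fPIC (as far as we know it is also defined under
// -fpie/-fPIE). It is not defined at all otherwise.
constexpr int pic_level() {
#if defined(__PIC__)
    return __PIC__;
#else
    return 0;
#endif
}

// Optional guard: if the build passes -DREQUIRE_PIC when compiling main.C,
// forgetting -fPIC becomes a hard compile error instead of a runtime surprise.
#if defined(REQUIRE_PIC) && !defined(__PIC__)
#error "main.C must be compiled with -fPIC"
#endif
```

Printed from main(), pic_level() should report 2 when main.C is compiled with -fPIC, and 0 when it is compiled without any PIC or PIE option.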
I also fixed a couple of other inconsistencies (the way that the ROOT libraries were being defined, and the way that the -DNDEBUG and -DWITH_DEBUG flags were being set) … now SCons and make handle these in the same way.
I have tested on CentOS7: after starting and stopping the analyzer about 20 times and executing the C++11 commands from the original report, I have seen no segfaults. When I go back to the old SCons setup, without the -fPIC flag in the src/main.C compilation, the segfaults return. So, I am moderately confident that this is the issue, and that it is now fixed.
I also updated the appropriate files in the SDK as well, and did a pull request of all of this.
Cheers,
E.
|
Great job, Ed! I had a hunch there was one tiny little detail at the bottom of this. I had played with compiler flags, but hadn't gotten to this one yet. And it explains neatly why hcana wasn't affected - it doesn't use Podd's main.C.
Thanks for the quick fix, and I'll put your changes into GitHub as soon as I can, before I make the next release.
Ole