Segmentation fault at the end of Marlin processor running #42

Open
yradkhorrami opened this issue Aug 24, 2021 · 6 comments

Comments

@yradkhorrami

  • OS version: CentOS Linux 7
  • Compiler version: GCC 8.2
  • Package version: ILCSoft v02-02-02, CVMFS release
  • Reproduced by: Running any Marlin processor. At the end of the job, just after the report of the time used by processors, I get "Segmentation fault". It does not matter which Marlin processor is in the execute block of the steering XML file. I tried "IsolatedLeptonTaggingProcessor" and, separately, my own processor (both without LCIOOutputProcessor at the end):

........
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Events skipped by processors :
[ MESSAGE "Marlin"] Total: 0
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Time used by processors ( in processEvent() ) :
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] MyIsolatedLeptonTaggingProcess 8.300000e-01 s in 998 events ==> 8.316633e-04 [ s/evt.]
[ MESSAGE "Marlin"] Total: 8.300000e-01 s in 998 events ==> 8.316633e-04 [ s/evt.]
[ MESSAGE "Marlin"] ---------------------------------------------------------
Segmentation fault

  • Goal: In the end this does not affect the output I expected, but there seems to be a problem in Marlin/iLCSoft/...
@dudarboh
Member

I have tried it with my local analysis processor and with the MyRefitProcessorProton processor from the ILDConfig production,

and I couldn't reproduce the Seg. fault message.

Could you share the processor code that reproduces this?

I think I have seen this behavior before, although I don't remember exactly how I fixed it.
My guess is that it has something to do with the Processor destructor. Seeing the code would help.

@yradkhorrami
Author

I'm just using the IsolatedLeptonTaggingProcessor that is centrally installed on cvmfs.
The steering file is attached (just rename .xml.txt -> .xml):
SLDCorrection.xml.txt

@dudarboh
Member

Very interesting..

I have tried on the NAF with:
source /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/init_ilcsoft.sh
and then
Marlin ./SLDCorrection.xml
This gives no Seg. Fault at the end:

[ MESSAGE "MyIsolatedLeptonTaggingProcessor"] -------------------------------------------------
[ MESSAGE "Marlin"]  --------------------------------------------------------- 
[ MESSAGE "Marlin"]   Events skipped by processors : 
[ MESSAGE "Marlin"]   Total: 0
[ MESSAGE "Marlin"]  --------------------------------------------------------- 
[ MESSAGE "Marlin"] 
[ MESSAGE "Marlin"]  --------------------------------------------------------- 
[ MESSAGE "Marlin"]       Time used by processors ( in processEvent() ) :      
[ MESSAGE "Marlin"] 
[ MESSAGE "Marlin"] MyIsolatedLeptonTaggingProcess       7.000000e-01 s in          998 events  ==> 7.014028e-04 [ s/evt.] 
[ MESSAGE "Marlin"]             Total:                   7.000000e-01 s in          998 events  ==> 7.014028e-04 [ s/evt.] 
[ MESSAGE "Marlin"]  --------------------------------------------------------- 

@dudarboh
Member

I could reproduce the problem by adding my custom /afs/desy.de/user/d/dudarboh/iLCSoft/MarlinUtil/lib/libMarlinUtilNew.so to $MARLIN_DLL. The Seg. Fault then appears at the end, as described above.

@yradkhorrami could you share your output of echo $MARLIN_DLL to check if it has any potential processor/library duplicates?
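
The duplicate check asked for above can be sketched in a few lines of Python. This is only a minimal sketch, not part of Marlin: the function name find_duplicate_libraries is hypothetical, and it only assumes that MARLIN_DLL is a colon-separated list of library paths, as in the output of echo $MARLIN_DLL. It compares file names only, so it will not catch the same code loaded under a renamed library.

```python
import os
from collections import defaultdict


def find_duplicate_libraries(marlin_dll):
    """Group the entries of a colon-separated MARLIN_DLL string by file name
    and return only the file names that occur more than once.

    Hypothetical helper for illustration; it is not part of Marlin.
    """
    by_name = defaultdict(list)
    for path in marlin_dll.split(":"):
        path = path.strip()
        if path:  # skip empty entries from leading, trailing or double colons
            by_name[os.path.basename(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    duplicates = find_duplicate_libraries(os.environ.get("MARLIN_DLL", ""))
    for name, paths in sorted(duplicates.items()):
        print(name)
        for path in paths:
            print("   ", path)
    if not duplicates:
        print("no duplicate library names in MARLIN_DLL")
```

Run it in the same shell in which Marlin is started, after sourcing init_ilcsoft.sh and exporting any local libraries, so that it sees the same MARLIN_DLL as the job.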

My guess would be that this happens when marlin::Processor::~Processor() tries to clean up Processor parameters here

Although I am a bit puzzled, as my libMarlinUtilNew.so is not really a processor at all and I renamed the library...

Here is relevant part of valgrind output:

. . .
==16765== Invalid read of size 8
==16765==    at 0x4E8D8F0: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765==    by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765==    by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765==    by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765==  Address 0x2bb250d0 is 1,680 bytes inside an unallocated block of size 1,696 in arena "client"
==16765== 
==16765== Invalid read of size 8
==16765==    at 0x4E8D8F9: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765==    by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765==    by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765==    by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765==  Address 0x2bb24fd0 is 1,424 bytes inside an unallocated block of size 1,696 in arena "client"
==16765== 
==16765== Jump to the invalid address stated on the next line
==16765==    at 0x0: ???
==16765==    by 0x4E8D8FE: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765==    by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765==    by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765==    by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16765== 
==16765== 
==16765== Process terminating with default action of signal 11 (SIGSEGV)
==16765==  Bad permissions for mapped region at address 0x0
==16765==    at 0x0: ???
==16765==    by 0x4E8D8FE: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765==    by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765==    by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765==    by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765== 
. . . 

Maybe running it with debug symbols can give more info, although I would need to manually rebuild Marlin from scratch then..

Maybe @tmadlener, @gaede have a better explanation and potential fix in mind?

@yradkhorrami
Author

@dudarboh, I just looked at the Marlin libraries and found something interesting: without any local Marlin library, there is no problem and the Marlin job finishes without a Seg. Fault. As soon as I add some of my local Marlin libraries, the Seg. Fault appears at the end. I checked which libraries cause the issue and found that they are the ones that had been compiled against previous versions of iLCSoft (gcc, ...). After recompiling the same processor with the latest version, the Seg. Fault no longer appears. The output of echo $MARLIN_DLL is:

/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinDD4hep/v00-06/lib/libMarlinDD4hep.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/DDMarlinPandora/v00-11/lib/libDDMarlinPandora.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinReco/v01-31/lib/libMarlinReco.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/PandoraAnalysis/v02-00-01/lib/libPandoraAnalysis.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCFIVertex/v00-08/lib/libLCFIVertexProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/CEDViewer/v01-17-01/lib/libCEDViewer.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Overlay/v00-22-02/lib/libOverlay.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinFastJet/v00-05-02/lib/libMarlinFastJet.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCTuple/v01-12/lib/libLCTuple.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinKinfit/v00-06/lib/libMarlinKinfit.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinTrkProcessors/v02-11/lib/libMarlinTrkProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinKinfitProcessors/v00-04-02/lib/libMarlinKinfitProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ILDPerformance/v01-10/lib/libILDPerformance.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Clupatra/v01-03/lib/libClupatra.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Physsim/v00-04-01/lib/libPhyssim.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCFIPlus/v00-09/lib/libLCFIPlus.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/FCalClusterer/v01-00-01/lib/libFCalClusterer.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ForwardTracking/v01-14/lib/libForwardTracking.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ConformalTracking/v01-10/lib/libConformalTracking.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LICH/v00-01/lib/libLICH.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Garlic/v03-01/lib/libGarlic.so:/afs/desy.de/group/flc/pool/radkhory/HdecayMode/lib/libHdecayMode.so:/afs/desy.de/group/flc/pool/radkhory/SLDecayCorrection/lib/libSLDecayCorrection.so

Of these, HdecayMode was the one that caused the issue.
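
Since the problematic libraries were local ones sitting next to the release libraries, a quick way to list the candidates for recompilation is to print every MARLIN_DLL entry that does not live under the release directory. The following is a minimal sketch under that assumption; the function name libraries_outside_release is hypothetical, and the release prefix is passed in by hand rather than taken from any Marlin or iLCSoft setting. Being outside the release does not prove a library is stale, it only marks it as worth rebuilding against the current release.

```python
import os
import sys


def libraries_outside_release(marlin_dll, release_prefix):
    """Return the MARLIN_DLL entries that do not live under release_prefix.

    Hypothetical helper for illustration; it is not part of Marlin.
    """
    # Normalise the prefix so ".../v02-02-02" and ".../v02-02-02/" behave the same
    # and ".../v02-02-02-patched" is not mistaken for the release itself.
    prefix = release_prefix.rstrip("/") + "/"
    return [
        path
        for path in marlin_dll.split(":")
        if path and not path.startswith(prefix)
    ]


if __name__ == "__main__":
    # The single argument is the release directory, e.g. the directory
    # that contains init_ilcsoft.sh.
    if len(sys.argv) == 2:
        marlin_dll = os.environ.get("MARLIN_DLL", "")
        for path in libraries_outside_release(marlin_dll, sys.argv[1]):
            print(path)
    else:
        print("usage: pass the release directory as the only argument")
```

Applied to the MARLIN_DLL above with /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02 as the prefix, only the two /afs/desy.de/group/flc/pool/radkhory/ libraries would be listed.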

@dudarboh
Member

dudarboh commented Feb 7, 2022

Julie @Torndal recently also encountered this problem, so I want to add my 5 cents again.

Basically, I want to confirm @yradkhorrami's observations from the previous post.
I encountered this Seg. Fault at the end only with libraries in MARLIN_DLL that were compiled with a previous version of iLCSoft.

I tried to debug it a bit with gdb, thanks to @tmadlener, but the execution went far beyond return 0; in main() and crashed somewhere in a std::string destructor...

I think recompiling the processor against a version consistent with all the other iLCSoft libraries should fix it.
