Connection manager repeatedly retries connection for dead process [JIRA: RIAK-2379] #723

ian-mi · 2016-02-11T23:39:40Z

When the calling process exits or crashes before a connection completes, the riak_core_connection process crashes with noproc when it tries to invoke the connected callback on that (now dead) process. The connection manager then keeps retrying the connection on behalf of the same PID, so every retry ends in another noproc crash.
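To make the failure mode concrete, here is a minimal Python model of it. It assumes a simplified manager that re-queues a failed request unchanged, and it is not riak_core code: NoProc, Owner, connect_worker and naive_manager are hypothetical names. NoProc stands in for the Erlang noproc exit that gen_server:call/3 raises when the target PID is no longer alive.

```python
# Hypothetical model of the failure mode; not riak_core code.
# NoProc, Owner, connect_worker and naive_manager are illustrative names only.


class NoProc(Exception):
    """Stands in for the Erlang exit reason noproc (calling a dead process)."""


class Owner:
    """Stands in for the calling process (e.g. fssource), identified by a PID."""

    def __init__(self, pid):
        self.pid = pid
        self.alive = True

    def connected(self, socket):
        # Like gen_server:call/3 to a dead PID: the call fails with noproc.
        if not self.alive:
            raise NoProc(self.pid)
        return "ok"


def connect_worker(owner, socket):
    """Models riak_core_connection: once the handshake is done, call back the owner."""
    return owner.connected(socket)


def naive_manager(owner, max_attempts):
    """Re-queues the same request unchanged, so every attempt targets the same dead PID."""
    failures = []
    # Bounded only so the example terminates; the report says the real
    # retries continue until the node is restarted.
    for attempt in range(max_attempts):
        try:
            connect_worker(owner, socket=f"port-{attempt}")
            return failures
        except NoProc as exc:
            failures.append(exc.args[0])  # request retried as-is, same owner PID
    return failures


owner = Owner("<0.15265.27>")
owner.alive = False  # the caller exited before the connection completed
print(naive_manager(owner, max_attempts=3))
```

With the owner dead, every attempt fails against the same PID, which mirrors the repeated noproc reports in the logs below.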

Seen at a customer site, where the fssource process crashed with:

2016-02-07 21:37:52 =SUPERVISOR REPORT====
     Supervisor: {local,riak_repl2_fssource_sup}
     Context:    child_terminated
     Reason:     {normal,{gen_server,call,[<11363.32378.80>,cluster_name,120000]}}
     Offender:   [{pid,<0.15265.27>},{name,822094670998632891489572718402909198556462055424},{mfargs,{riak_repl2_fssource,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

This was followed by periodic noproc crashes such as:

2016-02-07 21:37:52 =ERROR REPORT====
** State machine <0.15266.27> terminating 
** Last message in was {tcp,#Port<0.819352>,<<131,104,2,100,0,2,111,107,104,2,100,0,8,102,117,108,108,115,121,110,99,104,3,97,3,97,0,97,0>>}
** When State == wait_for_protocol
**      Data  == {state,ranch_tcp,#Port<0.819352>,fullsync,[{3,0},{2,0},{1,1}],[{keepalive,true},{nodelay,true},{packet,4},{active,false}],riak_repl2_fssource,<0.15265.27>,"riak_tpsrvc_test2_iscc_104",[{clustername,"riak_tpsrvc_test2_iscc_104"},{ssl_enabled,false}],[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}],{10,253,50,54},9080}
** Reason for termination = 
** {noproc,{gen_server,call,[<0.15265.27>,{connected,#Port<0.819352>,ranch_tcp,{{REDACTED},9080},{fullsync,{3,0},{3,0}},[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}]},120000]}}
2016-02-07 21:37:52 =CRASH REPORT====
  crasher:
    initial call: riak_core_connection:init/1
    pid: <0.15266.27>
    registered_name: []
    exception exit: {{noproc,{gen_server,call,[<0.15265.27>,{connected,#Port<0.819352>,ranch_tcp,{{REDACTED},9080},{fullsync,{3,0},{3,0}},[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}]},120000]}},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,622}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [<0.15251.27>]
    messages: []
    links: [#Port<0.819352>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 1637
  neighbours:

Every one of these crashes references the same PID, and the behaviour continues until the node is restarted.
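One way to break the loop, in the spirit of the title of #736 ("Catch exceptions thrown by the connected callback"), is to treat a noproc from the callback as permanent and drop the request instead of retrying it. The Python model below is a hypothetical sketch of that idea, not the actual change: Request, notify_owner and retry_pass are illustrative names, and the reachable flag stands in for transient connection failures that are still worth retrying. In Erlang the analogous step would be catching exit:{noproc, _} around the gen_server:call/3 to the owner.

```python
# Hypothetical sketch of a guard against retrying for a dead owner; not riak_core code.
# Request, notify_owner and retry_pass are illustrative names only.
from dataclasses import dataclass


class NoProc(Exception):
    """Stands in for the Erlang exit reason noproc."""


@dataclass
class Request:
    owner_pid: str


def notify_owner(req, live_pids):
    # Models the connected callback: calling a dead PID fails with noproc.
    if req.owner_pid not in live_pids:
        raise NoProc(req.owner_pid)


def retry_pass(pending, live_pids, reachable):
    """One pass of a hypothetical retry loop.

    Returns (still_pending, connected, dropped). A request is re-queued only when
    the remote is unreachable; a noproc from the callback drops it for good.
    """
    still_pending, connected, dropped = [], [], []
    for req in pending:
        if not reachable:
            still_pending.append(req)  # transient failure: worth retrying later
            continue
        try:
            notify_owner(req, live_pids)
            connected.append(req.owner_pid)
        except NoProc:
            dropped.append(req.owner_pid)  # owner is gone: retrying cannot help
    return still_pending, connected, dropped


pending = [Request("<0.15265.27>"), Request("<0.200.0>")]
live = {"<0.200.0>"}
print(retry_pass(pending, live, reachable=True))
```

The dead owner's request is dropped after a single noproc instead of coming back on every pass, while requests that failed for transient reasons stay queued.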


Basho-JIRA changed the title from "Connection manager repeatedly retries connection for dead process" to "Connection manager repeatedly retries connection for dead process [JIRA: RIAK-2379]" on Feb 11, 2016

Basho-JIRA added the "JIRA: To Do" label on Feb 11, 2016

ian-mi mentioned this issue on Mar 29, 2016 in #736, "Catch exceptions thrown by the connected callback" (Open)

seanmcevoy mentioned this issue on May 28, 2018 in #784, "removed failed connections from the pool" (Open)

ian-mi commented Feb 11, 2016 (edited by martincox)
