simple nuse crashes with wait_queue #45

Open
thehajime opened this issue Jun 22, 2015 · 7 comments
@thehajime
Member

TSIA.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ping 192.168.49.58'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __list_del (next=0x7f759c713e1a <lib_sock_poll+90>, prev=0x0) at ./include/linux/list.h:90
90              prev->next = next;
(gdb) bt
#0  __list_del (next=0x7f759c713e1a <lib_sock_poll+90>, prev=0x0) at ./include/linux/list.h:90
#1  __list_del_entry (entry=0x7fff7c482948) at ./include/linux/list.h:102
#2  list_del_init (entry=0x7fff7c482948) at ./include/linux/list.h:145
#3  autoremove_wake_function (wait=0x7fff7c482930, mode=<optimized out>, sync=<optimized out>, key=<optimized out>) at arch/lib/sched.c:173
#4  0x00007f759c714bc6 in __wake_up (q=0x1645318, mode=mode@entry=1, nr_exclusive=nr_exclusive@entry=1, key=key@entry=0xc3) at arch/lib/sched.c:242
#5  0x00007f759c714c05 in __wake_up_sync_key (q=<optimized out>, mode=mode@entry=1, nr_exclusive=nr_exclusive@entry=1, key=key@entry=0xc3) at arch/lib/sched.c:251
#6  0x00007f759c75a2a9 in sock_def_readable (sk=0x16474f8) at net/core/sock.c:2235
#7  0x00007f759c7590ed in sock_queue_rcv_skb (sk=sk@entry=0x16474f8, skb=skb@entry=0x7f75800011b8) at net/core/sock.c:474
#8  0x00007f759c7e863c in raw_rcv_skb (sk=sk@entry=0x16474f8, skb=skb@entry=0x7f75800011b8) at net/ipv4/raw.c:315
#9  0x00007f759c7e9924 in raw_rcv (sk=sk@entry=0x16474f8, skb=0x7f75800011b8) at net/ipv4/raw.c:334
#10 0x00007f759c7e9b11 in raw_v4_input (skb=skb@entry=0x7f758c0008c8, iph=0x7f758c0009e6, hash=hash@entry=1) at net/ipv4/raw.c:194
#11 0x00007f759c7e9b6c in raw_local_deliver (skb=skb@entry=0x7f758c0008c8, protocol=protocol@entry=1) at net/ipv4/raw.c:216
#12 0x00007f759c7bea99 in ip_local_deliver_finish (sk=sk@entry=0x0, skb=skb@entry=0x7f758c0008c8) at net/ipv4/ip_input.c:203
#13 0x00007f759c7bf092 in NF_HOOK_THRESH (thresh=-2147483648, okfn=0x7f759c7be9a0 <ip_local_deliver_finish>, out=0x0, in=<optimized out>, skb=0x7f758c0008c8, sk=0x0, hook=1, pf=2 '\002')
    at ./include/linux/netfilter.h:220
#14 NF_HOOK (okfn=0x7f759c7be9a0 <ip_local_deliver_finish>, out=0x0, in=<optimized out>, skb=0x7f758c0008c8, sk=0x0, hook=1, pf=2 '\002') at ./include/linux/netfilter.h:242
#15 ip_local_deliver (skb=0x7f758c0008c8) at net/ipv4/ip_input.c:256
#16 0x00007f759c7bf2f3 in NF_HOOK_THRESH (thresh=-2147483648, okfn=0x7f759c7bebd0 <ip_rcv_finish>, out=0x0, in=0x163f200, skb=0x7f758c0008c8, sk=0x0, hook=0, pf=2 '\002')
    at ./include/linux/netfilter.h:220
#17 NF_HOOK (okfn=0x7f759c7bebd0 <ip_rcv_finish>, out=0x0, in=0x163f200, skb=0x7f758c0008c8, sk=0x0, hook=0, pf=2 '\002') at ./include/linux/netfilter.h:242
#18 ip_rcv (skb=<optimized out>, dev=0x163f200, pt=<optimized out>, orig_dev=<optimized out>) at net/ipv4/ip_input.c:455
#19 0x00007f759c76d203 in __netif_receive_skb_core (skb=0x7f758c0008c8, pfmemalloc=<optimized out>) at net/core/dev.c:3895
#20 0x00007f759c76da74 in process_backlog (napi=0x7f759cc7be70 <softnet_data+112>, quota=64) at net/core/dev.c:4506
#21 0x00007f759c76d8be in napi_poll (n=0x7f759cc7be70 <softnet_data+112>, repoll=repoll@entry=0x7f759b63ce20) at net/core/dev.c:4744
#22 0x00007f759c76db98 in net_rx_action (h=<optimized out>) at net/core/dev.c:4809
#23 0x00007f759c713fe3 in do_softirq () at arch/lib/softirq.c:69
#24 0x00007f759c714048 in softirq_task_function (context=<optimized out>) at arch/lib/softirq.c:28
#25 0x00007f759c45624d in nuse_task_start_trampoline (context=0x15a6250) at nuse.c:175
#26 0x00007f759b84e182 in start_thread (arg=0x7f759b63d700) at pthread_create.c:312
#27 0x00007f759bf7d47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
  • hardware or VM configuration (number of CPUs, memory)
    • 8-core Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 16 GB memory
    • running CentOS 6.4 Final
    • running KVM to host multiple virtual guests
    • libos runs on a virtual guest running Ubuntu 14.04.2 with 4 virtual CPUs and 4 GB memory
  • host OS (distribution, version, kernel version)
    • KVM guest OS, 4 virtual CPUs, 4 GB memory
    • Linux ubuntu14 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    • libos is built from your GitHub clone, on kernel 4.1.0-rc7+:
# ./nuse ping 192.168.49.254

<5>Linux version 4.1.0-rc7+ (root@ubuntu14) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #0 Wed Jun 17 12:56:11 PDT 2015
  • how to build it (only make library ARCH=lib or with OPT=no)
    • make defconfig ARCH=lib
    • make library ARCH=lib
  • how often this happens
    • once in every two runs
@thehajime
Member Author

This is a tentative patch that avoids the issue, but it is not a generic solution, so it needs more work.

https://gist.github.com/thehajime/65e58a101f0c50a04764

thehajime pushed a commit that referenced this issue Sep 2, 2015
While running net-next I hit this:
[  634.073119] ===============================
[  634.073150] [ INFO: suspicious RCU usage. ]
[  634.073182] 4.2.0-rc6+ #45 Not tainted
[  634.073213] -------------------------------
[  634.073244] include/net/vrf.h:38 suspicious rcu_dereference_check() usage!
[  634.073274]
               other info that might help us debug this:

[  634.073307]
               rcu_scheduler_active = 1, debug_locks = 1
[  634.073338] 2 locks held by swapper/0/0:
[  634.073369]  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8112bc35>] call_timer_fn+0x5/0x480
[  634.073412]  #1:  (slock-AF_INET){+.-...}, at: [<ffffffff8174f0f5>] icmp_send+0x155/0x5f0
[  634.073450]
               stack backtrace:
[  634.073483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6+ #45
[  634.073514] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  634.073545]  0000000000000000 0593ba8242d9ace4 ffff88002fc03b48 ffffffff81803f1b
[  634.073612]  0000000000000000 ffffffff81e12500 ffff88002fc03b78 ffffffff811003c5
[  634.073642]  0000000000000000 ffff88002ec4e600 ffffffff81f00f80 ffff88002fc03cf0
[  634.073669] Call Trace:
[  634.073694]  <IRQ>  [<ffffffff81803f1b>] dump_stack+0x4c/0x65
[  634.073728]  [<ffffffff811003c5>] lockdep_rcu_suspicious+0xc5/0x100
[  634.073763]  [<ffffffff8174eb56>] icmp_route_lookup+0x176/0x5c0
[  634.073793]  [<ffffffff8174f2fb>] ? icmp_send+0x35b/0x5f0
[  634.073818]  [<ffffffff8174f274>] ? icmp_send+0x2d4/0x5f0
[  634.073844]  [<ffffffff8174f3ce>] icmp_send+0x42e/0x5f0
[  634.073873]  [<ffffffff8170b662>] ipv4_link_failure+0x22/0xa0
[  634.073899]  [<ffffffff8174bdda>] arp_error_report+0x3a/0x80
[  634.073926]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.073952]  [<ffffffff816d396e>] neigh_invalidate+0x8e/0x110
[  634.073984]  [<ffffffff816d62ae>] neigh_timer_handler+0x1ae/0x290
[  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.074013]  [<ffffffff8112bce3>] call_timer_fn+0xb3/0x480
[  634.074013]  [<ffffffff8112bc35>] ? call_timer_fn+0x5/0x480
[  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.074013]  [<ffffffff8112c2bc>] run_timer_softirq+0x20c/0x430
[  634.074013]  [<ffffffff810af50e>] __do_softirq+0xde/0x630
[  634.074013]  [<ffffffff810afc97>] irq_exit+0x117/0x120
[  634.074013]  [<ffffffff81810976>] smp_apic_timer_interrupt+0x46/0x60
[  634.074013]  [<ffffffff8180e950>] apic_timer_interrupt+0x70/0x80
[  634.074013]  <EOI>  [<ffffffff8106b9d6>] ? native_safe_halt+0x6/0x10
[  634.074013]  [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
[  634.074013]  [<ffffffff81027d43>] default_idle+0x23/0x200
[  634.074013]  [<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
[  634.074013]  [<ffffffff810f89ba>] default_idle_call+0x2a/0x40
[  634.074013]  [<ffffffff810f8dcc>] cpu_startup_entry+0x39c/0x4c0
[  634.074013]  [<ffffffff817f9cad>] rest_init+0x13d/0x150
[  634.074013]  [<ffffffff81f69038>] start_kernel+0x4a8/0x4c9
[  634.074013]  [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
[  634.074013]  [<ffffffff81f68339>] x86_64_start_reservations+0x2a/0x2c
[  634.074013]  [<ffffffff81f68485>] x86_64_start_kernel+0x14a/0x16d

It would seem vrf_master_ifindex_rcu() can be called without RCU held in
other contexts as well so introduce a new helper which acquires rcu and
returns the ifindex.
Also add curly braces around both the "if" and "else" parts as per the
style guide.

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
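
The helper pattern that the commit message describes (a wrapper that takes the RCU read lock, calls the `_rcu` variant, and returns the ifindex) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `rcu_read_lock()` and `rcu_read_unlock()` are stubbed with a nesting counter, `struct net_device` is reduced to two fields, and `master_ifindex_rcu()`, `master_ifindex()` and `master_ifindex_demo()` are hypothetical names chosen for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Stub RCU read side: a nesting counter instead of the real primitives. */
static int rcu_depth;
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Reduced, hypothetical stand-in for the kernel's struct net_device. */
struct net_device {
    int ifindex;
    const struct net_device *master;   /* NULL when not enslaved */
};

/* "_rcu" variant: the caller must already hold the RCU read lock. */
static int master_ifindex_rcu(const struct net_device *dev)
{
    assert(rcu_depth > 0);   /* plays the role of the lockdep check */
    return dev->master ? dev->master->ifindex : 0;
}

/* Wrapper for callers that do not hold RCU: lock, look up, unlock. */
static int master_ifindex(const struct net_device *dev)
{
    int ifindex;

    rcu_read_lock();
    ifindex = master_ifindex_rcu(dev);
    rcu_read_unlock();
    return ifindex;
}

/* Returns 0 when the wrapper behaves as expected. */
int master_ifindex_demo(void)
{
    struct net_device vrf = { 7, NULL };
    struct net_device eth = { 2, &vrf };

    if (master_ifindex(&eth) != 7) return 1;   /* enslaved device */
    if (master_ifindex(&vrf) != 0) return 2;   /* no master */
    if (rcu_depth != 0) return 3;              /* lock is balanced */
    return 0;
}
```

Returning a plain integer rather than a pointer matters here: the value remains valid after `rcu_read_unlock()`, whereas a pointer to the master device would not be safe to use outside the read-side section.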
@pscollins

FWIW I don't know if you dug too far into this, but this appears to be caused by race conditions related to the work queue lists. Have you made any progress?
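
To illustrate the kind of race suggested above, here is a minimal userspace sketch in C. It is not LibOS code and not the patch from the gist. In the backtrace, the entry passed to `list_del_init` (`0x7fff7c482948`) looks like a stack address and its `next` field points into `lib_sock_poll`, which would be consistent with a waiter entry whose stack frame was reused while it was still linked on the queue; that reading is an assumption. The sketch shows one common discipline that avoids it: every link and unlink happens under the queue lock, and the waiter always unlinks its entry before its frame goes away. The names `wq_head`, `wq_entry`, `wq_prepare`, `wq_finish`, `wq_wake_up` and `wq_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list, modelled on include/linux/list.h. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;   /* the statement that faulted in frame #0 */
    e->next->prev = e->prev;
    list_init(e);
}

/* Hypothetical stand-ins for a wait queue head and a waiter entry. */
struct wq_head  { pthread_mutex_t lock; struct list_head waiters; };
struct wq_entry { struct list_head link; bool woken; };

static void wq_head_init(struct wq_head *q)
{
    pthread_mutex_init(&q->lock, NULL);
    list_init(&q->waiters);
}

/* Waiter side: link the (possibly stack-allocated) entry under the lock. */
static void wq_prepare(struct wq_head *q, struct wq_entry *w)
{
    w->woken = false;
    list_init(&w->link);
    pthread_mutex_lock(&q->lock);
    list_add_tail(&w->link, &q->waiters);
    pthread_mutex_unlock(&q->lock);
}

/* Waiter side: must run before the frame that owns w is popped. */
static void wq_finish(struct wq_head *q, struct wq_entry *w)
{
    pthread_mutex_lock(&q->lock);
    if (!list_empty(&w->link))      /* not already removed by a waker */
        list_del_init(&w->link);
    pthread_mutex_unlock(&q->lock);
}

/* Waker side: wake and unlink up to nr waiters, all under the lock. */
static int wq_wake_up(struct wq_head *q, int nr)
{
    int woken = 0;

    pthread_mutex_lock(&q->lock);
    while (woken < nr && !list_empty(&q->waiters)) {
        struct wq_entry *w = (struct wq_entry *)
            ((char *)q->waiters.next - offsetof(struct wq_entry, link));
        w->woken = true;
        list_del_init(&w->link);    /* auto-remove on wake */
        woken++;
    }
    pthread_mutex_unlock(&q->lock);
    return woken;
}

/* Returns 0 when the queue invariants hold for a simple sequence. */
int wq_demo(void)
{
    struct wq_head q;
    struct wq_entry a, b;

    wq_head_init(&q);
    wq_prepare(&q, &a);
    wq_prepare(&q, &b);
    if (wq_wake_up(&q, 1) != 1 || !a.woken || b.woken) return 1;
    wq_finish(&q, &a);              /* already unlinked by the waker */
    wq_finish(&q, &b);              /* unlinked without being woken */
    if (!list_empty(&q.waiters)) return 2;
    if (wq_wake_up(&q, 1) != 0) return 3;
    return 0;
}
```

If either side touches `link` without the lock, or the waiter returns without calling `wq_finish`, the waker can end up dereferencing `prev` and `next` values that no longer belong to a list node, which is the failure mode the backtrace appears to show.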

@thehajime
Member Author

Once I confirm that this issue is fixed in LKL (https://github.com/lkl/linux), I will close it.

@pscollins

I poked at LKL a tiny bit on Friday and saw some behavior that suggested there are concurrency-related issues there, but I can't say anything solid yet.

FWIW I think you should leave this open as an issue on LibOS unless you are merging LibOS into LKL --- it is important for people who are interested in forking/using LibOS to have an accurate idea of the state of this project on its own so long as it is a standalone project.

thehajime self-assigned this Jan 19, 2016
@thehajime
Member Author

@pscollins Agreed, I will keep it open.

@dd76

dd76 commented Jul 21, 2016

Hi Hajime, for userspace TCP/IP stack functionality, should we use LibOS or LKL? The goal is to integrate applications such as an HTTP server with a userspace TCP/IP stack library. Please comment.

@thehajime
Member Author

LKL. It has improved a great deal since then.
