-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simple nuse crashes with wait_queue #45
Comments
this is a tentative patch to avoid this issue, but it's not a generic solution so need to work more. |
While running net-next I hit this: [ 634.073119] =============================== [ 634.073150] [ INFO: suspicious RCU usage. ] [ 634.073182] 4.2.0-rc6+ #45 Not tainted [ 634.073213] ------------------------------- [ 634.073244] include/net/vrf.h:38 suspicious rcu_dereference_check() usage! [ 634.073274] other info that might help us debug this: [ 634.073307] rcu_scheduler_active = 1, debug_locks = 1 [ 634.073338] 2 locks held by swapper/0/0: [ 634.073369] #0: (((&n->timer))){+.-...}, at: [<ffffffff8112bc35>] call_timer_fn+0x5/0x480 [ 634.073412] #1: (slock-AF_INET){+.-...}, at: [<ffffffff8174f0f5>] icmp_send+0x155/0x5f0 [ 634.073450] stack backtrace: [ 634.073483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6+ #45 [ 634.073514] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 634.073545] 0000000000000000 0593ba8242d9ace4 ffff88002fc03b48 ffffffff81803f1b [ 634.073612] 0000000000000000 ffffffff81e12500 ffff88002fc03b78 ffffffff811003c5 [ 634.073642] 0000000000000000 ffff88002ec4e600 ffffffff81f00f80 ffff88002fc03cf0 [ 634.073669] Call Trace: [ 634.073694] <IRQ> [<ffffffff81803f1b>] dump_stack+0x4c/0x65 [ 634.073728] [<ffffffff811003c5>] lockdep_rcu_suspicious+0xc5/0x100 [ 634.073763] [<ffffffff8174eb56>] icmp_route_lookup+0x176/0x5c0 [ 634.073793] [<ffffffff8174f2fb>] ? icmp_send+0x35b/0x5f0 [ 634.073818] [<ffffffff8174f274>] ? icmp_send+0x2d4/0x5f0 [ 634.073844] [<ffffffff8174f3ce>] icmp_send+0x42e/0x5f0 [ 634.073873] [<ffffffff8170b662>] ipv4_link_failure+0x22/0xa0 [ 634.073899] [<ffffffff8174bdda>] arp_error_report+0x3a/0x80 [ 634.073926] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.073952] [<ffffffff816d396e>] neigh_invalidate+0x8e/0x110 [ 634.073984] [<ffffffff816d62ae>] neigh_timer_handler+0x1ae/0x290 [ 634.074013] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.074013] [<ffffffff8112bce3>] call_timer_fn+0xb3/0x480 [ 634.074013] [<ffffffff8112bc35>] ? call_timer_fn+0x5/0x480 [ 634.074013] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.074013] [<ffffffff8112c2bc>] run_timer_softirq+0x20c/0x430 [ 634.074013] [<ffffffff810af50e>] __do_softirq+0xde/0x630 [ 634.074013] [<ffffffff810afc97>] irq_exit+0x117/0x120 [ 634.074013] [<ffffffff81810976>] smp_apic_timer_interrupt+0x46/0x60 [ 634.074013] [<ffffffff8180e950>] apic_timer_interrupt+0x70/0x80 [ 634.074013] <EOI> [<ffffffff8106b9d6>] ? native_safe_halt+0x6/0x10 [ 634.074013] [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10 [ 634.074013] [<ffffffff81027d43>] default_idle+0x23/0x200 [ 634.074013] [<ffffffff8102852f>] arch_cpu_idle+0xf/0x20 [ 634.074013] [<ffffffff810f89ba>] default_idle_call+0x2a/0x40 [ 634.074013] [<ffffffff810f8dcc>] cpu_startup_entry+0x39c/0x4c0 [ 634.074013] [<ffffffff817f9cad>] rest_init+0x13d/0x150 [ 634.074013] [<ffffffff81f69038>] start_kernel+0x4a8/0x4c9 [ 634.074013] [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120 [ 634.074013] [<ffffffff81f68339>] x86_64_start_reservations+0x2a/0x2c [ 634.074013] [<ffffffff81f68485>] x86_64_start_kernel+0x14a/0x16d It would seem vrf_master_ifindex_rcu() can be called without RCU held in other contexts as well so introduce a new helper which acquires rcu and returns the ifindex. Also add curly braces around both the "if" and "else" parts as per the style guide. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
FWIW I don't know if you dug too far into this, but this appears to be caused by race conditions related to the work queue lists. Have you made any progress? |
if I confirmed that this issue is fixed in LKL (https://github.com/lkl/linux), I will close this issue. |
I poked at LKL a tiny bit on Friday and saw some behavior that suggested there are concurrency-related issues there, but I can't say anything solid yet. FWIW I think you should leave this open as an issue on LibOS unless you are merging LibOS into LKL --- it is important for people who are interested in forking/using LibOS to have an accurate idea of the state of this project on its own so long as it is a standalone project. |
@pscollins agree to keep it opened. |
Hi Hajime, to have userspace tcp/ip stack functionality should we use libOS or LKL. This is to have applications like httpserver integrated with userspace tcp/ip stack library. Plz comment. |
LKL. it has been so much improved since then. |
TSIA.
Linux ubuntu14 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: