Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753344AbYCRJkm (ORCPT ); Tue, 18 Mar 2008 05:40:42 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752304AbYCRJkb (ORCPT ); Tue, 18 Mar 2008 05:40:31 -0400 Received: from pentafluge.infradead.org ([213.146.154.40]:56291 "EHLO pentafluge.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752252AbYCRJk3 (ORCPT ); Tue, 18 Mar 2008 05:40:29 -0400 Subject: Re: deadlock on 2.6.24.3-rt3 From: Peter Zijlstra To: Steven Rostedt Cc: Hiroshi Shimamoto , linux-rt-users , Ingo Molnar , Thomas Gleixner , LKML In-Reply-To: References: <47DEB7F0.8040207@ct.jp.nec.com> <47DF097B.4090200@ct.jp.nec.com> Content-Type: text/plain Date: Tue, 18 Mar 2008 10:40:18 +0100 Message-Id: <1205833218.8514.338.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.21.92 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10910 Lines: 239 On Mon, 2008-03-17 at 21:53 -0400, Steven Rostedt wrote: > [Added Peter Zijlsta and LKML] Yeah, patches and such should really go to LKML as LKML is the -rt development list. > Hiroshi, thanks for looking into this! > > Peter, this looks like a legit fix, could you Ack it. > > > On Mon, 17 Mar 2008, Hiroshi Shimamoto wrote: > > > Hiroshi Shimamoto wrote: > > > Hi, > > > > > > I got a soft lockup message on 2.6.24.3-rt3. > > > I attached the .config. > > > > > > I think there is a deadlock scenario, I explain later. > > > > > > Here is the console log; > > > BUG: soft lockup - CPU#2 stuck for 11s! [bash:2175] > > > CPU 2: > > > Modules linked in: > > > Pid: 2175, comm: bash Not tainted 2.6.24.3-rt3 #1 > > > RIP: 0010:[] [] __spin_lock+0x57/0x67 > > > RSP: 0000:ffff8100c52a1d48 EFLAGS: 00000202 > > > RAX: 0000000000000000 RBX: 0000000000004bc5 RCX: 0000000000004bc5 > > > RDX: 0000000000000002 RSI: 00000000006c3208 RDI: 0000000000000001 > > > RBP: 000000000000000d R08: ffff8100cbc28018 R09: ffff810007c95458 > > > R10: 00000000006c3208 R11: 0000000000000246 R12: ffffffff808246e8 > > > R13: 000284d000000002 R14: ffffffff80387277 R15: 00000000ffffffff > > > FS: 00002b28926a2ef0(0000) GS:ffff8100cf8a3940(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > CR2: 00000000006c3208 CR3: 00000000c3cac000 CR4: 00000000000006e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > Call Trace: > > > [] __spin_lock+0x29/0x67 > > > [] swap_info_get+0x65/0xdd > > > [] can_share_swap_page+0x39/0x84 > > > [] do_wp_page+0x2f9/0x519 > > > [] handle_mm_fault+0x615/0x7cf > > > [] proc_flush_task+0x171/0x29c > > > [] recalc_sigpending+0xe/0x3c > > > [] do_page_fault+0x162/0x754 > > > [] audit_syscall_exit+0x31c/0x37a > > > [] error_exit+0x0/0x51 > > > --------------------------- > > > | preempt count: 00010002 ] > > > | 2-level deep critical section nesting: > > > ---------------------------------------- > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > BUG: soft lockup - CPU#3 stuck for 11s! [stress:9460] > > > CPU 3: > > > Modules linked in: > > > Pid: 9460, comm: stress Not tainted 2.6.24.3-rt3 #1 > > > RIP: 0010:[] [] find_get_page+0xad/0xbe > > > RSP: 0018:ffff8100cbf25b88 EFLAGS: 00000202 > > > 0000000000002009 RBX: ffffffff80824bc8 RCX: 0000000000000002 > > > RDX: 0000000000000002 RSI: ffff8100cbfcf298 RDI: ffff810005ad8910 > > > RBP: ffffffff80383a57 R08: ffff810005ad8918 R09: 0000000000000003 > > > R10: ffff810005ad88d8 R11: 0000000000000001 R12: ffffffff80822880 > > > R13: ffff81000799ce48 R14: ffffffff8028921c R15: ffffffff80822880 > > > FS: 00002acaa373bb00(0000) GS:ffff8100cf8a32c0(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > CR2: 00002b9b2827c530 CR3: 000000006cc90000 CR4: 00000000000006e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > Call Trace: > > > [] find_get_page+0x23/0xbe > > > [] free_swap_and_cache+0x46/0xdd > > > [] unmap_vmas+0x626/0x8ce > > > [] exit_mmap+0xac/0x147 > > > [] mmput+0x32/0xae > > > [] do_exit+0x199/0x914 > > > [] __dequeue_signal+0x19/0x1b7 > > > [] do_group_exit+0x2c/0x7e > > > [] get_signal_to_deliver+0x2ef/0x4aa > > > [] do_notify_resume+0xa8/0x7cd > > > [] add_preempt_count+0x14/0x111 > > > [] __up_read+0x13/0x8d > > > [] do_page_fault+0x187/0x754 > > > [] __dequeue_entity+0x2d/0x34 > > > [] __switch_to+0x27/0x2c9 > > > [] do_page_fault+0x1f4/0x754 > > > [] retint_signal+0x3d/0x7f > > > --------------------------- > > > | preempt count: 00010005 ] > > > | 5-level deep critical section nesting: > > > ---------------------------------------- > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > .. [] .... find_get_page+0x14/0xbe > > > .....[<00000000>] .. ( <= 0x0) > > > .. [] .... __spin_lock+0xe/0x67 > > > .....[<00000000>] .. ( <= 0x0) > > > > > > > > > I also got a kernel core. > > > (gdb) info thr > > > 4 process 9460 0xffffffff8027c974 in find_get_page (mapping=, > > > offset=18446744071570598016) at include/asm/processor_64.h:385 > > > 3 process 2175 __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333 > > > 2 process 9132 __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22 > > > * 1 process 9478 __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333 > > > > > > CPU3(thread 4) is in find_get_page(), and the others in __spin_lock(). > > > The thread 4 is waiting to turn PG_nonewrefs bit off in wait_on_page_ref() which is > > > called from page_cache_get_speculative(), and the thread 4 holds the swap_lock. > > > The other threads waiting the swap_lock. > > > On the other hand, the thread 1 turned PG_nonewrefs bit on by calling > > > lock_page_ref_irq() in remove_mapping(), and then waiting the swap_lock. > > > So if the target page of remove_mapping() is in the exiting process memory, > > > the kernel is deadlock. > > > > > > (gdb) bt > > > #0 __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333 > > > #1 0xffffffff80296597 in swap_info_get (entry=) > > > at mm/swapfile.c:253 > > > #2 0xffffffff80296618 in swap_free (entry={val = 1}) at mm/swapfile.c:300 > > > #3 0xffffffff80286acd in remove_mapping (mapping=, > > > page=0xffff810005ad8910) at mm/vmscan.c:423 > > > ... > > > > > > (gdb) thr 2 > > > (gdb) bt > > > #0 __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22 > > > #1 0xffffffff80296374 in valid_swaphandles (entry=, > > > offset=0xffff81001e22bc78) at mm/swapfile.c:1783 > > > #2 0xffffffff8028b0af in swapin_readahead (entry={val = 1}, addr=0, vma=0x1) > > > at mm/memory.c:2054 > > > #3 0xffffffff8029a6af in shmem_getpage (inode=0xffff8100cdf4fd48, idx=0, > > > pagep=0xffff81001e22bd80, sgp=SGP_FAULT, type=0xffff81001e22bd34) at mm/shmem.c:1089 > > > ... > > > > > > (gdb) thr 3 > > > (gdb) bt > > > #0 __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333 > > > #1 0xffffffff80296597 in swap_info_get (entry=) > > > at mm/swapfile.c:253 > > > #2 0xffffffff80296c0c in can_share_swap_page (page=) > > > at mm/swapfile.c:317 > > > #3 0xffffffff8028ae6e in do_wp_page (mm=0xffff8100ce772f40, vma=0xffff8100cd212f00, > > > address=7090696, page_table=0xffff8100cbcef618, pmd=0xffff8100cbc28018, > > > ptl=0xffff810007c95458, orig_pte=) at mm/memory.c:1606 > > > ... > > > > > > (gdb) thr 4 > > > (gdb) bt > > > #0 0xffffffff8027c974 in find_get_page (mapping=, > > > offset=18446744071570598016) at include/asm/processor_64.h:385 > > > #1 0xffffffff80296f83 in free_swap_and_cache (entry={val = 4032}) at mm/swapfile.c:403 > > > #2 0xffffffff8028b9b7 in unmap_vmas (tlbp=0xffff8100cbf25cd8, vma=0xffff8100cde5c678, > > > start_addr=0, end_addr=18446744073709551615, nr_accounted=0xffff8100cbf25cd0, > > > details=0x0) at mm/memory.c:728 > > > #3 0xffffffff8028fa4c in exit_mmap (mm=0xffff8100cd093600) at mm/mmap.c:2048 > > > #4 0xffffffff8023ced7 in mmput (mm=0xffff8100cd093600) at kernel/fork.c:443 > > > #5 0xffffffff80242f00 in do_exit (code=14) at kernel/exit.c:997 > > > ... > > > > > > > > > I think it came from the lockless speculative get page patch. > > > I found the newer version of this patch in linux-mm. > > > http://marc.info/?l=linux-mm&m=119477111927364&w=2 > > > > > > I haven't tested it because it looks big change and hard to apply. > > > But it seems to fix this deadlock issue. > > > Any other patch to fix this issue is welcome. Yeah, in the latest lockless pagecache patches Nick got rid of PG_nonewrefs as suggested by Hugh, however -rt also has my concurrent pagecache patches and those need PG_nonewrefs on their own, so I hadn't bothered to update to Nick's latest. Perhaps I ought to, as you point out, page_cache_get_speculative() is cleaner in his latest.. /me puts it on his overlong TODO list. > > Is this patch good? Yes, it does look good. Doesn't remove_exclusice_swap_page() also nest PG_nonewrefs inside of swap_lock? Looks to me that also needs fixing.. > > --- > > From: Hiroshi Shimamoto > > Subject: [PATCH] avoid deadlock related with PG_nonewrefs and swap_lock > > > > There is a deadlock scenario; remove_mapping() vs free_swap_and_cache(). > > remove_mapping() turns PG_nonewrefs bit on, then locks swap_lock. > > free_swap_and_cache() locks swap_lock, then wait to turn PG_nonewrefs bit > > off in find_get_page(). > > > > swap_lock can be unlocked before calling find_get_page(). > > > > Signed-off-by: Hiroshi Shimamoto > > --- > > mm/swapfile.c | 5 +++-- > > 1 files changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 5036b70..581afee 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -400,13 +400,14 @@ void free_swap_and_cache(swp_entry_t entry) > > p = swap_info_get(entry); > > if (p) { > > if (swap_entry_free(p, swp_offset(entry)) == 1) { > > + spin_unlock(&swap_lock); > > page = find_get_page(&swapper_space, entry.val); > > if (page && unlikely(TestSetPageLocked(page))) { > > page_cache_release(page); > > page = NULL; > > } > > - } > > - spin_unlock(&swap_lock); > > + } else > > + spin_unlock(&swap_lock); > > } > > if (page) { > > int one_user; > > -- > > 1.5.4.1 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/