Subject: Re: blocked on the spin lock
From: Weston Andros Adamson
Date: Mon, 16 May 2016 15:15:49 -0400
To: dwysocha@redhat.com
Cc: Tigran Mkrtchyan, linux-nfs list, Steve Dickson, William Andros Adamson
In-Reply-To: <1463425370.5011.26.camel@redhat.com>
References: <148899538.11512158.1462174191227.JavaMail.zimbra@desy.de> <1463425370.5011.26.camel@redhat.com>

> On May 16, 2016, at 3:02 PM, Dave Wysochanski wrote:
>
> On Mon, 2016-05-02 at 09:29 +0200, Mkrtchyan, Tigran wrote:
>>
>> Hi Dros et al.,
>>
>> We have seen the following stack trace on one of our systems:
>>
>> Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! [tee:13755]
>> Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joydev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf]
>> Apr 28 22:09:51 bird510 kernel: CPU 7
>> Apr 28 22:09:51 bird510 kernel:
>> Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>> Apr 28 22:09:51 bird510 kernel: RIP: 0010:[] [] _spin_lock+0x21/0x30
>> Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297
>> Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: 0000000000000000
>> Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: ffff880159aeb8d0
>> Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: 0000000000000002
>> Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: 0000000000000001
>> Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: 0000000000000000
>> Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
>> Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: 00000000000007e0
>> Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo ffff880352814000, task ffff880636064ab0)
>> Apr 28 22:09:51 bird510 kernel: Stack:
>> Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 ffff880159aeb8d0 ffff880159aeb770
>> Apr 28 22:09:51 bird510 kernel: 0000000000000000 0000000000000000 ffffffffa04bd040 ffffffffa048f39a
>> Apr 28 22:09:51 bird510 kernel: ffff8803378761c0 ffffea000a13dcc0 ffff880352817a48 ffffffffa0490ec8
>> Apr 28 22:09:51 bird510 kernel: Call Trace:
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_clear_request_commit+0xa1/0xf0 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_page_find_request_locked+0x2a/0x40 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_invalidate_page+0x47/0x80 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? do_invalidatepage+0x25/0x30
>> Apr 28 22:09:51 bird510 kernel: [] ? truncate_inode_page+0xa2/0xc0
>> Apr 28 22:09:51 bird510 kernel: [] ? truncate_inode_pages_range+0x16f/0x500
>> Apr 28 22:09:51 bird510 kernel: [] ? mempool_free_slab+0x17/0x20
>> Apr 28 22:09:51 bird510 kernel: [] ? truncate_inode_pages+0x15/0x20
>> Apr 28 22:09:51 bird510 kernel: [] ? truncate_pagecache+0x4f/0x70
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_setattr_update_inode+0xb9/0x150 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs3_proc_setattr+0xdb/0x120 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_wb_all+0x43/0x50 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_setattr+0xf0/0x170 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? notify_change+0x168/0x340
>> Apr 28 22:09:51 bird510 kernel: [] ? nfs_open+0x78/0x90 [nfs]
>> Apr 28 22:09:51 bird510 kernel: [] ? do_truncate+0x64/0xa0
>> Apr 28 22:09:51 bird510 kernel: [] ? security_inode_permission+0x1f/0x30
>> Apr 28 22:09:51 bird510 kernel: [] ? do_filp_open+0x861/0xd20
>> Apr 28 22:09:51 bird510 kernel: [] ? strncpy_from_user+0x4a/0x90
>> Apr 28 22:09:51 bird510 kernel: [] ? alloc_fd+0x92/0x160
>> Apr 28 22:09:51 bird510 kernel: [] ? do_sys_open+0x67/0x130
>> Apr 28 22:09:51 bird510 kernel: [] ? sys_open+0x20/0x30
>> Apr 28 22:09:51 bird510 kernel: [] ? system_call_fastpath+0x16/0x1b
>> Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
>>
>> The questionable spin lock was removed by commit 411a99adf. Was there a
>> problem? Do we need a backport in the RHEL kernel?
>>
>
> This is a known bug, 1322823. It should be fixed in RHEL6.7
> kernel-2.6.32-573.27.1.el6 or above, as well as RHEL6.8
> kernel-2.6.32-642.el6 or higher. It only affects kernels from
> 2.6.32-573.10.2.el6 until kernel-2.6.32-573.27.1.el6. The bug is
> private, but see the description below.

Thanks for the detailed explanation!

-dros

> This bug was an unfortunate side-effect of the fix for another nasty bug,
> Bug 1135601 - System panics with "kernel BUG at fs/nfs/inode.c:113!".
> Bug 1135601 was fixed by a RHEL6 equivalent of the following upstream
> patch: 3e21704 nfs: handle multiple reqs in nfs_wb_page_cancel
>
> Saving &inode->i_lock in an NFS-specific structure seems problematic,
> which may be why it was missed in the backport and partially removed in
> commit 411a99adf. I posted a related cleanup patch in this area:
> https://www.spinics.net/lists/linux-nfs/msg57194.html
>
>
> Bug 1322823 - RHEL6.7.z: nfs client kernel deadlock due to inode->i_lock
> == nfs_commit_info.lock in code path nfs_wb_page_cancel ->
> nfs_clear_request_commit due to patch to fix bug 1135601
>
> Dave Wysochanski 2016-03-31 07:50:18 EDT
>
> Description of problem:
> Repeated kernel deadlock hit multiple times by a customer, caused by the
> current fix for bug 1135601. Basically the NFS client / RPC layer
> deadlocks because a spinlock is taken twice. We have vmcores.
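To make the failure mode concrete: the spinlocks in this kernel are not recursive, so a thread that tries to take a lock it already holds spins forever, which is exactly what the soft-lockup watchdog reports above. A minimal userspace analogue of the double acquire (illustration only; the names are hypothetical and this is not kernel code):

~~~
/* aa_deadlock.c -- userspace illustration of the AA deadlock above.
 * Build: gcc -pthread aa_deadlock.c -o aa_deadlock
 * A non-recursive spinlock re-acquired by its own holder spins forever.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t lock;

/* Stands in for nfs_clear_request_commit(): it takes the lock through
 * an aliased pointer, unaware that the caller already holds it. */
static void helper_takes_lock(pthread_spinlock_t *aliased)
{
	pthread_spin_lock(aliased);	/* second acquire: spins forever */
	pthread_spin_unlock(aliased);
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	pthread_spin_lock(&lock);	/* first acquire, like inode->i_lock */
	puts("holding the lock; calling a helper that takes it again...");
	helper_takes_lock(&lock);	/* never returns */
	pthread_spin_unlock(&lock);
	return 0;
}
~~~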
> I think the fix for 1135601 missed the following "finer point" of the
> NFS client: inode->i_lock == nfs_commit_info.lock. If that is true it
> may be OK, but it is not obvious, and it is unclear why the code does
> not simply use inode->i_lock instead of copying its address into
> nfs_commit_info.lock. Indeed, looking at the code we have:
> ~~~
> static void nfs_init_cinfo_from_inode(struct nfs_commit_info *cinfo,
>                                       struct inode *inode)
> {
>         cinfo->lock = &inode->i_lock;
>         cinfo->mds = &NFS_I(inode)->commit_info;
>         cinfo->ds = pnfs_get_ds_info(inode);
>         cinfo->dreq = NULL;
>         cinfo->completion_ops = &nfs_commit_completion_ops;
> }
> ~~~
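The cleanup patch linked above moves away from caching a raw lock pointer. A sketch of that direction, assuming the idea is to store the inode itself so every locking site names whose i_lock it takes (field and function names here are illustrative, not the actual patch):

~~~
/* Sketch of the safer layout: cache the inode rather than a pointer to
 * its lock, so the aliasing is explicit at every call site.  Names are
 * illustrative; this is not the actual cleanup patch. */
#include <linux/fs.h>
#include <linux/spinlock.h>

struct commit_info_sketch {
	struct inode *inode;		/* instead of: spinlock_t *lock */
	/* ... mds, ds, dreq, completion_ops as before ... */
};

static void init_cinfo_sketch(struct commit_info_sketch *cinfo,
			      struct inode *inode)
{
	cinfo->inode = inode;
}

static void remove_commit_sketch(struct commit_info_sketch *cinfo)
{
	spin_lock(&cinfo->inode->i_lock);	/* visibly inode->i_lock */
	/* ... take the request off the commit list ... */
	spin_unlock(&cinfo->inode->i_lock);
}
~~~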
> == Technical explanation of deadlock seen in vmcore ==
>
> OK, with the understanding that inode->i_lock == nfs_commit_info.lock,
> take a look at pid 11305:
> ~~~
> crash> bt
> PID: 11305  TASK: ffff881c7715aab0  CPU: 8  COMMAND: "sas"
>  #0 [ffff881078806e90] crash_nmi_callback at ffffffff81033cf6
>  #1 [ffff881078806ea0] notifier_call_chain at ffffffff8153f545
>  #2 [ffff881078806ee0] atomic_notifier_call_chain at ffffffff8153f5aa
>  #3 [ffff881078806ef0] notify_die at ffffffff810a783e
>  #4 [ffff881078806f20] do_nmi at ffffffff8153d203
>  #5 [ffff881078806f50] nmi at ffffffff8153cac0
>     [exception RIP: _spin_lock+0x1c]
>     RIP: ffffffff8153c32c  RSP: ffff88104032fb58  RFLAGS: 00000297
>     RAX: 00000000000009c1  RBX: ffff880a82b84bc0  RCX: 0000000000000000
>     RDX: 00000000000009c0  RSI: ffff8809ff681820  RDI: ffff8809ff6818d0
>     RBP: ffff88104032fb58   R8: ffff88104032fc80   R9: 0000000000000002
>     R10: ffffea0022eec790  R11: 0000000000000003  R12: ffff88104032fb68
>     R13: 0000000000000000  R14: ffff8809ff6818d0  R15: 0000000000000020
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- ---
>  #6 [ffff88104032fb58] _spin_lock at ffffffff8153c32c
>  #7 [ffff88104032fb60] nfs_clear_request_commit at ffffffffa03e9dc1 [nfs]
>  #8 [ffff88104032fbb0] nfs_wb_page_cancel at ffffffffa03e9ec8 [nfs]
>  #9 [ffff88104032fbe0] nfs_invalidate_page at ffffffffa03d74f7 [nfs]
> #10 [ffff88104032fc00] do_invalidatepage at ffffffff8113edb5
> #11 [ffff88104032fc10] truncate_inode_page at ffffffff8113f0d2
> #12 [ffff88104032fc30] truncate_inode_pages_range at ffffffff8113f47f
> #13 [ffff88104032fd20] truncate_inode_pages at ffffffff8113f8a5
> #14 [ffff88104032fd30] truncate_pagecache at ffffffff8113f8ff
> #15 [ffff88104032fd60] nfs_setattr_update_inode at ffffffffa03d9eb9 [nfs]
> #16 [ffff88104032fd90] nfs3_proc_setattr at ffffffffa03eca6b [nfs]
> #17 [ffff88104032fe20] nfs_setattr at ffffffffa03db610 [nfs]
> #18 [ffff88104032fe60] notify_change at ffffffff811b1338
> #19 [ffff88104032fed0] do_truncate at ffffffff8118fee4
> #20 [ffff88104032ff40] sys_ftruncate at ffffffff81190290
> #21 [ffff88104032ff80] system_call_fastpath at ffffffff8100b0d2
>     RIP: 00002b53f548a4d7  RSP: 00002b543fb36b50  RFLAGS: 00010246
>     RAX: 000000000000004d  RBX: ffffffff8100b0d2  RCX: 00002b5426a29c48
>     RDX: 00002b543f11a880  RSI: 0000000000020000  RDI: 0000000000000028
>     RBP: 00002b543f11a880   R8: 0000000000010000   R9: 00002b544e8a63b0
>     R10: 000000000000002a  R11: 0000000000000206  R12: 00002b543f11ad88
>     R13: 00002b53f9ae7fe0  R14: 00002b53fa43af20  R15: 00002b543f11a880
>     ORIG_RAX: 000000000000004d  CS: 0033  SS: 002b
> crash> dis -lr ffffffffa03e9ec8 | tail
> 0xffffffffa03e9eb6 : pop    %r13
> 0xffffffffa03e9eb8 : pop    %r14
> 0xffffffffa03e9eba : leaveq
> 0xffffffffa03e9ebb : retq
> 0xffffffffa03e9ebc : nopl   0x0(%rax)
> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/fs/nfs/write.c: 1787
> 0xffffffffa03e9ec0 : mov    %rbx,%rdi
> 0xffffffffa03e9ec3 : callq  0xffffffffa03e9d20
> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/arch/x86/include/asm/spinlock.h: 174
> 0xffffffffa03e9ec8 : mov    %r14,%rax
> crash> dis -lr ffffffffa03e9dc1 | tail
> 0xffffffffa03e9dab : mov    -0x10(%rbp),%rbx
> 0xffffffffa03e9daf : mov    -0x8(%rbp),%r12
> 0xffffffffa03e9db3 : leaveq
> 0xffffffffa03e9db4 : retq
> 0xffffffffa03e9db5 : nopl   (%rax)
> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/fs/nfs/write.c: 558
> 0xffffffffa03e9db8 : mov    -0x40(%rbp),%rdi
> 0xffffffffa03e9dbc : callq  0xffffffff8153c310 <_spin_lock>
> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/arch/x86/include/asm/bitops.h: 252
> 0xffffffffa03e9dc1 : lock btrl $0x2,0x40(%rbx)
> ~~~
>
> Now look at the code, specifically nfs_wb_page_cancel and
> nfs_clear_request_commit. On line 1780 we take inode->i_lock:
> ~~~
> fs/nfs/write.c
> 1771
> 1772 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
> 1773 {
> 1774         struct nfs_page *req;
> 1775         int ret = 0;
> 1776
> 1777         BUG_ON(!PageLocked(page));
> 1778         for (;;) {
> 1779                 wait_on_page_writeback(page);
> 1780                 spin_lock(&inode->i_lock);        <----- takes the spinlock the first time
> 1781                 req = nfs_page_find_request_locked(page);
> 1782                 if (req == NULL) {
> 1783                         spin_unlock(&inode->i_lock);
> 1784                         break;
> 1785                 }
> 1786                 if (nfs_lock_request(req)) {
> 1787                         nfs_clear_request_commit(req);        <----- executing inside here
> 1788                         spin_unlock(&inode->i_lock);
>
>  549 static void
>  550 nfs_clear_request_commit(struct nfs_page *req)
>  551 {
>  552         if (test_bit(PG_CLEAN, &req->wb_flags)) {
>  553                 struct inode *inode = req->wb_context->dentry->d_inode;
>  554                 struct nfs_commit_info cinfo;
>  555
>  556                 nfs_init_cinfo_from_inode(&cinfo, inode);
>  557                 if (!pnfs_clear_request_commit(req, &cinfo)) {
>  558 -->                     spin_lock(cinfo.lock);        <----- tries to take the spinlock a second time
>  559                         nfs_request_remove_commit_list(req, &cinfo);
>  560                         spin_unlock(cinfo.lock);
>  561                 }
>  562                 nfs_clear_page_commit(req->wb_page);
>  563         }
>  564 }
> ~~~
>
> Version-Release number of selected component (if applicable):
> 2.6.32-573.18.1.el6
>
> How reproducible:
> Multiple times by one customer so far.
>
> Steps to Reproduce:
> TBD
>
> Actual results:
> NFS client system deadlocks.
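For context on the fix Tigran mentions: per the thread, commit 411a99adf removed the questionable inner lock acquisition, so the commit-list manipulation happens under the i_lock the caller already holds. A common way to encode that contract is to make the callee assert, rather than take, the lock. A sketch under that assumption (the function is illustrative, not the actual commit; lockdep_assert_held() is the real kernel helper):

~~~
/* Sketch of the repaired contract: the callee no longer re-acquires
 * the lock; it requires the caller to hold inode->i_lock and asserts
 * it with lockdep.  Illustrative only -- not the actual commit. */
#include <linux/fs.h>
#include <linux/lockdep.h>

static void clear_request_commit_locked(struct inode *inode /* , req */)
{
	lockdep_assert_held(&inode->i_lock);	/* caller took it on line 1780 */
	/* ... remove the request from the commit list; no spin_lock() here,
	 * so the second-acquire deadlock cannot occur ... */
}
~~~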