Return-Path: Received: from fieldses.org ([173.255.197.46]:59793 "EHLO fieldses.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754350AbbCEUru (ORCPT ); Thu, 5 Mar 2015 15:47:50 -0500 Date: Thu, 5 Mar 2015 15:47:49 -0500 From: "J. Bruce Fields" To: Christoph Hellwig Cc: Dave Chinner , Eric Sandeen , linux-nfs@vger.kernel.org, xfs@oss.sgi.com Subject: Re: panic on 4.20 server exporting xfs filesystem Message-ID: <20150305204749.GA17934@fieldses.org> References: <20150304155421.GE1627@fieldses.org> <20150304220900.GX18360@dastard> <20150304222709.GI1627@fieldses.org> <20150304224557.GY4251@dastard> <54F78BE5.1020608@sandeen.net> <20150304225623.GZ4251@dastard> <20150305040849.GJ1627@fieldses.org> <20150305131731.GA16235@lst.de> <20150305150138.GA15674@fieldses.org> <20150305170217.GC15674@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20150305170217.GC15674@fieldses.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Thu, Mar 05, 2015 at 12:02:17PM -0500, J. Bruce Fields wrote: > On Thu, Mar 05, 2015 at 10:01:38AM -0500, J. Bruce Fields wrote: > > On Thu, Mar 05, 2015 at 02:17:31PM +0100, Christoph Hellwig wrote: > > > On Wed, Mar 04, 2015 at 11:08:49PM -0500, J. Bruce Fields wrote: > > > > Ah-hah: > > > > > > > > static void > > > > nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls) > > > > { > > > > ... > > > > nfsd4_cb_layout_fail(ls); > > > > > > > > That'd do it! > > > > > > > > Haven't tried to figure out why exactly that's getting called, and why > > > > only rarely. Some intermittent problem with the callback path, I guess. > > > > > > > > Anyway, I think that solves most of the mystery.... > > > > > > Ooops, that was a nasty git merge error in the last rebase, see the fix > > > below. > > > > Thanks! > > And with that fix things look good. > > I'm still curious why the callbacks are failling. It's also logging > "nfsd: client 192.168.122.32 failed to respond to layout recall". I spoke too soon, I'm still not getting through my usual test run--the most recent run is hanging in generic/247 with the following in the server logs. But I probably still won't get a chance to look at this any closer till after vault. --b. nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff88005639a000 xid c21abd62 kswapd0: page allocation failure: order:0, mode:0x120 CPU: 0 PID: 580 Comm: kswapd0 Not tainted 4.0.0-rc2-09922-g26cbcc7 #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 0000000000000000 ffff88007f803bc8 ffffffff81a92597 0000000080000101 0000000000000120 ffff88007f803c58 ffffffff81155322 ffffffff822e9448 0000000000000000 0000000000000000 0000000000000001 00000000ffffffff Call Trace: [] dump_stack+0x4f/0x7b [] warn_alloc_failed+0xe2/0x130 [] __alloc_pages_nodemask+0x5f3/0x860 [] skb_page_frag_refill+0xab/0xd0 [] try_fill_recv+0x38a/0x7d0 [] virtnet_receive+0x6f2/0x860 [] virtnet_poll+0x21/0x90 [] net_rx_action+0x1aa/0x320 [] __do_softirq+0xf2/0x370 [] irq_exit+0x65/0x70 [] do_IRQ+0x64/0x100 [] ? free_buffer_head+0x22/0x80 [] common_interrupt+0x6f/0x6f [] ? kernel_text_address+0x64/0x70 [] ? kmem_cache_free+0xb7/0x230 [] free_buffer_head+0x22/0x80 [] try_to_free_buffers+0x79/0xc0 [] xfs_vm_releasepage+0x6b/0x160 [] ? __page_check_address+0x1f0/0x1f0 [] try_to_release_page+0x35/0x60 [] shrink_page_list+0x9c2/0xba0 [] ? trace_hardirqs_on_caller+0x12d/0x1d0 [] shrink_inactive_list+0x23f/0x520 [] shrink_lruvec+0x595/0x6d0 [] ? shrink_slab.part.58.constprop.67+0x299/0x410 [] shrink_zone+0x4d/0xa0 [] kswapd+0x471/0xa30 [] ? try_to_free_pages+0x460/0x460 [] kthread+0xef/0x110 [] ? kthread_create_on_node+0x220/0x220 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x220/0x220 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 64 active_anon:7053 inactive_anon:2435 isolated_anon:0 active_file:88743 inactive_file:89505 isolated_file:32 unevictable:0 dirty:9786 writeback:0 unstable:0 free:3571 slab_reclaimable:227807 slab_unreclaimable:75772 mapped:21010 shmem:380 pagetables:1567 bounce:0 free_cma:0 DMA free:7788kB min:44kB low:52kB high:64kB active_anon:204kB inactive_anon:260kB active_file:696kB inactive_file:1016kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:32kB writeback:0kB mapped:924kB shmem:36kB slab_reclaimable:1832kB slab_unreclaimable:1632kB kernel_stack:48kB pagetables:80kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 1943 1943 1943 DMA32 free:6188kB min:5616kB low:7020kB high:8424kB active_anon:28008kB inactive_anon:9480kB active_file:354360kB inactive_file:357004kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1990060kB mlocked:0kB dirty:39112kB writeback:0kB mapped:83116kB shmem:1484kB slab_reclaimable:909396kB slab_unreclaimable:301708kB kernel_stack:2608kB pagetables:6188kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 21*4kB (UEM) 11*8kB (UM) 10*16kB (UEM) 3*32kB (UM) 1*64kB (E) 1*128kB (E) 2*256kB (UM) 1*512kB (M) 2*1024kB (UM) 2*2048kB (ER) 0*4096kB = 7788kB DMA32: 276*4kB (UM) 30*8kB (UM) 83*16kB (UM) 2*32kB (MR) 2*64kB (M) 2*128kB (R) 2*256kB (R) 1*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 6192kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 178684 total pagecache pages 1 pages in swap cache Swap cache stats: add 14, delete 13, find 0/0 Free swap = 839620kB Total swap = 839676kB 524158 pages RAM 0 pages HighMem/MovableOnly 22666 pages reserved nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff880051dfc000 xid 8ff02aaf INFO: task nfsd:17653 blocked for more than 120 seconds. Not tainted 4.0.0-rc2-09922-g26cbcc7 #89 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D ffff8800753a7848 11720 17653 2 0x00000000 ffff8800753a7848 0000000000000001 0000000000000001 ffffffff82210580 ffff88004e9bcb50 0000000000000006 ffffffff8119d9f0 ffff8800753a7848 ffff8800753a7fd8 ffff88002e5e3d70 0000000000000246 ffff88004e9bcb50 Call Trace: [] ? new_sync_read+0xb0/0xb0 [] ? new_sync_read+0xb0/0xb0 [] schedule+0x37/0x90 [] schedule_preempt_disabled+0x18/0x30 [] mutex_lock_nested+0x156/0x400 [] ? xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 [] ? new_sync_read+0xb0/0xb0 [] xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 [] ? new_sync_read+0xb0/0xb0 [] xfs_file_write_iter+0x86/0x130 [] do_iter_readv_writev+0x65/0xa0 [] do_readv_writev+0xe2/0x280 [] ? xfs_file_buffered_aio_write.isra.9+0x2a0/0x2a0 [] ? xfs_file_buffered_aio_write.isra.9+0x2a0/0x2a0 [] ? __lock_acquire+0x37d/0x2040 [] ? lock_release_non_nested+0xa9/0x340 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [] ? trace_hardirqs_on_caller+0x12d/0x1d0 [] ? trace_hardirqs_on+0xd/0x10 [] vfs_writev+0x39/0x50 [] nfsd_vfs_write.isra.11+0xa1/0x350 [nfsd] [] nfsd_write+0x8e/0x100 [nfsd] [] ? gen_boot_verifier+0x5/0xc0 [nfsd] [] nfsd4_write+0x185/0x1e0 [nfsd] [] nfsd4_proc_compound+0x3c7/0x6f0 [nfsd] [] nfsd_dispatch+0xc3/0x220 [nfsd] [] svc_process_common+0x43f/0x650 [sunrpc] [] svc_process+0x143/0x260 [sunrpc] [] nfsd+0x167/0x1e0 [nfsd] [] ? nfsd+0x5/0x1e0 [nfsd] [] ? nfsd_destroy+0xe0/0xe0 [nfsd] [] kthread+0xef/0x110 [] ? kthread_create_on_node+0x220/0x220 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x220/0x220 2 locks held by nfsd/17653: #0: (sb_writers#11){.+.+.+}, at: [] do_readv_writev+0x238/0x280 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [] xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 INFO: task nfsd:17655 blocked for more than 120 seconds. ...