Return-Path:
Received: from fieldses.org ([174.143.236.118]:49136 "EHLO fieldses.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750821Ab0HYPpC (ORCPT ); Wed, 25 Aug 2010 11:45:02 -0400
Date: Wed, 25 Aug 2010 11:44:51 -0400
From: "J. Bruce Fields"
To: Artem Bityutskiy
Cc: "linux-nfs@vger.kernel.org", Trond Myklebust, Christoph Hellwig, Jens Axboe
Subject: Re: hang in writeback code on nfsv4 mount
Message-ID: <20100825154451.GB14440@fieldses.org>
References: <20100825023425.GA24591@fieldses.org> <1282717945.24044.187.camel@localhost>
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1282717945.24044.187.camel@localhost>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On Wed, Aug 25, 2010 at 09:32:25AM +0300, Artem Bityutskiy wrote:
> On Wed, 2010-08-25 at 04:34 +0200, ext J. Bruce Fields wrote:
> > As of 253c34e9b10c30d3064be654b5b78fbc1a8b1896 "writeback: prevent
> > unnecessary bdi threads wakeups", any nfs mount hangs for me. Is this a
> > known issue?
> >
> > --b.
> >
> > INFO: task mount.nfs4:3812 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > mount.nfs4 D 0000000000000000 2880 3812 3811 0x00000000
> >  ffff88001ed25a28 0000000000000046 ffff88001ed25fd8 ffff88001ed25fd8
> >  ffff88001ed24000 ffff88001ed24000 ffff88001ed24000 ffff88001f9503a0
> >  ffff88001ed25fd8 ffff88001f9503a8 ffff88001ed24000 ffff88001ed25fd8
> > Call Trace:
> >  [] schedule_timeout+0x1cd/0x2e0
> >  [] ? mark_held_locks+0x6c/0xa0
> >  [] ? _raw_spin_unlock_irq+0x30/0x60
> >  [] ? trace_hardirqs_on_caller+0x14d/0x190
> >  [] ? sub_preempt_count+0xe/0xd0
> >  [] wait_for_common+0x120/0x190
> >  [] ? default_wake_function+0x0/0x20
> >  [] wait_for_completion+0x1d/0x20
> >  [] kthread_stop+0x4a/0x150
> >  [] ? thaw_process+0x70/0x80
> >  [] bdi_unregister+0x10a/0x1a0
> >  [] nfs_put_super+0x19/0x20
> >  [] generic_shutdown_super+0x54/0xe0
> >  [] kill_anon_super+0x16/0x60
> >  [] nfs4_kill_super+0x39/0x90
> >  [] deactivate_locked_super+0x45/0x60
> >  [] deactivate_super+0x49/0x70
> >  [] mntput_no_expire+0x84/0xe0
> >  [] release_mounts+0x9f/0xc0
> >  [] put_mnt_ns+0x65/0x80
> >  [] nfs_follow_remote_path+0x1e6/0x420
> >  [] nfs4_try_mount+0x6f/0xd0
> >  [] nfs4_get_sb+0xa2/0x360
> >  [] vfs_kern_mount+0x88/0x1f0
> >  [] do_kern_mount+0x52/0x130
> >  [] ? _lock_kernel+0x6a/0x170
> >  [] do_mount+0x26e/0x7f0
> >  [] ? copy_mount_options+0xea/0x190
> >  [] sys_mount+0x98/0xf0
> >  [] system_call_fastpath+0x16/0x1b
> > 1 lock held by mount.nfs4/3812:
> >  #0: (&type->s_umount_key#24){+.+...}, at: [] deactivate_super+0x41/0x70
>
> I've tried both v2.6.36-rc2 and commit
> 253c34e9b10c30d3064be654b5b78fbc1a8b1896 [1] - and I can mount an NFS
> share with no problems:
>
> sudo mount -t nfs4 sauron:/home/dedekind/ /mnt/sauron_home/
>
> works fine. Any hints about how to reproduce this are welcome.

Huh. The simple mount hits it every time for me. I'll investigate some
more.

> I'll try to look at the code and figure out why this could happen.
>
> So, does the mount at some point succeed? Or it is blocked forever? And
> sysrq-t output would be useful to look at as well.

It's blocked forever as far as I can tell. I'll get a sysrq-t trace.

> Also, it is strange that 'sys_mount()' involves 'nfs4_kill_super()' - is
> this normal or this is an error path?

NFSv4 uses a temporary private namespace to look up the initial mount
path--see c02d7adf8c5429727a98bad1d039bccad4c61c50 and preceding commits
for explanation. So this may well be normal (but I haven't looked at it
closely).

Hm, my mount path has a mountpoint in it--if sauron:/home/dedekind/
doesn't, then that's a difference between our setups.

> [1]: the kernel tree does not compile on this commit, and I applied
> patch on top to solve the compilation issue:
> 387ac089361fbe5ef287e6950c5c40f6b18e5c55 "block: fix missing export of
> blk_types.h"

Maybe you only hit that if you do headers_install or headers_check?

--b.