Return-Path:
Received: from mgw-sa02.nokia.com ([147.243.1.48]:18969 "EHLO mgw-sa02.nokia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753982Ab0H0GO5 (ORCPT ); Fri, 27 Aug 2010 02:14:57 -0400
Subject: Re: hang in writeback code on nfsv4 mount
From: Artem Bityutskiy
Reply-To: Artem.Bityutskiy@nokia.com
To: "J. Bruce Fields", Jens Axboe
Cc: "linux-nfs@vger.kernel.org", Trond Myklebust, Christoph Hellwig
In-Reply-To: <20100825023425.GA24591@fieldses.org>
References: <20100825023425.GA24591@fieldses.org>
Content-Type: text/plain; charset="UTF-8"
Date: Fri, 27 Aug 2010 09:13:15 +0300
Message-ID: <1282889595.2763.14.camel@localhost>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On Wed, 2010-08-25 at 04:34 +0200, ext J. Bruce Fields wrote:
> As of 253c34e9b10c30d3064be654b5b78fbc1a8b1896 "writeback: prevent
> unnecessary bdi threads wakeups", any nfs mount hangs for me. Is this a
> known issue?
>
> --b.
>
> INFO: task mount.nfs4:3812 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> mount.nfs4 D 0000000000000000 2880 3812 3811 0x00000000
>  ffff88001ed25a28 0000000000000046 ffff88001ed25fd8 ffff88001ed25fd8
>  ffff88001ed24000 ffff88001ed24000 ffff88001ed24000 ffff88001f9503a0
>  ffff88001ed25fd8 ffff88001f9503a8 ffff88001ed24000 ffff88001ed25fd8
> Call Trace:
>  [] schedule_timeout+0x1cd/0x2e0
>  [] ? mark_held_locks+0x6c/0xa0
>  [] ? _raw_spin_unlock_irq+0x30/0x60
>  [] ? trace_hardirqs_on_caller+0x14d/0x190
>  [] ? sub_preempt_count+0xe/0xd0
>  [] wait_for_common+0x120/0x190
>  [] ? default_wake_function+0x0/0x20
>  [] wait_for_completion+0x1d/0x20
>  [] kthread_stop+0x4a/0x150
>  [] ? thaw_process+0x70/0x80
>  [] bdi_unregister+0x10a/0x1a0
>  [] nfs_put_super+0x19/0x20
>  [] generic_shutdown_super+0x54/0xe0
>  [] kill_anon_super+0x16/0x60
>  [] nfs4_kill_super+0x39/0x90
>  [] deactivate_locked_super+0x45/0x60
>  [] deactivate_super+0x49/0x70
>  [] mntput_no_expire+0x84/0xe0
>  [] release_mounts+0x9f/0xc0
>  [] put_mnt_ns+0x65/0x80
>  [] nfs_follow_remote_path+0x1e6/0x420
>  [] nfs4_try_mount+0x6f/0xd0
>  [] nfs4_get_sb+0xa2/0x360
>  [] vfs_kern_mount+0x88/0x1f0
>  [] do_kern_mount+0x52/0x130
>  [] ? _lock_kernel+0x6a/0x170
>  [] do_mount+0x26e/0x7f0
>  [] ? copy_mount_options+0xea/0x190
>  [] sys_mount+0x98/0xf0
>  [] system_call_fastpath+0x16/0x1b
> 1 lock held by mount.nfs4/3812:
>  #0: (&type->s_umount_key#24){+.+...}, at: [] deactivate_super+0x41/0x70

Bruce,

I can reproduce this by doing mount/fsstress/unmount, but it takes a few
hours for me. I added a few printks and tried to reproduce this issue,
but hit another issue. I am not sure whether the root cause of your
issue is the same, but here is the bugfix anyway.

Too bad I have to fly and cannot continue investigating. But I'll start
this again as soon as I can.

Jens, please, consider taking this to 2.6.36.

From aa87b87c8555151bb8e083d0dba3fd68c5b0d811 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy
Date: Fri, 27 Aug 2010 08:21:18 +0300
Subject: [PATCH] writeback: do not lose wakeup events when forking bdi threads

This patch fixes the following issue:

INFO: task mount.nfs4:1120 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.nfs4 D 00000000fffc6a21 0 1120 1119 0x00000000
 ffff880235643948 0000000000000046 ffffffff00000000 ffffffff00000000
 ffff880235643fd8 ffff880235314760 00000000001d44c0 ffff880235643fd8
 00000000001d44c0 00000000001d44c0 00000000001d44c0 00000000001d44c0
Call Trace:
 [] schedule_timeout+0x34/0xf1
 [] ? wait_for_common+0x3f/0x130
 [] ? trace_hardirqs_on+0xd/0xf
 [] wait_for_common+0xd2/0x130
 [] ? default_wake_function+0x0/0xf
 [] ? _raw_spin_unlock+0x26/0x2a
 [] wait_for_completion+0x18/0x1a
 [] sync_inodes_sb+0xca/0x1bc
 [] __sync_filesystem+0x47/0x7e
 [] sync_filesystem+0x47/0x4b
 [] generic_shutdown_super+0x22/0xd2
 [] kill_anon_super+0x11/0x4f
 [] nfs4_kill_super+0x3f/0x72 [nfs]
 [] deactivate_locked_super+0x21/0x41
 [] deactivate_super+0x40/0x45
 [] mntput_no_expire+0xb8/0xed
 [] release_mounts+0x9a/0xb0
 [] put_mnt_ns+0x6a/0x7b
 [] nfs_follow_remote_path+0x19a/0x296 [nfs]
 [] nfs4_try_mount+0x75/0xaf [nfs]
 [] nfs4_get_sb+0x276/0x2ff [nfs]
 [] vfs_kern_mount+0xb8/0x196
 [] do_kern_mount+0x48/0xe8
 [] do_mount+0x771/0x7e8
 [] sys_mount+0x83/0xbd
 [] system_call_fastpath+0x16/0x1b

The reason for this hang was a race condition: when the forker thread is
forking a bdi thread, we use 'kthread_run()', so we run it _before_ we
make it visible in 'bdi->wb.task'. The bdi thread runs, does all its
work, and goes to sleep. 'bdi->wb.task' is still NULL. And this is a
dangerous time window.

If at this time someone queues work for this bdi, they do not see the
bdi thread and wake up the forker thread instead! But the forker has
already forked this bdi thread and just has not made it visible yet!

The result is that we lose the wake-up event for this bdi thread and the
NFS4 code waits forever.

To fix the problem, we should use 'kthread_create()' for creating bdi
threads, then make them visible in 'bdi->wb.task', and only after this
wake them up. This is exactly what this patch does.

Signed-off-by: Artem Bityutskiy
---
 mm/backing-dev.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a9a08d8..a9e0ec2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -414,8 +414,8 @@ static int bdi_forker_thread(void *ptr)
 		switch (action) {
 		case FORK_THREAD:
 			__set_current_state(TASK_RUNNING);
-			task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
-					   dev_name(bdi->dev));
+			task = kthread_create(bdi_writeback_thread, &bdi->wb,
+					      "flush-%s", dev_name(bdi->dev));
 			if (IS_ERR(task)) {
 				/*
 				 * If thread creation fails, force writeout of
@@ -426,10 +426,13 @@ static int bdi_forker_thread(void *ptr)
 				/*
 				 * The spinlock makes sure we do not lose
 				 * wake-ups when racing with 'bdi_queue_work()'.
+				 * And as soon as the bdi thread is visible, we
+				 * can start it.
 				 */
 				spin_lock(&bdi->wb_lock);
 				bdi->wb.task = task;
 				spin_unlock(&bdi->wb_lock);
+				wake_up_process(task);
 			}
 			break;
 
-- 
1.7.1.1

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
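
The lost wake-up described in the commit message can be replayed
deterministically in userspace. What follows is only a minimal sketch, not
kernel code: 'struct model_task', 'struct model_bdi', 'model_thread_body()',
'model_queue_work()' and 'simulate_fork()' are hypothetical names made up for
illustration, and the model assumes one fixed interleaving in which a work
item is queued while 'task' (standing in for 'bdi->wb.task') is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bdi thread; not a kernel structure. */
struct model_task {
	bool asleep;		/* true once the thread has gone to sleep */
};

/* Toy stand-in for the bdi; 'task' plays the role of 'bdi->wb.task'. */
struct model_bdi {
	struct model_task *task;	/* NULL until the forker publishes it */
	int pending_work;		/* queued work nobody has processed */
	int forker_wakeups;		/* wake-ups sent to the forker instead */
};

/* The thread body: process everything that is queued, then sleep. */
static void model_thread_body(struct model_bdi *bdi, struct model_task *t)
{
	bdi->pending_work = 0;
	t->asleep = true;
}

/* Queue one work item and wake whoever is visible for this bdi. */
static void model_queue_work(struct model_bdi *bdi)
{
	bdi->pending_work++;
	if (bdi->task) {
		/* Thread is visible: wake it so it processes the work. */
		bdi->task->asleep = false;
		model_thread_body(bdi, bdi->task);
	} else {
		/* Thread is not visible: the forker is woken instead. */
		bdi->forker_wakeups++;
	}
}

/*
 * Replay one fixed interleaving and return the number of work items
 * left unprocessed at the end (non-zero means a lost wake-up).
 */
int simulate_fork(bool publish_before_start)
{
	struct model_task t = { .asleep = false };
	struct model_bdi bdi = { .task = NULL, .pending_work = 0,
				 .forker_wakeups = 0 };

	if (!publish_before_start) {
		/* Old order: the thread runs before it is visible. */
		model_thread_body(&bdi, &t);	/* nothing queued, sleeps */
		model_queue_work(&bdi);		/* task is NULL: forker woken */
		bdi.task = &t;			/* published too late */
	} else {
		/* New order: create, publish, and only then start. */
		model_queue_work(&bdi);		/* task is NULL: forker woken */
		bdi.task = &t;			/* publish first ... */
		model_thread_body(&bdi, &t);	/* ... then start the thread */
	}

	assert(bdi.task == &t);	/* both orders end with the thread published */
	return bdi.pending_work;
}
```

In this model, 'simulate_fork(false)' returns 1: the work item stays queued
because the only wake-up went to the forker while the thread was already
asleep. 'simulate_fork(true)' returns 0, because the thread starts only after
it is visible and therefore picks up the queued work.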