Return-Path: linux-nfs-owner@vger.kernel.org Received: from mx1.redhat.com ([209.132.183.28]:6575 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751716Ab2FKKWv (ORCPT ); Mon, 11 Jun 2012 06:22:51 -0400 Date: Mon, 11 Jun 2012 06:22:46 -0400 From: Jeff Layton To: Hans de Bruin Cc: Linux NFS mailing list Subject: Re: nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver Message-ID: <20120611062246.5938d1ff@corrin.poochiereds.net> In-Reply-To: <4FD4A77B.2030506@xmsnet.nl> References: <4FC3F9F6.7010107@xmsnet.nl> <20120610055208.05ea7ea7@tlielax.poochiereds.net> <4FD4A77B.2030506@xmsnet.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-nfs-owner@vger.kernel.org List-ID: On Sun, 10 Jun 2012 15:56:11 +0200 Hans de Bruin wrote: > On 06/10/2012 11:52 AM, Jeff Layton wrote: > > On Tue, 29 May 2012 00:19:34 +0200 > > Hans de Bruin wrote: > > > >> I just upgraded my home server from kernel 3.3.5 to 3.4.0 and ran into > >> some trouble. My laptop, a nfsroot client, will not run firefox and > >> thunderbird anymore. When I start these programs from an xterm, the > >> cursor goes to the next line and waits indefinitely. > >> > >> I do not know if there is any order is lsof's output. A lsof | grep > >> firefox or thunderbird shows ......./.parentlock as the last line. > >> > >> It does not matter whether the client is running a 3.4.0 or a 3.3.0 > >> kernel, or if the server is running on top of xen or not. > >> > >> There is some noise in the servers dmesg: > >> > >> [ 241.256684] INFO: task kworker/u:2:801 blocked for more than 120 seconds. > >> [ 241.256691] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" > >> disables this message. > >> [ 241.256698] kworker/u:2 D 000000000000000c 0 801 2 > >> 0x00000000 > >> [ 241.256710] ffff8801390e5cf0 0000000000000046 0000000000012d00 > >> 0000000000012d00 > >> [ 241.256721] 0000000000012d00 ffff880139f8bd50 0000000000012d00 > >> ffff8801390e5fd8 > >> [ 241.256732] ffff8801390e5fd8 0000000000012d00 ffff880139ce4420 > >> ffff880139f8bd50 > >> [ 241.256743] Call Trace: > >> [ 241.256759] [] schedule+0x64/0x66 > >> [ 241.256769] [] cld_pipe_upcall+0x95/0xd1 > >> [ 241.256780] [] ? nfsd4_exchange_id+0x23e/0x23e > >> [ 241.256789] [] nfsd4_cld_grace_done+0x50/0x8a > >> [ 241.256798] [] nfsd4_record_grace_done+0x18/0x1a > >> [ 241.256807] [] laundromat_main+0x4a/0x213 > >> [ 241.256818] [] ? need_resched+0x1e/0x28 > >> [ 241.256826] [] ? __schedule+0x49d/0x4b5 > >> [ 241.256835] [] ? nfsd4_exchange_id+0x23e/0x23e > >> [ 241.256844] [] process_one_work+0x190/0x28d > >> [ 241.256854] [] worker_thread+0x105/0x189 > >> [ 241.256862] [] ? _raw_spin_unlock_irqrestore+0x1a/0x1d > >> [ 241.256872] [] ? manage_workers.clone.17+0x173/0x173 > >> [ 241.256881] [] kthread+0x8a/0x92 > >> [ 241.256891] [] kernel_thread_helper+0x4/0x10 > >> [ 241.256900] [] ? > >> kthread_freezable_should_stop+0x47/0x47 > >> [ 241.256909] [] ? gs_change+0x13/0x13 > >> > >> or xenified: > >> > >> > >> [ 240.563448] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" > >> disables this message. > >> [ 240.563458] kworker/u:2 D ffff88007fc12d00 0 808 2 > >> 0x00000000 > >> [ 240.563479] ffff88007532fcf0 0000000000000246 0000000000012d00 > >> 0000000000012d00 > >> [ 240.563504] 0000000000012d00 ffff880075f7caf0 0000000000012d00 > >> ffff88007532ffd8 > >> [ 240.563530] ffff88007532ffd8 0000000000012d00 ffffffff81813020 > >> ffff880075f7caf0 > >> [ 240.563555] Call Trace: > >> [ 240.563578] [] schedule+0x64/0x66 > >> [ 240.563594] [] cld_pipe_upcall+0x95/0xd1 > >> [ 240.563611] [] ? nfsd4_exchange_id+0x23e/0x23e > >> [ 240.563625] [] nfsd4_cld_grace_done+0x50/0x8a > >> [ 240.563640] [] nfsd4_record_grace_done+0x18/0x1a > >> [ 240.563654] [] laundromat_main+0x4a/0x213 > >> [ 240.563670] [] ? xen_spin_unlock+0x12/0x30 > >> [ 240.563685] [] ? nfsd4_exchange_id+0x23e/0x23e > >> [ 240.563700] [] process_one_work+0x190/0x28d > >> [ 240.563714] [] ? xen_spin_lock+0xb/0xd > >> [ 240.563729] [] worker_thread+0x105/0x189 > >> [ 240.563743] [] ? _raw_spin_unlock_irqrestore+0x1a/0x1d > >> [ 240.563758] [] ? manage_workers.clone.17+0x173/0x173 > >> [ 240.563772] [] kthread+0x8a/0x92 > >> [ 240.563787] [] kernel_thread_helper+0x4/0x10 > >> [ 240.563802] [] ? retint_restore_args+0x5/0x6 > >> [ 240.563816] [] ? gs_change+0x13/0x13 > >> > >> > > > > It sounds like you're not running the new nfsdcld daemon on the server, > > and /var/lib/nfs/v4recovery does not exist. Is that correct? > > > > Yes that correct. When I create the /var/lib/nfs/v4recovery directory my > problems are gone. > > On the server and the client v3 and v4 are compiled into the kernel. Al > mounts are v3. Should I either remove v4 from the kernels or upgrade > nfs-utils-1.2.3 to something newer? > > (a search on the linux-nfs wiki for nfsdcld does not return any hits) > Just creating the legacy state tracking directory should be enough to work around the problem for now. Eventually you'll want to move to a newer nfs-utils (1.2.5 or later) and run nfsdcld to handle the v4 client tracking. Alternately, you can just build the kernel without v4 support, but that's really not necessary. Now, that said...this upcall should be timing out in 30s, so it's not clear to me why the daemon is hanging for 120s. I'll have to see if I can reproduce this and track down the problem. -- Jeff Layton