Date: Mon, 4 Mar 2013 09:23:10 -0500
From: Jeff Layton
To: "Myklebust, Trond"
Cc: Ming Lei, "J. Bruce Fields", Linux Kernel Mailing List,
 "linux-nfs@vger.kernel.org", Mandeep Singh Baines
Subject: Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Message-ID: <20130304092310.1d21100c@tlielax.poochiereds.net>
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA9286AD113@sacexcmbx05-prd.hq.netapp.com>
References: <4FA345DA4F4AE44899BD2B03EEEC2FA9286AD113@sacexcmbx05-prd.hq.netapp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 4 Mar 2013 14:14:23 +0000
"Myklebust, Trond" wrote:

> On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote:
> > Hi,
> >
> > The below warning can be triggered each time when mount.nfs is
> > running on 3.9-rc1.
> >
> > Not sure if freezable_schedule() inside rpc_wait_bit_killable should
> > be changed to schedule() since nfs_clid_init_mutex is held in the path.
>
> Cc:ing Jeff, who added freezable_schedule(), and applied it to
> rpc_wait_bit_killable.
>
> So this is occurring when the kernel enters the freeze state?
> Why does it occur only with nfs_clid_init_mutex, and not with all the
> other mutexes that we hold across RPC calls? We hold inode->i_mutex
> across RPC calls all the time when doing renames, unlinks, file
> creation,...
>

cc'ing Mandeep as well since his patch caused this to start popping up...

We've also gotten some similar lockdep pops in the nfsd code recently. I
responded to an email about it here on Friday, and I'll re-post what I
said there below:

-------------------------[snip]----------------------

Ok, I see... rpc_wait_bit_killable() calls freezable_schedule(). That
calls freezer_count() which calls try_to_freeze(). try_to_freeze does
this lockdep check now as of commit 6aa9707099.

The assumption seems to be that freezing a thread while holding any
sort of lock is bad. The rationale in that patch seems a bit sketchy
to me though. We can be fairly certain that we're not going to deadlock
by holding these locks, but I guess there could be something I've missed.

Mandeep, can you elaborate on whether there's really a deadlock scenario
here? If not, then is there some way to annotate these locks so this
lockdep pop goes away?

-------------------------[snip]----------------------
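To spell out the chain I'm describing above in code, here's a rough,
simplified sketch of what those helpers boil down to. This is paraphrased
from memory rather than quoted verbatim from the tree, so treat it as an
approximation of the real definitions:

/*
 * freezable_schedule(): tell the freezer it may skip us while we sleep,
 * then catch up with any pending freeze once we wake up.
 */
static inline void freezable_schedule(void)
{
	freezer_do_not_count();		/* set PF_FREEZER_SKIP */
	schedule();
	freezer_count();		/* clear it and check for a freeze */
}

static inline void freezer_count(void)
{
	current->flags &= ~PF_FREEZER_SKIP;
	try_to_freeze();
}

static inline bool try_to_freeze(void)
{
	/* the new lockdep check added by commit 6aa9707099 */
	if (!(current->flags & PF_NOFREEZE))
		debug_check_no_locks_held(current);
	return try_to_freeze_unsafe();	/* the old try_to_freeze() body */
}

If I'm reading that right, the check runs every time we come back out of
freezable_schedule(), whether or not a freeze is actually in progress, so
any sleep in the RPC layer with nfs_clid_init_mutex (or any other lock)
held will light it up.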
> > [ 41.387939] =====================================
> > [ 41.392913] [ BUG: mount.nfs/643 still has locks held! ]
> > [ 41.398559] 3.9.0-rc1+ #1740 Not tainted
> > [ 41.402709] -------------------------------------
> > [ 41.407714] 1 lock held by mount.nfs/643:
> > [ 41.411956] #0: (nfs_clid_init_mutex){+.+...}, at: [] nfs4_discover_server_trunking+0x60/0x1d4
> > [ 41.422363]
> > [ 41.422363] stack backtrace:
> > [ 41.427032] [] (unwind_backtrace+0x0/0xe0) from [] (rpc_wait_bit_killable+0x38/0xc8)
> > [ 41.437103] [] (rpc_wait_bit_killable+0x38/0xc8) from [] (__wait_on_bit+0x54/0x9c)
> > [ 41.446990] [] (__wait_on_bit+0x54/0x9c) from [] (out_of_line_wait_on_bit+0x78/0x84)
> > [ 41.457061] [] (out_of_line_wait_on_bit+0x78/0x84) from [] (__rpc_execute+0x170/0x348)
> > [ 41.467407] [] (__rpc_execute+0x170/0x348) from [] (rpc_run_task+0x9c/0xa4)
> > [ 41.476715] [] (rpc_run_task+0x9c/0xa4) from [] (rpc_call_sync+0x70/0xb0)
> > [ 41.485778] [] (rpc_call_sync+0x70/0xb0) from [] (nfs4_proc_setclientid+0x1a0/0x1c8)
> > [ 41.495819] [] (nfs4_proc_setclientid+0x1a0/0x1c8) from [] (nfs40_discover_server_trunking+0xec/0x148)
> > [ 41.507507] [] (nfs40_discover_server_trunking+0xec/0x148) from [] (nfs4_discover_server_trunking+0x94/0x1d4)
> > [ 41.519866] [] (nfs4_discover_server_trunking+0x94/0x1d4) from [] (nfs4_init_client+0x150/0x1b0)
> > [ 41.531036] [] (nfs4_init_client+0x150/0x1b0) from [] (nfs_get_client+0x2cc/0x320)
> > [ 41.540863] [] (nfs_get_client+0x2cc/0x320) from [] (nfs4_set_client+0x80/0xb0)
> > [ 41.550476] [] (nfs4_set_client+0x80/0xb0) from [] (nfs4_create_server+0xb0/0x21c)
> > [ 41.560333] [] (nfs4_create_server+0xb0/0x21c) from [] (nfs4_remote_mount+0x28/0x54)
> > [ 41.570373] [] (nfs4_remote_mount+0x28/0x54) from [] (mount_fs+0x6c/0x160)
> > [ 41.579498] [] (mount_fs+0x6c/0x160) from [] (vfs_kern_mount+0x4c/0xc0)
> > [ 41.588378] [] (vfs_kern_mount+0x4c/0xc0) from [] (nfs_do_root_mount+0x74/0x90)
> > [ 41.597961] [] (nfs_do_root_mount+0x74/0x90) from [] (nfs4_try_mount+0x24/0x3c)
> > [ 41.607513] [] (nfs4_try_mount+0x24/0x3c) from [] (nfs_fs_mount+0x6dc/0x7a0)
> > [ 41.616821] [] (nfs_fs_mount+0x6dc/0x7a0) from [] (mount_fs+0x6c/0x160)
> > [ 41.625701] [] (mount_fs+0x6c/0x160) from [] (vfs_kern_mount+0x4c/0xc0)
> > [ 41.634582] [] (vfs_kern_mount+0x4c/0xc0) from [] (do_mount+0x710/0x81c)
> > [ 41.643524] [] (do_mount+0x710/0x81c) from [] (sys_mount+0x84/0xb8)
> > [ 41.652008] [] (sys_mount+0x84/0xb8) from [] (ret_fast_syscall+0x0/0x48)
> > [ 41.715911] device: '0:28': device_add
> > [ 41.720062] PM: Adding info for No Bus:0:28
> > [ 41.746887] device: '0:29': device_add
> > [ 41.751037] PM: Adding info for No Bus:0:29
> > [ 41.780700] device: '0:28': device_unregister
> > [ 41.785400] PM: Removing info for No Bus:0:28
> > [ 41.790344] device: '0:28': device_create_release
> >
> >
> > Thanks,
> > --
> > Ming Lei
>

-- 
Jeff Layton