Return-Path: linux-nfs-owner@vger.kernel.org Received: from mx11.netapp.com ([216.240.18.76]:29958 "EHLO mx11.netapp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754568Ab3LQQzO convert rfc822-to-8bit (ORCPT ); Tue, 17 Dec 2013 11:55:14 -0500 From: Weston Andros Adamson To: Trond Myklebust CC: linux-nfs list Subject: Re: [PATCH] sunrpc: Fix infinite loop in RPC state machine Date: Tue, 17 Dec 2013 16:55:08 +0000 Message-ID: <2553541F-C143-45E2-93B2-19039E2B69D4@netapp.com> References: <1387299103-2577-1-git-send-email-dros@netapp.com> In-Reply-To: <1387299103-2577-1-git-send-email-dros@netapp.com> Content-Type: text/plain; charset="Windows-1252" MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org List-ID: Resending to Trond?s correct email address. -dros On Dec 17, 2013, at 11:51 AM, Weston Andros Adamson wrote: > When a task enters call_refreshresult with status 0 from call_refresh and > !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting > or max number of retries. > > Instead of trying forever, make use of the retry path that other errors use. > > This only seems to be possible when the crrefresh callback is gss_refresh_null, > which only happens when destroying the context. > > To reproduce: > > 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific > operations). > > 2) reboot - the client will be stuck and will need to be hard rebooted > > BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] > Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy > irq event stamp: 195724 > hardirqs last enabled at (195723): [] restore_args+0x0/0x30 > hardirqs last disabled at (195724): [] apic_timer_interrupt+0x6a/0x80 > softirqs last enabled at (195722): [] __do_softirq+0x1df/0x276 > softirqs last disabled at (195717): [] irq_exit+0x53/0x9a > CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 > Workqueue: rpciod rpc_async_schedule [sunrpc] > task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 > RIP: 0010:[] [] __rpc_execute+0x8a/0x362 [sunrpc] > RSP: 0018:ffff880079003d18 EFLAGS: 00000246 > RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 > RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 > RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 > R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e > R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 > FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 > Stack: > ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 > ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 > ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 > Call Trace: > [] ? trace_hardirqs_on_caller+0x145/0x1a1 > [] rpc_async_schedule+0x27/0x32 [sunrpc] > [] process_one_work+0x211/0x3a5 > [] ? process_one_work+0x172/0x3a5 > [] worker_thread+0x134/0x202 > [] ? rescuer_thread+0x280/0x280 > [] ? rescuer_thread+0x280/0x280 > [] kthread+0xc9/0xd1 > [] ? __kthread_parkme+0x61/0x61 > [] ret_from_fork+0x7c/0xb0 > [] ? __kthread_parkme+0x61/0x61 > Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 > > And the output of "rpcdebug -m rpc -s all": > > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refresh (status 0) > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > Signed-off-by: Weston Andros Adamson > --- > net/sunrpc/clnt.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index f09b7db..5f585a7 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1529,9 +1529,14 @@ call_refreshresult(struct rpc_task *task) > task->tk_action = call_refresh; > switch (status) { > case 0: > - if (rpcauth_uptodatecred(task)) > + if (rpcauth_uptodatecred(task)) { > task->tk_action = call_allocate; > - return; > + return; > + } > + /* > + * Use rate-limiting and a max number of retries if refresh > + * had status 0 but failed to update the cred. > + */ > case -ETIMEDOUT: > rpc_delay(task, 3*HZ); > case -EAGAIN: > -- > 1.8.3.1 (Apple Git-46) >