Return-Path: Received: from mail-pa0-f43.google.com ([209.85.220.43]:36608 "EHLO mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750822AbbHRGZ5 (ORCPT ); Tue, 18 Aug 2015 02:25:57 -0400 Received: by pawq9 with SMTP id q9so28099147paw.3 for ; Mon, 17 Aug 2015 23:25:57 -0700 (PDT) Subject: Re: [PATCH] SUNRPC: Change WARN_ON to an warn message in rpc_exit_task To: Scott Mayhew , Trond Myklebust References: <55D1C0C7.4040005@gmail.com> <20150817133535.GA5192@tonberry.usersys.redhat.com> Cc: "linux-nfs@vger.kernel.org" , kinglongmee@gmail.com From: Kinglong Mee Message-ID: <55D2CFE7.8050805@gmail.com> Date: Tue, 18 Aug 2015 14:25:43 +0800 MIME-Version: 1.0 In-Reply-To: <20150817133535.GA5192@tonberry.usersys.redhat.com> Content-Type: text/plain; charset=windows-1252 Sender: linux-nfs-owner@vger.kernel.org List-ID: On 8/17/2015 21:35, Scott Mayhew wrote: > On Mon, 17 Aug 2015, Kinglong Mee wrote: > >> When testing nfs4.0 lock, I get WARN_ON message as below, >> [ 278.733391] ------------[ cut here ]------------ >> [ 278.734496] WARNING: CPU: 0 PID: 263 at /root/linux-nfs/net/sunrpc/sched.c:692 >> rpc_exit_task+0x85/0x90 [sunrpc]() >> [ 278.736771] Modules linked in: nfsv4(OE) nfs(OE) nfsd(E) bnep bluetooth ip6t_rpfilter ip6t_REJECT >> xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables >> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security >> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 >> nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs ppdev xfs >> microcode xor serio_raw libcrc32c pcspkr raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core >> auth_rpcgss nfs_acl lockd sunrpc(OE) ata_generic pata_acpi [last unloaded: nfsd] >> [ 278.751462] CPU: 0 PID: 263 Comm: kworker/0:4 Tainted: G OE 3.16.0-rc2+ #28 >> [ 278.753364] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, >> BIOS 6.00 07/31/2013 >> [ 278.755710] Workqueue: rpciod rpc_async_schedule [sunrpc] >> [ 278.757112] 0000000000000000 00000000f6173352 ffff880072e5fd30 ffffffff816eacfa >> [ 278.758811] 0000000000000000 ffff880072e5fd68 ffffffff8108bf3d ffff88005997f180 >> [ 278.760527] ffff88003ae44830 ffffffffa001a860 ffffffffa001a860 0000000000000000 >> [ 278.762271] Call Trace: >> [ 278.762802] [] dump_stack+0x45/0x56 >> [ 278.763906] [] warn_slowpath_common+0x7d/0xa0 >> [ 278.765419] [] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] >> [ 278.767243] [] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] >> [ 278.768758] [] warn_slowpath_null+0x1a/0x20 >> [ 278.770080] [] rpc_exit_task+0x85/0x90 [sunrpc] >> [ 278.771439] [] __rpc_execute+0x94/0x420 [sunrpc] >> [ 278.772829] [] rpc_async_schedule+0x26/0x30 [sunrpc] >> [ 278.774276] [] process_one_work+0x175/0x430 >> [ 278.775573] [] worker_thread+0x11d/0x530 >> [ 278.776824] [] ? rescuer_thread+0x300/0x300 >> [ 278.778069] [] kthread+0xd8/0xf0 >> [ 278.779114] [] ? insert_kthread_work+0x40/0x40 >> [ 278.780448] [] ret_from_fork+0x7c/0xb0 >> [ 278.781621] [] ? insert_kthread_work+0x40/0x40 >> [ 278.782972] ---[ end trace 3350394592832e42 ]--- >> >> task1 (lock/unlock): >> --> call_decode >> --> rpc_exit_task (-10008) >> --> nfs4_release_lockowner_done >> --> rpc_restart_call_prepare >> task->tk_action = call_start; >> task2 (umount): >> --> rpc_shutdown_client >> --> rpc_killall_tasks >> rovr->tk_flags |= RPC_TASK_KILLED; (set task1 to KILLED) >> task1: >> --> rpc_exit_task >> WARN_ON(RPC_ASSASSINATED(task)); >> >> Signed-off-by: Kinglong Mee >> --- >> net/sunrpc/sched.c | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c >> index 337ca85..562db0a 100644 >> --- a/net/sunrpc/sched.c >> +++ b/net/sunrpc/sched.c >> @@ -689,7 +689,14 @@ void rpc_exit_task(struct rpc_task *task) >> if (task->tk_ops->rpc_call_done != NULL) { >> task->tk_ops->rpc_call_done(task, task->tk_calldata); >> if (task->tk_action != NULL) { >> - WARN_ON(RPC_ASSASSINATED(task)); >> + /* >> + * A known case here is, >> + * The RPC_TASK_KILLED is set in rpc_killall_tasks by >> + * another process. >> + */ >> + if (RPC_ASSASSINATED(task)) >> + printk(KERN_WARNING "RPC: dead task tried to" >> + " walk away.\n"); >> /* Always release the RPC slot and buffer memory */ >> xprt_release(task); >> rpc_reset_task_statistics(task); >> -- >> 2.4.3 >> > > If this is going to be changed anyway, could we maybe have some more > useful debugging information added to the message? Something like: > > printk(KERN_WARNING "RPC: %s to %s was killed\n", > task->tk_msg.rpc_proc->p_name, > rpc_peeraddr2str(task->tk_client, RPC_DISPLAY_ADDR)); > > I recently had a case where this warning was popping several times per > minute. The machine was both a client and a server... it turns out > that it was actually CB_NULLs that were causing the warning, and the > trigger for that was that several clients had the same value for the > clientaddr mount option. > > Having some more useful info from the warning message would have saved > some time there. If it's harmless of this case, remove the WARN_ON directly maybe better. What's your opinion, Trond? thanks, Kinglong Mee