Return-Path:
Received: from us-smtp-delivery-194.mimecast.com ([63.128.21.194]:53910
        "EHLO us-smtp-delivery-194.mimecast.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1755333AbcIRXJE (ORCPT );
        Sun, 18 Sep 2016 19:09:04 -0400
From: Trond Myklebust
To: Oleg Drokin
CC: Anna Schumaker, Linux NFS Mailing List
Subject: Re: [PATCH v5 00/25] Fix delegation behaviour when server revokes some state
Date: Sun, 18 Sep 2016 23:08:57 +0000
Message-ID:
References: <1474089213-104014-1-git-send-email-trond.myklebust@primarydata.com>
 <838E325D-2DA1-4229-A046-76316302F813@linuxhacker.ru>
 <1474140727.7526.1.camel@primarydata.com>
 <1474148731.7526.3.camel@primarydata.com>
 <96CBC4AF-B0EA-4AFA-8068-E93CBACD1B0B@linuxhacker.ru>
 <8E0CD004-8BAE-4EC0-BAA2-ED7B83F85610@linuxhacker.ru>
 <2FBB0CA9-F6B4-4D80-B9C5-387DE4D267DD@primarydata.com>
 <3C3FEBB3-302A-4E22-ADD9-26DA1E6FAE3C@linuxhacker.ru>
In-Reply-To: <3C3FEBB3-302A-4E22-ADD9-26DA1E6FAE3C@linuxhacker.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=WINDOWS-1252
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

> On Sep 18, 2016, at 18:44, Oleg Drokin wrote:
>
>
> On Sep 18, 2016, at 6:38 PM, Trond Myklebust wrote:
>
>>
>>> On Sep 18, 2016, at 12:23, Oleg Drokin wrote:
>>>
>>>
>>> On Sep 17, 2016, at 9:19 PM, Oleg Drokin wrote:
>>>
>>>>
>>>> On Sep 17, 2016, at 5:59 PM, Trond Myklebust wrote:
>>>>
>>>>>
>>>>>> On Sep 17, 2016, at 17:55, Oleg Drokin wrote:
>>>>>>
>>>>>>
>>>>>> On Sep 17, 2016, at 5:45 PM, Trond Myklebust wrote:
>>>>>>
>>>>>>> On Sat, 2016-09-17 at 15:32 -0400, Trond Myklebust wrote:
>>>>>>>> On Sat, 2016-09-17 at 15:16 -0400, Oleg Drokin wrote:
>>>>>>>>>
>>>>>>>>> On Sep 17, 2016, at 2:18 PM, Trond Myklebust wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sep 17, 2016, at 14:04, Oleg Drokin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Sep 17, 2016, at 1:13 AM, Trond Myklebust wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> According to RFC5661, if any of the SEQUENCE status bits
>>>>>>>>>>>> SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
>>>>>>>>>>>> SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
>>>>>>>>>>>> SEQ4_STATUS_ADMIN_STATE_REVOKED,
>>>>>>>>>>>> or SEQ4_STATUS_RECALLABLE_STATE_REVOKED is set, then we
>>>>>>>>>>>> need to use TEST_STATEID to figure out which stateids have
>>>>>>>>>>>> been revoked, so we can acknowledge the loss of state using
>>>>>>>>>>>> FREE_STATEID.
>>>>>>>>>>>>
>>>>>>>>>>>> While we already do this for open and lock state, we have
>>>>>>>>>>>> not been doing so for all the delegations.
>>>>>>>>>>>>
>>>>>>>>>>>> v2: nfs_v4_2_minor_ops needs to set .test_and_free_expired too
>>>>>>>>>>>> v3: Now with added lock revoke fixes and
>>>>>>>>>>>>     close/delegreturn/locku fixes
>>>>>>>>>>>> v4: Close a bunch of corner cases
>>>>>>>>>>>> v5: Report revoked delegations as invalid in nfs_have_delegation()
>>>>>>>>>>>>     Fix an infinite loop in nfs_reap_expired_delegations.
>>>>>>>>>>>>     Fixes for other looping behaviour
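
(An aside for anyone skimming the thread: the recovery step the cover
letter above describes boils down to something like the sketch below.
This is an illustration of the mechanism only, not code from the
series -- the struct layouts and the helpers test_stateid_is_valid(),
free_stateid() and recover_delegation() are invented for the example;
only the SEQ4_STATUS_* values come from RFC 5661, section 18.46.)

/*
 * Illustrative sketch only -- not the code from this patch series.
 * The structs and the extern helpers are invented stand-ins for the
 * TEST_STATEID and FREE_STATEID operations and the client's
 * delegation list.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED	0x00000008
#define SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED	0x00000010
#define SEQ4_STATUS_ADMIN_STATE_REVOKED		0x00000020
#define SEQ4_STATUS_RECALLABLE_STATE_REVOKED	0x00000040

#define SEQ4_ANY_STATE_REVOKED \
	(SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED | \
	 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | \
	 SEQ4_STATUS_ADMIN_STATE_REVOKED | \
	 SEQ4_STATUS_RECALLABLE_STATE_REVOKED)

struct stateid { uint8_t data[16]; };

struct delegation {
	struct delegation *next;
	struct stateid stateid;
	bool revoked;		/* reported as invalid to callers */
};

/* Stand-ins for the TEST_STATEID and FREE_STATEID operations. */
extern bool test_stateid_is_valid(const struct stateid *sid);
extern void free_stateid(const struct stateid *sid);
extern void recover_delegation(struct delegation *d);

/*
 * On seeing one of the "state revoked" SEQUENCE flags, probe every
 * delegation stateid with TEST_STATEID; for each one the server no
 * longer recognises, acknowledge the loss with FREE_STATEID and then
 * re-establish the state.
 */
static void reap_revoked_delegations(struct delegation *list, uint32_t flags)
{
	struct delegation *d;

	if (!(flags & SEQ4_ANY_STATE_REVOKED))
		return;

	for (d = list; d != NULL; d = d->next) {
		if (test_stateid_is_valid(&d->stateid))
			continue;
		d->revoked = true;
		free_stateid(&d->stateid);
		recover_delegation(d);
	}
}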
>>>>>>>>>>>
>>>>>>>>>>> This time around the loop seems to be tighter,
>>>>>>>>>>> in a userspace process:
>>>>>>>>>>>
>>>>>>>>>>> [ 9197.256571] --> nfs41_call_sync_prepare data->seq_server ffff8800a73ce000
>>>>>>>>>>> [ 9197.256572] --> nfs41_setup_sequence
>>>>>>>>>>> [ 9197.256573] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
>>>>>>>>>>> [ 9197.256574] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
>>>>>>>>>>> [ 9197.256574] <-- nfs41_setup_sequence slotid=0 seqid=14013800
>>>>>>>>>>> [ 9197.256582] encode_sequence: sessionid=1474126170:1:2:0 seqid=14013800 slotid=0 max_slotid=0 cache_this=1
>>>>>>>>>>> [ 9197.256755] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
>>>>>>>>>>> [ 9197.256756] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
>>>>>>>>>>> [ 9197.256757] nfs4_free_slot: slotid 1 highest_used_slotid 0
>>>>>>>>>>> [ 9197.256758] nfs41_sequence_process: Error 0 free the slot
>>>>>>>>>>> [ 9197.256760] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>>>>>>>>>> [ 9197.256779] --> nfs_put_client({2})
>>>>>>>>>>
>>>>>>>>>> What operation is the userspace process hanging on? Do you have a
>>>>>>>>>> stack trace for it?
>>>>>>>>>
>>>>>>>>> seems to be open_create->truncate->setattr coming from:
>>>>>>>>> cp /bin/sleep /mnt/nfs2/racer/12
>>>>>>>>>
>>>>>>>>> (gdb) bt
>>>>>>>>> #0  nfs41_setup_sequence (session=0xffff88005a853800,
>>>>>>>>>     args=0xffff8800a7253b80, res=0xffff8800a7253b48, task=0xffff8800b0eb0f00)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:876
>>>>>>>>> #1  0xffffffff813a751c in nfs41_call_sync_prepare (task=<optimized out>,
>>>>>>>>>     calldata=0xffff8800a7253b80)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:966
>>>>>>>>> #2  0xffffffff8185c639 in rpc_prepare_task (task=<optimized out>)
>>>>>>>>>     at /home/green/bk/linux-test/net/sunrpc/sched.c:683
>>>>>>>>> #3  0xffffffff8185f12b in __rpc_execute (task=0xffff88005a853800)
>>>>>>>>>     at /home/green/bk/linux-test/net/sunrpc/sched.c:775
>>>>>>>>> #4  0xffffffff818617b4 in rpc_execute (task=0xffff88005a853800)
>>>>>>>>>     at /home/green/bk/linux-test/net/sunrpc/sched.c:843
>>>>>>>>> #5  0xffffffff818539b9 in rpc_run_task (task_setup_data=0xffff8800a7253a50)
>>>>>>>>>     at /home/green/bk/linux-test/net/sunrpc/clnt.c:1052
>>>>>>>>> #6  0xffffffff813a75e3 in nfs4_call_sync_sequence (clnt=<optimized out>,
>>>>>>>>>     server=<optimized out>, msg=<optimized out>, args=<optimized out>,
>>>>>>>>>     res=<optimized out>) at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1051
>>>>>>>>> #7  0xffffffff813b4645 in nfs4_call_sync (cache_reply=<optimized out>,
>>>>>>>>>     res=<optimized out>, args=<optimized out>, msg=<optimized out>,
>>>>>>>>>     server=<optimized out>, clnt=<optimized out>)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1069
>>>>>>>>> #8  _nfs4_do_setattr (state=<optimized out>, cred=<optimized out>,
>>>>>>>>>     res=<optimized out>, arg=<optimized out>, inode=<optimized out>)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2916
>>>>>>>>> #9  nfs4_do_setattr (inode=0xffff880079b152a8, cred=<optimized out>,
>>>>>>>>>     fattr=<optimized out>, sattr=<optimized out>, state=0xffff880060588e00,
>>>>>>>>>     ilabel=<optimized out>, olabel=0x0)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2955
>>>>>>>>> #10 0xffffffff813b4a16 in nfs4_proc_setattr (dentry=<optimized out>,
>>>>>>>>>     fattr=0xffff8800a7253b80, sattr=0xffff8800a7253b48)
>>>>>>>>>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:3684
>>>>>>>>> #11 0xffffffff8138f1cb in nfs_setattr (dentry=0xffff8800740c1000,
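
(So the task is parked in nfs41_setup_sequence() while servicing the
SETATTR that carries the truncate. For reference, the trigger is
nothing exotic; a hypothetical minimal reproducer for the same
open_create->truncate->setattr path would be roughly the following --
an approximation of what cp does to the destination file, not an exact
replay of coreutils:)

/*
 * Hypothetical stand-alone reproducer -- roughly what
 * "cp /bin/sleep /mnt/nfs2/racer/12" ends up doing to the
 * destination file.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/*
	 * On NFSv4.x, O_CREAT|O_TRUNC against an existing file turns
	 * into an OPEN (create) followed by a size-truncating SETATTR;
	 * that SETATTR is the call seen hanging in the backtrace above.
	 */
	int fd = open("/mnt/nfs2/racer/12", O_WRONLY | O_CREAT | O_TRUNC, 0755);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	close(fd);
	return 0;
}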
>>>>>>>>>
>>>>>>>>
>>>>>>>> Cool! Does the following help?
>>>>>>>
>>>>>>> Grrr... There is another bug there...
>>>>>>
>>>>>> Is this in addition to the previous patch or instead of it?
>>>>>
>>>>> It can apply on top of it. The two patches fix different bugs.
>>>>
>>>> Ok, it does not seem to have helped:
>>>
>>> Of the 10 nodes where I run this, I had some more failures overnight,
>>> some of them different.
>>>
>>> The other one I got was this (I have 3 nodes in this state,
>>> and only one in the state reported yesterday.):
>>
>> I've been trying desperately to reproduce the issue, but I can't make
>> knfsd return or revoke the delegations with the "racer" script that
>> you pointed me to (I'm using the stock knfsd from Linux 4.7.3). Is
>> there anything else special about your setup that I should know about?
>
>
> There is not really anything special here.
> I have a full debug-enabled kernel (I checked latest Linus' master before
> applying your whole patchset, but it reproduces the same on earlier ones too).
>
> In my particular case I run in VMs with overcommitted CPUs (that really
> helps to promote races) and with HT enabled (which serves the same purpose).

Interesting. I can try that.

>
> I also boot my VMs from nfsroot, but I doubt it plays any role here.
>

Do they all have different hostnames (and/or are they set up with unique
nfs.nfs4_unique_id module parameters), or is there a chance that they
might be using the same client-id and thus be stepping on each other's
state?

> Then I just ssh into the VMs and run the script, and that is all.
>
> I can show you my kernel config if you think it would help, and I can
> also give you access to my testbed if you want to take a closer look
> at my setup.

Let me see if I can come up with some extra debugging dprintk()s that
might help me understand that loop. The current set is apparently not
sufficient to clue me in (sorry... :-()
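
Something along these lines is what I have in mind -- a sketch only,
not an actual patch: dprintk()/NFSDBG_FACILITY is the usual fs/nfs
debugging pattern, but the list head, field and flag names below are
simplified stand-ins for the real delegation structures, and the final
patch will pick different spots and messages. (Imagine this living in
fs/nfs/delegation.c, which already pulls in the needed headers.)

/*
 * Sketch of the kind of instrumentation I mean -- the point is to log
 * each decision the reaper loop takes, so we can see which stateid
 * keeps being re-tested and why the loop never makes progress.
 */
#define NFSDBG_FACILITY NFSDBG_PROC

static void nfs_debug_expired_delegations(struct nfs_client *clp)
{
	struct nfs_delegation *delegation;

	rcu_read_lock();
	/* cl_delegations/super_list are stand-in names here */
	list_for_each_entry_rcu(delegation, &clp->cl_delegations, super_list) {
		dprintk("%s: delegation %p flags=%lx\n",
			__func__, delegation, delegation->flags);
		if (!test_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags))
			continue;
		/*
		 * Log each TEST_STATEID probe and its result here, plus
		 * whether FREE_STATEID was sent, so a looping pass shows
		 * up clearly in the client's debug output.
		 */
		dprintk("%s: delegation %p still marked TEST_EXPIRED\n",
			__func__, delegation);
	}
	rcu_read_unlock();
}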