Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail-ig0-f174.google.com ([209.85.213.174]:52177 "EHLO mail-ig0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752362AbaLCXuL (ORCPT ); Wed, 3 Dec 2014 18:50:11 -0500 Received: by mail-ig0-f174.google.com with SMTP id hn15so17818567igb.7 for ; Wed, 03 Dec 2014 15:50:10 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Wed, 3 Dec 2014 18:50:10 -0500 Message-ID: Subject: Re: not picking a delegation stateid for IO when delegation stateid is being returned From: Olga Kornievskaia To: Trond Myklebust Cc: Linux NFS Mailing List Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust wrote: > > On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" wrote: >> >> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >> wrote: >> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia >> > wrote: >> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >> >> wrote: >> >>> Hi Olga, >> >>> >> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia >> >>> wrote: >> >>>> Hi folks, >> >>>> >> >>>> I would like an opinion about changing code in such as way that we >> >>>> don't select a delegation stateid for an IO operation when this >> >>>> particular delegation is being returned. >> >>>> >> >>>> The reason it's some what problematic is that we send out a >> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >> >>>> reply. In the mean while, an IO operation can be happening and in >> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >> >>>> Well, at the server, it finishes processing DELEG_RETURN before >> >>>> getting an IO op and by that time the server is finished with the >> >>>> stateid and can error an IO operation with BAD_STATEID. >> >>>> >> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >> >>>> index 7f3f606..4f6f6c9 100644 >> >>>> --- a/fs/nfs/delegation.c >> >>>> +++ b/fs/nfs/delegation.c >> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >> >>>> *dst, struct inode *in >> >>>> flags &= FMODE_READ|FMODE_WRITE; >> >>>> rcu_read_lock(); >> >>>> delegation = rcu_dereference(nfsi->delegation); >> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >> >>>> flags); >> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >> >>>> flags && >> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >> >>>> &delegation->flags)); >> >>>> if (ret) { >> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >> >>>> nfs_mark_delegation_referenced(delegation); >> >>> >> >>> The above cannot eliminate the possibility that we won't use a >> >>> delegation while it is being returned. It will at best just reduce the >> >>> window of opportunity. >> >>> >> >> >> >> You are right this are still problems. Actually, we might set the bit >> >> but not yet get the open stateid from the open with deleg_cur and >> >> that's not good. It would be good to know we got the open stateid and >> >> then pick that. >> >> >> >>> So, why is this being considered to be a problem in the first place? >> >>> Are you seeing a measurable performance impact on a real life workload >> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >> >> >> >> Unfortunately, this problem is quite common and I hit it all the time >> >> on my setup. This leads to client seizing IO on that file and >> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >> >> how to eliminate getting to that state. >> >> >> > >> > It definitely isn't intended to be an irrecoverable error. The client >> > is supposed to just replay the write after updating the stateid. >> >> open(deleg_cur) call / reply >> lock() call/reply >> deleg_return() call >> write(with deluge_stateid) call gets BAD_STATEID >> state recovery code marks lock state lost -> EIO. > > Why is it marking the lock as lost? If the recovery succeeded, it should > notice that the stateid has changed and instead retry. I'll get you a better explanation tomorrow besides saying "that's what I see when I run the code". > What kernel is this? This is upstream.