2008-04-29 21:57:08

by J. Bruce Fields

Subject: lock reclaims outside grace period

Current lockd code appears to reject regular locks done during the grace
period, but not reclaims that come outside of the grace period.

(That's based on inspecting the code--I haven't run tests.)

That seems like an obvious bug. (We're not giving the client any way to
determine whether conflicting locks might have been granted.)

Can we fix it, or is there a chance that people have been depending on
this behavior? (Maybe for failing over to an already-active server??)

--b.


2008-04-29 22:18:54

by Trond Myklebust

Subject: Re: lock reclaims outside grace period


On Tue, 2008-04-29 at 17:57 -0400, J. Bruce Fields wrote:
> Current lockd code appears to reject regular locks done during the grace
> period, but not reclaims that come outside of the grace period.
>
> (That's based on inspecting the code--I haven't run tests.)
>
> That seems like an obvious bug. (We're not giving the client any way to
> determine whether conflicting locks might have been granted.)
>
> Can we fix it, or is there a chance that people have been depending on
> this behavior? (Maybe for failing over to an already-active server??)

Sorry, but I really don't care if anyone has been relying on it: that is
a _major_ bug and needs to be fixed ASAP.

Trond


2008-04-30 00:03:53

by J. Bruce Fields

Subject: Re: lock reclaims outside grace period

On Tue, Apr 29, 2008 at 03:18:51PM -0700, Trond Myklebust wrote:
>
> On Tue, 2008-04-29 at 17:57 -0400, J. Bruce Fields wrote:
> > Current lockd code appears to reject regular locks done during the grace
> > period, but not reclaims that come outside of the grace period.
> >
> > (That's based on inspecting the code--I haven't run tests.)
> >
> > That seems like an obvious bug. (We're not giving the client any way to
> > determine whether conflicting locks might have been granted.)
> >
> > Can we fix it, or is there a chance that people have been depending on
> > this behavior? (Maybe for failing over to an already-active server??)
>
> Sorry, but I really don't care if anyone has been relying on it: that is
> a _major_ bug and needs to be fixed ASAP.

OK, good, I'll do some tests to confirm and then submit a patch.

When I ran across this I checked what specs I could find (mostly
wondering which error to return), and was surprised to find no mention
of this case. For example, from the Open Group XNFS spec
(http://www.opengroup.org/onlinepubs/9629799/):

"If "reclaim" is true, then the server will assume this is a
request to re-establish a previous lock (for example, after the
server has crashed and rebooted). During the grace period the
server will only accept locks with "reclaim" set to true."

But they don't state the converse.

And LCK_DENIED_GRACE_PERIOD "Indicates that the procedure failed because
the server host has recently been rebooted and the server NLM is
re-establishing existing locks, and is not yet ready to accept normal
service requests." But absent an objection I suppose I'll use
LCK_DENIED_GRACE_PERIOD for the other case too.

Anyway, it all made me worry whether ignoring the late-reclaim case was
actually standard behavior. It wouldn't be the only weird thing about
NLM.

--b.
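
[Editorial sketch, not part of the original message.] The rule under
discussion can be written as a small decision function. This is only an
illustration of the proposed behavior, not the lockd code or the eventual
patch: grace_check() and grace_selfcheck() are hypothetical names, and
LCK_GRANTED is used here merely to mean "the request passes the grace-period
check and can go on to normal conflict checking". The status values follow
the nlm_stats numbering in the NLM protocol. Returning
LCK_DENIED_GRACE_PERIOD for a late reclaim is the choice floated above; the
quoted spec text does not mandate it.

```c
#include <assert.h>
#include <stdbool.h>

/* Status codes, numbered as in the NLM protocol's nlm_stats enum. */
enum nlm_stats {
    LCK_GRANTED = 0,
    LCK_DENIED = 1,
    LCK_DENIED_NOLOCKS = 2,
    LCK_BLOCKED = 3,
    LCK_DENIED_GRACE_PERIOD = 4
};

/*
 * Hypothetical helper (not a lockd function): decide whether a lock
 * request passes the grace-period check.
 *
 *   in_grace  - server is still inside its grace period
 *   reclaim   - the request has its "reclaim" flag set
 */
enum nlm_stats grace_check(bool in_grace, bool reclaim)
{
    /* Documented case: only reclaims are accepted during grace. */
    if (in_grace && !reclaim)
        return LCK_DENIED_GRACE_PERIOD;

    /*
     * The converse discussed in this thread: a reclaim arriving after
     * the grace period is refused, because a conflicting lock may
     * already have been granted to someone else.
     */
    if (!in_grace && reclaim)
        return LCK_DENIED_GRACE_PERIOD;

    return LCK_GRANTED;
}

/* Exercise all four combinations of the two flags. */
void grace_selfcheck(void)
{
    assert(grace_check(true, true) == LCK_GRANTED);
    assert(grace_check(true, false) == LCK_DENIED_GRACE_PERIOD);
    assert(grace_check(false, false) == LCK_GRANTED);
    assert(grace_check(false, true) == LCK_DENIED_GRACE_PERIOD);
}
```

The behavior described at the top of the thread corresponds to the second
"if" being absent, so that a late reclaim falls through and is treated like
any other lock request.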

2008-04-30 00:45:14

by Wendy Cheng

Subject: Re: lock reclaims outside grace period

J. Bruce Fields wrote:
> On Tue, Apr 29, 2008 at 03:18:51PM -0700, Trond Myklebust wrote:
>
>> On Tue, 2008-04-29 at 17:57 -0400, J. Bruce Fields wrote:
>>
>>> Current lockd code appears to reject regular locks done during the grace
>>> period, but not reclaims that come outside of the grace period.
>>>
>>> (That's based on inspecting the code--I haven't run tests.)
>>>
>>> That seems like an obvious bug. (We're not giving the client any way to
>>> determine whether conflicting locks might have been granted.)
>>>
>>> Can we fix it, or is there a chance that people have been depending on
>>> this behavior? (Maybe for failing over to an already-active server??)
>>>
>> Sorry, but I really don't care if anyone has been relying on it: that is
>> a _major_ bug and needs to be fixed ASAP.
>>
>
> OK, good, I'll do some tests to confirm and then submit a patch.
>

I can't disagree, but be prepared for people to start asking why, after
kernel version 2.6.x, they have to extend the grace period to get NFS
locking to work :) ...

-- Wendy


2008-05-02 20:04:13

by J. Bruce Fields

Subject: Re: lock reclaims outside grace period

On Tue, Apr 29, 2008 at 08:03:49PM -0400, bfields wrote:
> On Tue, Apr 29, 2008 at 03:18:51PM -0700, Trond Myklebust wrote:
> >
> > On Tue, 2008-04-29 at 17:57 -0400, J. Bruce Fields wrote:
> > > Current lockd code appears to reject regular locks done during the grace
> > > period, but not reclaims that come outside of the grace period.
> > >
> > > (That's based on inspecting the code--I haven't run tests.)
> > >
> > > That seems like an obvious bug. (We're not giving the client any way to
> > > determine whether conflicting locks might have been granted.)
> > >
> > > Can we fix it, or is there a chance that people have been depending on
> > > this behavior? (Maybe for failing over to an already-active server??)
> >
> > Sorry, but I really don't care if anyone has been relying on it: that is
> > a _major_ bug and needs to be fixed ASAP.
>
> OK, good, I'll do some tests to confirm and then submit a patch.

Well, I figured the easiest way to reproduce the problem would be just
by acquiring a lock on a client, then playing tricks with the network to
cause it to miss the grace period.

But I'm not getting statd to work--or at least, I'm not seeing any statd
activity on the network. There must be something basic wrong with my
configuration, but I haven't found it yet.

--b.

>
> When I ran across this I checked what specs I could find (mostly
> wondering which error to return), and was surprised to find no mention
> of this case. For example, from the Open Group XNFS spec
> (http://www.opengroup.org/onlinepubs/9629799/):
>
> "If "reclaim" is true, then the server will assume this is a
> request to re-establish a previous lock (for example, after the
> server has crashed and rebooted). During the grace period the
> server will only accept locks with "reclaim" set to true."
>
> But they don't state the converse.
>
> And LCK_DENIED_GRACE_PERIOD "Indicates that the procedure failed because
> the server host has recently been rebooted and the server NLM is
> re-establishing existing locks, and is not yet ready to accept normal
> service requests." But absent an objection I suppose I'll use
> LCK_DENIED_GRACE_PERIOD for the other case too.
>
> Anyway, it all made me worry whether ignoring the late-reclaim case was
> actually standard behavior. It wouldn't be the only weird thing about
> NLM.
>
> --b.