2017-12-07 16:10:18

by Olga Kornievskaia

Subject: Re: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

Ok. I thought that because RFC1813 covers NLM operations that it is.

I will extend this question to the Linux NFS mailing list, as the
client implementation I'm interested in is Linux.

On Thu, Dec 7, 2017 at 10:59 AM, Trond Myklebust <[email protected]> wrote:
> Hi Olga,
>
> NLM isn't an IETF protocol.
>
> Cheers
> Trond
>
> On 7 December 2017 at 10:57, Olga Kornievskaia <[email protected]> wrote:
>>
>> Hi folks,
>>
>> I'm looking for guidance for what are responsibilities of the client
>> or server to solve the following situation.
>>
>> Client application is acquiring a blocking lock but shortly after that
>> application is killed via ctrl-C. NFS client sent out NLM_LOCK but
>> hasn't gotten a reply yet as ctrl-c arrived. To clean up, the client
>> is sending CANCEL and UNLOCK to the server.
>>
>> Server ends up processing CANCEL and UNLOCK first (for whatever reason
>> one legitimate one could be network re-ordered packets) and server has
>> no state so it's just sending GRANTED replies back (which seems to be
>> a legitimate thing to do). Then server processes LOCK and replies to
>> the client.
>>
>> Now we are in the situation where lock was granted to a client that
>> doesn't know it's holding one and it prevents other clients from
>> grabbing this lock.
>>
>> Is there a solution or this is broken protocol?
>>
>> Should it be client's responsibility to notice that it received a LOCK
>> reply for which it wasn't waiting and always follow up with an UNLOCK?
>>
>> Thank you.
>>
>
>
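
The reordering described in the quoted question can be reproduced with a toy model. This is only a minimal sketch under stated assumptions: `ToyLockServer`, `replay`, and the plain-string status names are hypothetical stand-ins for illustration, not code from lockd, Ganesha, or any other real server, and the server is assumed to answer CANCEL and UNLOCK with GRANTED when it has no matching state, as the question describes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLockServer:
    """Hypothetical in-memory stand-in for an NLM server (assumption, not real code)."""
    held: dict = field(default_factory=dict)  # lock key -> owner

    def lock(self, owner, key):
        if key in self.held and self.held[key] != owner:
            return "NLM4_DENIED"
        self.held[key] = owner
        return "NLM4_GRANTED"

    def cancel(self, owner, key):
        # No blocked request is known for this owner, so there is nothing to cancel.
        return "NLM4_GRANTED"

    def unlock(self, owner, key):
        # Unlocking a range that is not held is treated as a successful no-op.
        if self.held.get(key) == owner:
            del self.held[key]
        return "NLM4_GRANTED"


def replay(server, requests):
    """Feed (operation, owner, key) tuples to the server in the given order."""
    return [getattr(server, op)(owner, key) for op, owner, key in requests]


key = ("fileA", 0, 100)
sent = [("lock", "client1", key), ("cancel", "client1", key), ("unlock", "client1", key)]
reordered = [sent[1], sent[2], sent[0]]  # CANCEL and UNLOCK overtake LOCK

in_order = ToyLockServer()
replay(in_order, sent)

out_of_order = ToyLockServer()
replay(out_of_order, reordered)
```

In this model `in_order.held` ends up empty, while `out_of_order.held` still maps the range to `client1`, so a later `lock` from any other owner is denied even though the client believes it holds nothing.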


2017-12-07 16:58:39

by J. Bruce Fields

Subject: Re: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
> Ok. I thought that because RFC1813 covers NLM operations that it is.

Yeah, I don't see a harm to the occasional NLM question on the v4
working group list. It's a bit of an orphaned protocol, so there's not
really any other implementation-independent forum.

> I will extend this question to the Linux NFS mailing list as the
> client implementation I'm interested is Linux.

But that's fine too.

I thought that LOCK/CANCEL race was one of the motivations for NFSv4,
so...

> >> Is there a solution or this is broken protocol?

... I'd always assumed the protocol was impossible to implement 100%
correctly, though maybe there's some clever solution.

> >> Should it be client's responsibility to notice that it received a LOCK
> >> reply for which it wasn't waiting and always follow up with an UNLOCK?

That would be tricky and still not handle all cases, I think.

--b.

2017-12-07 18:00:34

by Olga Kornievskaia

Subject: Re: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

On Thu, Dec 7, 2017 at 11:58 AM, J. Bruce Fields <[email protected]> wrote:
> On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
>> Ok. I thought that because RFC1813 covers NLM operations that it is.
>
> Yeah, I don't see a harm to the occasional NLM question on the v4
> working group list. It's a bit of an orphaned protocol, so there's not
> really any other implementation-independent forum.
>
>> I will extend this question to the Linux NFS mailing list as the
>> client implementation I'm interested is Linux.
>
> But that's fine too.
>
> I thought that LOCK/CANCEL race was one of the motivations for NFSv4,
> so...
>
>> >> Is there a solution or this is broken protocol?
>
> ... I'd always assumed the protocol was impossible to implement 100%
> correctly, though maybe there's some clever solution.

Is such a race still possible with the Linux server implementation, which
is single-threaded and, in my testing, doesn't process LOCK/CANCEL out of
order?

>> >> Should it be client's responsibility to notice that it received a LOCK
>> >> reply for which it wasn't waiting and always follow up with an UNLOCK?
>
> That would be tricky and still not handle all cases, I think.

Yes, I'm not sure the client can do it. It will have no info about the
lock, and I'm not sure how it would create an appropriate UNLOCK request
(unless we keep around the lock info of all cancelled locks).
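
To make the "keep around lock info of all cancelled locks" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `CancelledLockTable`, its method names, and the idea of matching replies by a per-request cookie are hypothetical and do not describe how the Linux client works today. As noted above, this would still not cover every case.

```python
class CancelledLockTable:
    """Hypothetical client-side bookkeeping, keyed by a per-request cookie."""

    def __init__(self):
        self.pending = {}    # cookie -> lock arguments still awaited by an application
        self.cancelled = {}  # cookie -> lock arguments of abandoned requests

    def send_lock(self, cookie, lock_args):
        self.pending[cookie] = lock_args

    def abandon(self, cookie):
        # The application was interrupted: keep the arguments so that a late
        # reply can still be matched to a concrete byte range.
        self.cancelled[cookie] = self.pending.pop(cookie)

    def on_lock_reply(self, cookie, status):
        """Return a follow-up request to send, or None if nothing is needed."""
        if cookie in self.pending:
            del self.pending[cookie]
            return None
        lock_args = self.cancelled.pop(cookie, None)
        if lock_args is not None and status == "NLM4_GRANTED":
            # Nobody is waiting for this lock any more, so give it back.
            return ("UNLOCK", lock_args)
        return None
```

The cost is that the client must retain state for every abandoned request until a reply arrives, and it has to decide how long to keep entries whose reply never comes.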

>
> --b.

2017-12-07 18:56:14

by Frank Filz

Subject: RE: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

> On Thu, Dec 7, 2017 at 11:58 AM, J. Bruce Fields <[email protected]>
> wrote:
> > On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
> >> Ok. I thought that because RFC1813 covers NLM operations that it is.
> >
> > Yeah, I don't see a harm to the occasional NLM question on the v4
> > working group list. It's a bit of an orphaned protocol, so there's
> > not really any other implementation-independent forum.
> >
> >> I will extend this question to the Linux NFS mailing list as the
> >> client implementation I'm interested is Linux.
> >
> > But that's fine too.
> >
> > I thought that LOCK/CANCEL race was one of the motivations for NFSv4,
> > so...
> >
> >> >> Is there a solution or this is broken protocol?
> >
> > ... I'd always assumed the protocol was impossible to implement 100%
> > correctly, though maybe there's some clever solution.
>
> Is such race still possible with the linux server implementation which is
> single threaded and in my testing isn't processing LOCK/CANCEL out-of-order
> then?
>
> >> >> Should it be client's responsibility to notice that it received a
> >> >> LOCK reply for which it wasn't waiting and always follow up with an
> >> >> UNLOCK?
> >
> > That would be tricky and still not handle all cases, I think.
>
> Yes I'm not sure the client can do it. It will have no info about the lock
> and not sure how to create an appropriate UNLOCK reply (unless we keep
> around lock info of all cancelled locks).

The NLM4_GRANTED call has a response where the client can respond
NLM4_DENIED to indicate it is not accepting the lock.

The Ganesha NFS server could process requests out of order; however, if the
client responds to the grant with NLM4_DENIED, Ganesha will properly release
the lock.
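
The grant-callback behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not Ganesha's implementation: `ToyBlockingServer` and the `grant_callback` parameter are hypothetical names, and the callback simply stands in for the NLM4_GRANTED call to the client and the status the client returns.

```python
class ToyBlockingServer:
    """Hypothetical model of a server that queues blocked lock requests."""

    def __init__(self):
        self.held = {}     # lock key -> owner
        self.waiting = {}  # lock key -> owners blocked on it, oldest first

    def lock(self, owner, key):
        holder = self.held.get(key)
        if holder is None or holder == owner:
            self.held[key] = owner
            return "NLM4_GRANTED"
        self.waiting.setdefault(key, []).append(owner)
        return "NLM4_BLOCKED"

    def unlock(self, owner, key, grant_callback):
        if self.held.get(key) != owner:
            return "NLM4_GRANTED"
        del self.held[key]
        queue = self.waiting.get(key, [])
        while queue:
            waiter = queue.pop(0)
            self.held[key] = waiter  # tentatively hand the lock over
            if grant_callback(waiter, key) == "NLM4_GRANTED":
                break
            del self.held[key]       # waiter refused the grant: release again
        return "NLM4_GRANTED"
```

If the first waiter answers the callback with `NLM4_DENIED`, the lock is released and offered to the next waiter, so a client that has lost interest does not end up holding it.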

The imperfectness of NLM is why I have seen requests for tools to free
wedged locks... We haven't managed to write such a tool for Ganesha yet (so
either we don't actually have that many NLM clients, or folks are using
other mechanisms to deal with the issue - restarting the server and making
all the clients reclaim the locks they want is one way to do it...).

Frank




2017-12-07 19:17:24

by Olga Kornievskaia

Subject: Re: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

On Thu, Dec 7, 2017 at 1:56 PM, Frank Filz <[email protected]> wrote:
>> On Thu, Dec 7, 2017 at 11:58 AM, J. Bruce Fields <[email protected]>
>> wrote:
>> > On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
>> >> Ok. I thought that because RFC1813 covers NLM operations that it is.
>> >
>> > Yeah, I don't see a harm to the occasional NLM question on the v4
>> > working group list. It's a bit of an orphaned protocol, so there's
>> > not really any other implementation-independent forum.
>> >
>> >> I will extend this question to the Linux NFS mailing list as the
>> >> client implementation I'm interested is Linux.
>> >
>> > But that's fine too.
>> >
>> > I thought that LOCK/CANCEL race was one of the motivations for NFSv4,
>> > so...
>> >
>> >> >> Is there a solution or this is broken protocol?
>> >
>> > ... I'd always assumed the protocol was impossible to implement 100%
>> > correctly, though maybe there's some clever solution.
>>
>> Is such race still possible with the linux server implementation which is
>> single threaded and in my testing isn't processing LOCK/CANCEL out-of-order
>> then?
>>
>> >> >> Should it be client's responsibility to notice that it received a
>> >> >> LOCK reply for which it wasn't waiting and always follow up with an
>> >> >> UNLOCK?
>> >
>> > That would be tricky and still not handle all cases, I think.
>>
>> Yes I'm not sure the client can do it. It will have no info about the lock
>> and not sure how to create an appropriate UNLOCK reply (unless we keep
>> around lock info of all cancelled locks).

> The NLM4_GRANTED call has a response where the client can respond
> NLM4_DENIED to indicate it is not accepting the lock.
>
> The Ganesha NFS server could process out of order, however, if the client
> responds to the grant with NLM4_DENIED, Ganesha will properly release the
> lock.

This is the case when the server is doing a callback to the client to
grant the lock, and the client has the ability to reply with denied. The
situation I'm curious about is just after the initial LOCK request is
sent: the server hasn't processed it yet, but has already received and
processed a CANCEL/UNLOCK.

> The imperfectness of NLM is why I have seen requests for tools to free
> wedged locks... We haven't managed to write such a tool for Ganesha yet (so
> either we don't actually have that many NLM clients, or folks are using
> other mechanisms to deal with the issue - restarting the server and making
> all the clients reclaim the locks they want is one way to do it...).

Yes, server-side lock breaking or rebooting is a way out of this situation
in practice.
>
> Frank
>
>

2017-12-07 19:59:14

by Frank Filz

Subject: RE: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

> On Thu, Dec 7, 2017 at 1:56 PM, Frank Filz <[email protected]> wrote:
> >> On Thu, Dec 7, 2017 at 11:58 AM, J. Bruce Fields
> >> <[email protected]>
> >> wrote:
> >> > On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
> >> >> Ok. I thought that because RFC1813 covers NLM operations that it is.
> >> >
> >> > Yeah, I don't see a harm to the occasional NLM question on the v4
> >> > working group list. It's a bit of an orphaned protocol, so there's
> >> > not really any other implementation-independent forum.
> >> >
> >> >> I will extend this question to the Linux NFS mailing list as the
> >> >> client implementation I'm interested is Linux.
> >> >
> >> > But that's fine too.
> >> >
> >> > I thought that LOCK/CANCEL race was one of the motivations for
> >> > NFSv4, so...
> >> >
> >> >> >> Is there a solution or this is broken protocol?
> >> >
> >> > ... I'd always assumed the protocol was impossible to implement
> >> > 100% correctly, though maybe there's some clever solution.
> >>
> >> Is such race still possible with the linux server implementation
> >> which is single threaded and in my testing isn't processing
> >> LOCK/CANCEL out-of-order then?
> >>
> >> >> >> Should it be client's responsibility to notice that it received
> >> >> >> a LOCK reply for which it wasn't waiting and always follow up
> >> >> >> with an UNLOCK?
> >> >
> >> > That would be tricky and still not handle all cases, I think.
> >>
> >> Yes I'm not sure the client can do it. It will have no info about the
> >> lock and not sure how to create an appropriate UNLOCK reply (unless
> >> we keep around lock info of all cancelled locks).
>
> > The NLM4_GRANTED call has a response where the client can respond
> > NLM4_DENIED to indicate it is not accepting the lock.
> >
> > The Ganesha NFS server could process out of order, however, if the
> > client responds to the grant with NLM4_DENIED, Ganesha will properly
> > release the lock.
>
> This is the case for when the server is doing a callback to the client
> and granting the lock and client has ability to reply with denied. The
> situation I'm curious about is just after the initial LOCK request is
> sent and server hasn't processed it and it got&processed a
> CANCEL/UNLOCK.

Oh, whoops, yeah... if the lock doesn't block, then the situation is
different. When the client gets the LOCK response, it really should
check and see if it was actually waiting for that response, or rather,
the client should probably delay sending UNLOCK until it has received a
response to the outstanding LOCK request. CANCEL is only for
cancelling a blocked lock, so it shouldn't be sent unless the server has
already responded to the LOCK with NLM4_BLOCKED, in which case there is
no ordering issue (Ganesha should handle the ordering issue between the
NLM4_GRANTED call and NLM4_CANCEL).
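
A rough sketch of the "delay UNLOCK until the LOCK reply arrives" behaviour suggested above might look like this. It is illustrative only: `DeferredUnlockClient`, its methods, and the `send` callable are hypothetical names, and this is not how any existing client is implemented.

```python
class DeferredUnlockClient:
    """Hypothetical client that never lets UNLOCK or CANCEL overtake a LOCK."""

    def __init__(self, send):
        self.send = send        # callable that transmits one request tuple
        self.in_flight = set()  # keys with a LOCK sent but not yet answered
        self.abandoned = set()  # keys whose application went away meanwhile

    def lock(self, key):
        self.in_flight.add(key)
        self.send(("LOCK", key))

    def interrupt(self, key):
        if key in self.in_flight:
            # Do not send anything yet; wait for the LOCK reply first.
            self.abandoned.add(key)
        else:
            self.send(("UNLOCK", key))

    def on_lock_reply(self, key, status):
        self.in_flight.discard(key)
        if key not in self.abandoned:
            return
        self.abandoned.discard(key)
        if status == "NLM4_GRANTED":
            self.send(("UNLOCK", key))  # the lock was taken: give it back
        elif status == "NLM4_BLOCKED":
            self.send(("CANCEL", key))  # the request is queued: withdraw it
```

Because the cleanup request is only sent after the server has answered the LOCK, the server can no longer see CANCEL or UNLOCK before the LOCK they refer to. The trade-off is that cleanup waits on a reply that may be slow to arrive.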

If we want NLM4_CANCEL to be able to cancel an inflight LOCK request,
then we would need to add semantics which would include the server
either responding to the cancel to let the client know it hasn't
processed the LOCK request yet, or it should somehow remember the cancel
so it can ignore the LOCK request when it processes it (which is
probably too much for the server to deal with).
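
The second option, a server that remembers an early cancel, could be sketched like this. It is purely hypothetical: NLM defines no such semantics, and `TombstoneServer` and its `tombstones` set are invented names for illustration.

```python
class TombstoneServer:
    """Hypothetical server that remembers a CANCEL seen before its LOCK."""

    def __init__(self):
        self.held = {}           # lock key -> owner
        self.tombstones = set()  # (owner, key) pairs cancelled before any LOCK

    def cancel(self, owner, key):
        if self.held.get(key) != owner:
            # Nothing to cancel yet: the LOCK may still be in flight.
            self.tombstones.add((owner, key))
        return "NLM4_GRANTED"

    def lock(self, owner, key):
        if (owner, key) in self.tombstones:
            # The client already gave up on this request, so ignore it.
            self.tombstones.discard((owner, key))
            return "NLM4_DENIED"
        if key in self.held and self.held[key] != owner:
            return "NLM4_DENIED"
        self.held[key] = owner
        return "NLM4_GRANTED"
```

Even in this toy form the costs show up: the server has to decide when to expire tombstones whose LOCK never arrives, and a later, legitimate LOCK from the same owner for the same range would be wrongly refused unless requests can be told apart.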

> > The imperfectness of NLM is why I have seen requests for tools to free
> > wedged locks... We haven't managed to write such a tool for Ganesha
> > yet (so either we don't actually have that many NLM clients, or folks
> > are using other mechanisms to deal with the issue - restarting the
> > server and making all the clients reclaim the locks they want is one
> > way to do it...).
>
> Yes server-side lock breaking or rebooting is a way out this situation
> in practice.
> >
> > Frank
> >
> >


2017-12-13 18:39:23

by Olga Kornievskaia

Subject: Re: [nfsv4] questing about NLM LOCK, CANCEL UNLOCK

On Thu, Dec 7, 2017 at 2:59 PM, Frank Filz <[email protected]> wrote:
>> On Thu, Dec 7, 2017 at 1:56 PM, Frank Filz <[email protected]> wrote:
>> >> On Thu, Dec 7, 2017 at 11:58 AM, J. Bruce Fields
>> >> <[email protected]>
>> >> wrote:
>> >> > On Thu, Dec 07, 2017 at 11:10:16AM -0500, Olga Kornievskaia wrote:
>> >> >> Ok. I thought that because RFC1813 covers NLM operations that it is.
>> >> >
>> >> > Yeah, I don't see a harm to the occasional NLM question on the v4
>> >> > working group list. It's a bit of an orphaned protocol, so there's
>> >> > not really any other implementation-independent forum.
>> >> >
>> >> >> I will extend this question to the Linux NFS mailing list as the
>> >> >> client implementation I'm interested is Linux.
>> >> >
>> >> > But that's fine too.
>> >> >
>> >> > I thought that LOCK/CANCEL race was one of the motivations for
>> >> > NFSv4, so...
>> >> >
>> >> >> >> Is there a solution or this is broken protocol?
>> >> >
>> >> > ... I'd always assumed the protocol was impossible to implement
>> >> > 100% correctly, though maybe there's some clever solution.
>> >>
>> >> Is such race still possible with the linux server implementation
>> >> which is single threaded and in my testing isn't processing
>> >> LOCK/CANCEL out-of-order then?
>> >>
>> >> >> >> Should it be client's responsibility to notice that it received
>> >> >> >> a LOCK reply for which it wasn't waiting and always follow up
>> >> >> >> with an UNLOCK?
>> >> >
>> >> > That would be tricky and still not handle all cases, I think.
>> >>
>> >> Yes I'm not sure the client can do it. It will have no info about the
>> >> lock and not sure how to create an appropriate UNLOCK reply (unless
>> >> we keep around lock info of all cancelled locks).
>>
>> > The NLM4_GRANTED call has a response where the client can respond
>> > NLM4_DENIED to indicate it is not accepting the lock.
>> >
>> > The Ganesha NFS server could process out of order, however, if the
>> > client responds to the grant with NLM4_DENIED, Ganesha will properly
>> > release the lock.
>>
>> This is the case for when the server is doing a callback to the client
>> and granting the lock and client has ability to reply with denied. The
>> situation I'm curious about is just after the initial LOCK request is
>> sent and server hasn't processed it and it got&processed a
>> CANCEL/UNLOCK.
>
> Oh, woops, yea... if the lock doesn't block, then the situation is
> different. When the client gets the LOCK response, it really should
> check and see if it was actually waiting for that response, or actually,
> the client should probably delay sending UNLOCK if it has not got a
> response to an outstanding lock request yet.

The current Linux client does not do that.

> CANCEL is only for cancelling a blocked lock, so shouldn't be sent unless
> the server has already responded to the LOCK with NLM4_BLOCKED, in which
> case there is no ordering issue (Ganesha should handle ordering issue
> between NLM4_GRANT call and NLM4_CANCEL).

The current Linux client will send a CANCEL even if it hasn't received
the reply from the server.

> If we want NLM4_CANCEL to be able to cancel an inflight LOCK request,
> then we would need to add semantics which would include the server either
> responding to the cancel to let the client know it hasn't processed the
> LOCK request yet, or it should somehow remember the cancel so it can
> ignore the LOCK request when it processes it (which is probably too much
> for the server to deal with).

Sounds like the Ganesha server (and NetApp) will have orphaned locks that
would require a manual lock break or a server reboot.

>> > The imperfectness of NLM is why I have seen requests for tools to free
>> > wedged locks... We haven't managed to write such a tool for Ganesha
>> > yet (so either we don't actually have that many NLM clients, or folks
>> > are using other mechanisms to deal with the issue - restarting the
>> > server and making all the clients reclaim the locks they want is one
>> > way to do it...).
>>
>> Yes server-side lock breaking or rebooting is a way out this situation
>> in practice.
>> >
>> > Frank
>> >
>> >
>