2022-07-20 18:50:25

by Jan Kasiak

Subject: NLM 4 Infinite Loop Bug

Hi all,

I'm writing my own NFS client, and while trying to test it, I've come
across a way to get the lockd thread into an infinite loop and stop
accepting any new requests.

Kernel Version: Linux ubuntu-jammy 5.15.0-41-generic

The client is a Python program, and it does not run rpcbind, NLM, etc.

I issue an NM_LOCK (procedure 22) request with block set to false, and
get a GRANTED reply.

I then issue a FREE_ALL (procedure 23) request, and the lockd thread
gets stuck in nlm_traverse_locks - it matches the host, calls
nlm_unlock_files, and then jumps to the again label, and repeats this
loop forever.

It's not clear to me who is supposed to unset the host from the lock?
Any pointers as to why there is a jump to again?
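
A toy Python model of the loop shape described above may make the hang
easier to see. It is not the kernel code: traverse_locks, working_unlock,
broken_unlock, and max_passes are hypothetical names made up for this
sketch. It assumes the scan restarts from the top after every unlock, and
that the restart only terminates if the unlock really removes the
matching lock.

```python
def traverse_locks(locks, host, unlock, max_passes=1000):
    """Toy model of a scan that restarts after every unlock.

    Returns the number of passes needed, or None if max_passes is hit.
    """
    for passes in range(1, max_passes + 1):
        for lock in locks:
            if lock["host"] == host:
                unlock(locks, lock)
                break  # models the jump back to the "again" label
        else:
            return passes  # a full scan found no match, so we are done
    return None  # cap added for this sketch only; without it the loop spins


def working_unlock(locks, lock):
    # The unlock removes the lock, so the next scan makes progress.
    locks.remove(lock)


def broken_unlock(locks, lock):
    # Assumed failure mode: the unlock "succeeds" but the lock stays listed.
    pass
```

With working_unlock every restart sees a shorter list, so the scan ends.
With broken_unlock the same lock matches on every pass, which is the
busy loop reported here.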

Thanks,
-Jan


2022-07-20 20:10:13

by Jan Kasiak

Subject: Re: NLM 4 Infinite Loop Bug

Applying two commits from the Linux master branch seems to have fixed
the problem:

aec158242b87a43d83322e99bc71ab4428e5ab79
1197eb5906a5464dbaea24cac296dfc38499cc00

-Jan

On Wed, Jul 20, 2022 at 2:46 PM Jan Kasiak <[email protected]> wrote:
>
> Hi all,
>
> I'm writing my own NFS client, and while trying to test it, I've come
> across a way to get the lockd thread into an infinite loop and stop
> accepting any new requests.
>
> Kernel Version: Linux ubuntu-jammy 5.15.0-41-generic
>
> The client is a python program, and it does not run rpcbind, NLM, etc...
>
> I issue an NM_LOCK (procedure 22) request with block set to false, and
> get a GRANTED reply.
>
> I then issue a FREE_ALL (procedure 23) request, and the lockd thread
> gets stuck in nlm_traverse_locks - it matches the host, calls
> nlm_unlock_files, and then jumps to the again label, and repeats this
> loop forever.
>
> It's not clear to me who is supposed to unset the host from the lock?
> Any pointers as to why there is a jump to again?
>
> Thanks,
> -Jan

2022-07-26 17:25:28

by Jan Kasiak

Subject: Re: NLM 4 Infinite Loop Bug

Hi all,

Even after applying the above two patches, I have discovered a new set
of NLM 4 requests that break lockd.

Unfortunately, I don't have enough experience to suggest a fix, but
would be glad to test anyone's attempt.

All requests are non-blocking.

Scenario A
==========
lock(offset=UINT64_MAX, len=100) - GRANTED
free_all() - never finishes and lockd thread is stuck busy looping

Scenario B
==========
lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct, holder offset, len are (UINT64_MAX, 100)

test(svid=2, offset=75, len=10) - DENIED
wrong: the holder (offset, len) is reported as (UINT64_MAX, 100),
because the above lock overflows to (49, 50) during comparison

Scenario C
==========
lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct, holder offset, len are (UINT64_MAX, 100)

unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
weird, because it has now created a lock at (offset=UINT64_MAX + 50, len=50)
not sure what the correct behavior should be here - FBIG error?

test(svid=2, offset=75, len=10) - DENIED
wrong: the reported holder (offset, len) is (49, 50), because the above
unlock has overflowed the offset
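
The (49, 50) holder is what unsigned 64-bit wraparound would produce. A
small Python sketch of the arithmetic follows. wrapped_add and overlaps
are illustrative helpers, not lockd functions, and the sketch assumes the
offset is advanced modulo 2**64, which is what the reported numbers
suggest rather than something confirmed from the kernel source.

```python
UINT64_MAX = 2**64 - 1


def wrapped_add(a, b):
    """Add two values the way a 64-bit unsigned register would."""
    return (a + b) & UINT64_MAX


def overlaps(start_a, len_a, start_b, len_b):
    """True if two byte ranges share at least one byte (no wraparound)."""
    return start_a < start_b + len_b and start_b < start_a + len_a


# Unlocking the first 50 bytes moves the start of the remaining lock
# forward by 50, which wraps around to a small offset.
remaining_offset = wrapped_add(UINT64_MAX, 50)
print(remaining_offset)

# The wrapped range collides with an unrelated test at offset 75 ...
print(overlaps(remaining_offset, 50, 75, 10))

# ... while the original range, taken without wraparound, does not.
print(overlaps(UINT64_MAX, 100, 75, 10))
```

Under these assumptions the remaining lock lands on bytes 49 through 98,
so a test at (75, 10) conflicts with it even though the original lock
started at the very top of the offset space.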

-Jan

On Wed, Jul 20, 2022 at 4:01 PM Jan Kasiak <[email protected]> wrote:
>
> Applying two commits from the Linux master branch seems to have fixed
> the problem:
>
> aec158242b87a43d83322e99bc71ab4428e5ab79
> 1197eb5906a5464dbaea24cac296dfc38499cc00
>
> -Jan
>
> On Wed, Jul 20, 2022 at 2:46 PM Jan Kasiak <[email protected]> wrote:
> >
> > Hi all,
> >
> > I'm writing my own NFS client, and while trying to test it, I've come
> > across a way to get the lockd thread into an infinite loop and stop
> > accepting any new requests.
> >
> > Kernel Version: Linux ubuntu-jammy 5.15.0-41-generic
> >
> > The client is a python program, and it does not run rpcbind, NLM, etc...
> >
> > I issue an NM_LOCK (procedure 22) request with block set to false, and
> > get a GRANTED reply.
> >
> > I then issue a FREE_ALL (procedure 23) request, and the lockd thread
> > gets stuck in nlm_traverse_locks - it matches the host, calls
> > nlm_unlock_files, and then jumps to the again label, and repeats this
> > loop forever.
> >
> > It's not clear to me who is supposed to unset the host from the lock?
> > Any pointers as to why there is a jump to again?
> >
> > Thanks,
> > -Jan

2022-07-26 17:38:15

by Chuck Lever

Subject: Re: NLM 4 Infinite Loop Bug

Hello Jan-

> On Jul 26, 2022, at 1:16 PM, Jan Kasiak <[email protected]> wrote:
>
> Hi all,
>
> Even after applying the above two patches, I have discovered a new set
> of NLM 4 requests that break lockd.
>
> Unfortunately, I don't have enough experience to suggest a fix, but
> would be glad to test anyone's attempt.
>
> All requests are non-blocking.
>
> Scenario A
> =========
> lock(offset=UINT64_MAX, len=100) - GRANTED
> free_all() - never finishes and lockd thread is stuck busy looping
>
> Scenario B
> ========
> lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
>
> test(svid=2, offset=UINT64_MAX, len=50) - DENIED
> correct, holder offset, len are (UINT64_MAX, 100)
>
> test(svid=2, offset=75, len=10) - DENIED
> wrong, because holder (offset, len) are wrong (UINT64_MAX, 100),
> because the above
> lock overflows during comparison to (49, 50)
>
> Scenario C
> ========
> lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
>
> test(svid=2, offset=UINT64_MAX, len=50) - DENIED
> correct, holder offset, len are (UINT64_MAX, 100)
>
> unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
> weird, because it has now created a lock at (offset=UINT64_MAX + 50, len=50)
> not sure what the correct behavior should be here - FBIG error?
>
> test(svid=2, offset=75, len=10) - DENIED
> wrong, because holder offset, len are wrong (49, 50), because the above
> unlock has overflowed the offset

Thanks for testing.

May I ask that you file these as three separate bugs here:

https://bugzilla.linux-nfs.org/



--
Chuck Lever



2022-07-27 18:33:43

by Jan Kasiak

Subject: Re: NLM 4 Infinite Loop Bug

Hi Chuck,

I created 3 bugs:

https://bugzilla.linux-nfs.org/show_bug.cgi?id=390
For the original issue I reported with FREE_ALL,
because I'm not sure if it's fully fixed.

https://bugzilla.linux-nfs.org/show_bug.cgi?id=391
For scenario A

https://bugzilla.linux-nfs.org/show_bug.cgi?id=392
For scenario B/C because they are very similar

Thanks,
-Jan

On Tue, Jul 26, 2022 at 1:34 PM Chuck Lever III <[email protected]> wrote:
>
> Hello Jan-
>
> > On Jul 26, 2022, at 1:16 PM, Jan Kasiak <[email protected]> wrote:
> >
> > Hi all,
> >
> > Even after applying the above two patches, I have discovered a new set
> > of NLM 4 requests that break lockd.
> >
> > Unfortunately, I don't have enough experience to suggest a fix, but
> > would be glad to test anyone's attempt.
> >
> > All requests are non-blocking.
> >
> > Scenario A
> > =========
> > lock(offset=UINT64_MAX, len=100) - GRANTED
> > free_all() - never finishes and lockd thread is stuck busy looping
> >
> > Scenario B
> > ========
> > lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
> >
> > test(svid=2, offset=UINT64_MAX, len=50) - DENIED
> > correct, holder offset, len are (UINT64_MAX, 100)
> >
> > test(svid=2, offset=75, len=10) - DENIED
> > wrong, because holder (offset, len) are wrong (UINT64_MAX, 100),
> > because the above
> > lock overflows during comparison to (49, 50)
> >
> > Scenario C
> > ========
> > lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
> >
> > test(svid=2, offset=UINT64_MAX, len=50) - DENIED
> > correct, holder offset, len are (UINT64_MAX, 100)
> >
> > unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
> > weird, because it has now created a lock at (offset=UINT64_MAX + 50, len=50)
> > not sure what the correct behavior should be here - FBIG error?
> >
> > test(svid=2, offset=75, len=10) - DENIED
> > wrong, because holder offset, len are wrong (49, 50), because the above
> > unlock has overflowed the offset
>
> Thanks for testing.
>
> May I ask that you file these as three separate bugs here:
>
> https://bugzilla.linux-nfs.org/
>
>
>
> --
> Chuck Lever
>
>
>

2022-07-27 18:34:39

by Chuck Lever

Subject: Re: NLM 4 Infinite Loop Bug



> On Jul 27, 2022, at 1:20 PM, Jan Kasiak <[email protected]> wrote:
>
> Hi Chuck,
>
> I created 3 bugs:
>
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=390
> For the original issue I reported with FREE_ALL
> because I'm not sure if its fully fixed.
>
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=391
> For scenario A
>
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=392
> For scenario B/C because they are very similar

Thanks. Jeff and I will have a look at these soon.


> Thanks,
> -Jan
>
> On Tue, Jul 26, 2022 at 1:34 PM Chuck Lever III <[email protected]> wrote:
>>
>> Hello Jan-
>>
>>> On Jul 26, 2022, at 1:16 PM, Jan Kasiak <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> Even after applying the above two patches, I have discovered a new set
>>> of NLM 4 requests that break lockd.
>>>
>>> Unfortunately, I don't have enough experience to suggest a fix, but
>>> would be glad to test anyone's attempt.
>>>
>>> All requests are non-blocking.
>>>
>>> Scenario A
>>> =========
>>> lock(offset=UINT64_MAX, len=100) - GRANTED
>>> free_all() - never finishes and lockd thread is stuck busy looping
>>>
>>> Scenario B
>>> ========
>>> lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
>>>
>>> test(svid=2, offset=UINT64_MAX, len=50) - DENIED
>>> correct, holder offset, len are (UINT64_MAX, 100)
>>>
>>> test(svid=2, offset=75, len=10) - DENIED
>>> wrong, because holder (offset, len) are wrong (UINT64_MAX, 100),
>>> because the above
>>> lock overflows during comparison to (49, 50)
>>>
>>> Scenario C
>>> ========
>>> lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED
>>>
>>> test(svid=2, offset=UINT64_MAX, len=50) - DENIED
>>> correct, holder offset, len are (UINT64_MAX, 100)
>>>
>>> unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
>>> weird, because it has now created a lock at (offset=UINT64_MAX + 50, len=50)
>>> not sure what the correct behavior should be here - FBIG error?
>>>
>>> test(svid=2, offset=75, len=10) - DENIED
>>> wrong, because holder offset, len are wrong (49, 50), because the above
>>> unlock has overflowed the offset
>>
>> Thanks for testing.
>>
>> May I ask that you file these as three separate bugs here:
>>
>> https://bugzilla.linux-nfs.org/
>>
>>
>>
>> --
>> Chuck Lever
>>
>>
>>

--
Chuck Lever