2022-02-21 06:12:11

by Kurt Garloff

Subject: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
for me.

This is while mounting many NFS filesystems from two NFS servers, one
Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).

The NFS mounts just would not succeed. This appears to happen to all
Qnap mounts and one of the mounts from the linux knfsd.

I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
this failure.
To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
5.15.24. I started reenabling and 2df6a47a is the last patch that
still results in a working NFS for me.

Looking at the culprit patch, I could not immediately see what's wrong
-- so I'll leave it to you. I guess the server does not return
fs_locations in the way it's expected and thus the NFS mount hangs.

I seem not to be the only one, see
https://bbs.archlinux.org/viewtopic.php?pid=2022938
https://bugs.archlinux.org/task/73860

HTH,

--
Kurt Garloff <[email protected]>


2022-02-21 06:13:32

by Kornievskaia, Olga

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)



On 2/20/22, 6:17 PM, "Kurt Garloff" <[email protected]> wrote:

> Hi Olga,
>
> two updates:
>
> On 20.02.22 23:26, Kurt Garloff wrote:
> > Hi Olga,
> >
> > your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
> > for me.
> >
> > This is while mounting many NFS filesystems from two NFS servers, one
> > Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).
> I have to correct myself. All volumes broken by 5.15.24 come from Qnap.
> > The NFS mounts just would not succeed. This appears to happen to all
> > Qnap mounts and one of the mounts from the linux knfsd.
> This mount also came from Qnap -- in my mind I had migrated it already,
> but not in reality :-O
> > I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
> > NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
> > this failure.
> > To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
> > 6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
> > 5.15.24. I started reenabling and 2df6a47a is the last patch that
> > still results in a working NFS for me.
>
> Also, taking plain 5.15.24 and just reverting 6f283634 creates a
> kernel that works well with Qnap NFS shares.

Is it possible for you to provide a network trace?

> Best,
>
> --
> Kurt Garloff <[email protected]>


2022-02-21 09:55:12

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

two updates:

On 20.02.22 23:26, Kurt Garloff wrote:
> Hi Olga,
>
> your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
> for me.
>
> This is while mounting many NFS filesystems from two NFS servers, one
> Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).
I have to correct myself. All volumes broken by 5.15.24 come from Qnap.
> The NFS mounts just would not succeed. This appears to happen to all
> Qnap mounts and one of the mounts from the linux knfsd.
This mount also came from Qnap -- in my mind I had migrated it already,
but not in reality :-O
> I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
> NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
> this failure.
> To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
> 6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
> 5.15.24. I started reenabling and 2df6a47a is the last patch that
> still results in a working NFS for me.

Also, taking plain 5.15.24 and just reverting 6f283634 creates a
kernel that works well with Qnap NFS shares.

Best,

--
Kurt Garloff <[email protected]>

2022-02-21 13:54:24

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

On 21.02.22 02:19, Kornievskaia, Olga wrote:
> On 2/20/22, 6:17 PM, "Kurt Garloff" <[email protected]> wrote:
>
> Hi Olga,
>
> two updates:
>
> On 20.02.22 23:26, Kurt Garloff wrote:
> > Hi Olga,
> >
> > your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
> > for me.
> >
> > This is while mounting many NFS filesystems from two NFS servers, one
> > Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).
> I have to correct myself. All volumes broken by 5.15.24 come from Qnap.
> > The NFS mounts just would not succeed. This appears to happen to all
> > Qnap mounts and one of the mounts from the linux knfsd.
> This mount also cam from Qnap -- in my mind I had migrated it already,
> but not in reality :-O
> > I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
> > NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
> > this failure.
> > To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
> > 6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
> > 5.15.24. I started reenabling and 2df6a47a is the last patch that
> > still results in a working NFS for me.
>
> Also, taking plain 5.15.24 and just reverting 6f283634 creates a
> kernel that works well with Qnap NFS shares.
>
> Is it possible for you to provide a network trace?

Yes.

Is tcpdump what you'd like to see? wireshark's dumpcap?
Any NFS specific tracing tools I should be using?

One trace with a working kernel and one with the broken one?

Best,

--
Kurt Garloff <[email protected]>
Cologne, Germany

2022-02-21 18:45:33

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi,

On 21.02.22 10:31, Kurt Garloff wrote:
> Hi Olga,
>
> On 21.02.22 02:19, Kornievskaia, Olga wrote:
>> [...]
>> Is it possible for you to provide a network trace?
>
> Yes.
>
> Is tcpdump what you'd like to see? wireshark's dumpcap?
> Any NFS specific tracing tools I should be using?
>
> One trace with a working kernel and one with the broken one?

Comparing the good and the bad trace ...

mount -t nfs 192.168.155.74:/Public /mnt/Public
against Qnap 4.3.4.xxx NFS v4.1 server.

Both do:

Establish conn
NFS NULL (ack)
NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
Teardown and reestablish
NFS NULL (ack)
NFS EXCHANGE_ID (4.1 -> ack)
NFS EXCHANGE_ID (4.1 -> ack)
NFS CREATE_SESSION (ack)
NFS RECLAIM_COMPLETE (CB_NULL, ack)
NFS SECINFO_NO_NAME (ack)
NFS PUTROOTFH|GETATTR (ack)
NFS GETATTR FH:0x62d40c52 (ack), 8 times
NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
NFS LOOKUP DH:0x62d40c52/Public (ack)
NFS LOOKUP DH:0x62d40c52/Public (ack)
NFS GETATTR FH:0x8ee88cee (ack), 3 times


Now the differences start:

The fixed NFS client repeatedly gets ack back, the broken NFS client gets

NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp. backoff)
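
To illustrate why the mount appears to hang: a client that treats
NFS4ERR_DELAY as "retry later" never gives up if the server answers
every retry the same way. Below is a minimal Python sketch of such a
capped exponential backoff. The starting delay, the cap, the attempt
limit and the function names are assumptions made for illustration;
they are not taken from the kernel client or from the trace.

```python
from itertools import islice

NFS4ERR_DELAY = 10008  # status code defined by the NFSv4 protocol


def backoff_delays_ms(start_ms=100, cap_ms=15000):
    """Yield retry delays that double each time until they reach the cap.

    start_ms and cap_ms are illustrative assumptions.
    """
    delay = start_ms
    while True:
        yield delay
        delay = min(delay * 2, cap_ms)


def retry_getattr(send_getattr, max_attempts):
    """Call send_getattr() until it stops returning NFS4ERR_DELAY.

    Returns (last_status, total_wait_ms). max_attempts only exists so
    that this sketch terminates; the behaviour described in the thread
    corresponds to having no such limit.
    """
    waited_ms = 0
    status = NFS4ERR_DELAY
    for _, delay in zip(range(max_attempts), backoff_delays_ms()):
        status = send_getattr()
        if status != NFS4ERR_DELAY:
            break
        waited_ms += delay  # a real client would sleep here
    return status, waited_ms


if __name__ == "__main__":
    # First few delays of the assumed schedule.
    print(list(islice(backoff_delays_ms(), 10)))
    # A server that always answers NFS4ERR_DELAY never lets the loop finish
    # on its own; only the attempt limit stops it.
    print(retry_getattr(lambda: NFS4ERR_DELAY, max_attempts=5))
```

With a server that always returns NFS4ERR_DELAY, only the artificial
attempt limit ends the loop, which matches the "repeating forever"
pattern seen in the broken trace.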


If someone else wants to look at the pcapng data, let me know.

HTH,

--
Kurt Garloff <[email protected]>
Cologne, Germany

2022-02-23 13:20:04

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

any updates? Were you able to investigate the traces?

Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
though Qnap might have patched it), is not something that
should happen with a -stable kernel update, even if the problem
would be on the Qnap side, which would not be completely
surprising.

So I think we should revert this patch at least for -stable,
unless we understand what's going on and have a better fix
than a plain revert.

Best,
--

Kurt Garloff <[email protected]>

2022-02-23 22:37:16

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

On 23/02/2022 18:56, Olga Kornievskaia wrote:
> On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <[email protected]> wrote:
>> Hi Olga,
>>
>> any updates? Were you able to investigate the traces?
>>
>> Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
>> though Qnap might have patched it),is not something that
>> should happen with a -stable kernel update, even if the problem
>> would be on the Qnap side, which would not be completely
>> surprising.
>>
>> So I think we should revert this patch at least for -stable,
>> unless we understand what's going on and have a better fix
>> than a plain revert.
> I haven't commented on your ask of requesting a revert in the stable
> version. I'm not sure what the philosophy there. I don't see why we
> can't ask for this feature to only be available from the kernel
> version it has been accepted into and not before. If you think the
> kernel version that you want to use will always be before this feature
> was accepted, then asking folks responsible for "stable" kernels seems
> like a good idea. At the time of inclusion to stable, I wasn't aware
> of the broken legacy server implementations out there.

I guess Greg would need to comment on the detailed policies
for stable kernels.
One of the goals for sure is to avoid regressions. If that causes
bugs not to be fixable or features not to be available, then that's
a price that might need to be accepted. A regression is just many many
times worse than an unfixed issue, twice so for something that claims
to be stable.

So, if we are relatively sure that no NFSv4.2 server has the
kernel-3.4.6-knfsd Qnap (NFSv4.1) misbehavior, my change that masks the
new features for NFS<v4.2 might be what makes this patch acceptable
for stable. Otherwise, we should either revert it or make it
opt-in. The latter is not really a good idea if we then differ
from the main branch where we might go for an opt-out solution.
So maybe it's opt-out for main branch and for stable with an
additional guard against NFS<v4.2 at least for -stable.

Just my 0.02€.

--
Kurt Garloff <[email protected]>
Cologne, Germany


2022-02-23 23:40:49

by Olga Kornievskaia

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <[email protected]> wrote:
>
> Hi Olga,
>
> any updates? Were you able to investigate the traces?
>
> Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
> though Qnap might have patched it),is not something that
> should happen with a -stable kernel update, even if the problem
> would be on the Qnap side, which would not be completely
> surprising.
>
> So I think we should revert this patch at least for -stable,
> unless we understand what's going on and have a better fix
> than a plain revert.

I haven't commented on your ask of requesting a revert in the stable
version. I'm not sure what the philosophy is there. I don't see why we
can't ask for this feature to only be available from the kernel
version it has been accepted into and not before. If you think the
kernel version that you want to use will always be before this feature
was accepted, then asking folks responsible for "stable" kernels seems
like a good idea. At the time of inclusion to stable, I wasn't aware
of the broken legacy server implementations out there.

>
> Best,
> --
>
> Kurt Garloff <[email protected]>
>

2022-02-23 23:45:28

by Olga Kornievskaia

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <[email protected]> wrote:
>
> Hi Olga,
>
> any updates? Were you able to investigate the traces?
>
> Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
> though Qnap might have patched it),is not something that
> should happen with a -stable kernel update, even if the problem
> would be on the Qnap side, which would not be completely
> surprising.
>
> So I think we should revert this patch at least for -stable,
> unless we understand what's going on and have a better fix
> than a plain revert.

Hi Kurt,

I apologize for the late response. I have looked at the network trace.
The problem stems from the broken server that claims to support
fs_locations but then decides to never reply to the query.

I can implement a mount option to say fs_locquery=off to handle mounts
against the broken servers?

However I would like to ask if the better path forward isn't to update
to the knfsd where the problem is fixed?

>
> Best,
> --
>
> Kurt Garloff <[email protected]>
>

2022-02-24 01:00:49

by Trond Myklebust

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

On Wed, 2022-02-23 at 18:06 +0000, Chuck Lever III wrote:
>
>
> > On Feb 21, 2022, at 5:48 AM, Kurt Garloff <[email protected]> wrote:
> >
> > Hi,
> >
> > On 21.02.22 10:31, Kurt Garloff wrote:
> > > Hi Olga,
> > >
> > > On 21.02.22 02:19, Kornievskaia, Olga wrote:
> > > > [...]
> > > > Is it possible for you to provide a network trace?
> > >
> > > Yes.
> > >
> > > Is tcpdump what you'd like to see? wireshark's dumpcap?
> > > Any NFS specific tracing tools I should be using?
> > >
> > > One trace with a working kernel and one with the broken one?
> >
> > Comparing the good and the bad trace ...
> >
> > mount -t nfs 192.168.155.74:/Public /mnt/Public
> > against Qnap 4.3.4.xxx NFS v4.1 server.
> >
> > Both do:
> >
> > Establish conn
> > NFS NULL (ack)
> > NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
> > Teardown and reestablish
> > NFS NULL (ack)
> > > NFS EXCHANGE_ID (4.1 -> ack)
> > > NFS EXCHANGE_ID (4.1 -> ack)
> > > NFS CREATE_SESSION (ack)
> > > NFS RECLAIM_COMPLETE (CB_NULL, ack)
> > > NFS SECINFO_NO_NAME (ack)
> > > NFS PUTROOTFH|GETATTR (ack)
> > > NFS GETATTR FH:0x62d40c52 (ack), 8 times
> > > NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
> > NFS LOOKUP DH:0x62d40c52/Public (ack)
> > NFS LOOKUP DH:0x62d40c52/Public (ack)
> > NFS GETATTR FH:0x8ee88cee (ack), 3 times
> >
> >
> > Now the differences start:
> >
> > The fixed NFS client repeatedly gets ack back, the broken NFS
> > client gets
> >
> > NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp.
> > backoff)
>
> Any idea why the server is not able to respond properly to
> the GETATTR request? That seems like the root of the problem.
>

The GETATTR is a request for fs_locations in order to probe for
alternative IP addresses.

IIRC, some earlier implementations of knfsd had this response when the
mountd daemon wasn't configured to expect a referral upcall for that
location.
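
For anyone looking at a capture: a GETATTR carries a bitmap of the
attributes it asks for, so a request for fs_locations can be recognized
by one bit in that bitmap. The Python sketch below shows how attribute
numbers map to 32-bit bitmap words. The attribute number 24 for
fs_locations comes from the NFSv4 specifications; the helper names are
made up for this illustration and do not correspond to kernel functions.

```python
FATTR4_FS_LOCATIONS = 24  # attribute number assigned by the NFSv4 specs


def attr_bitmap(attr_numbers):
    """Pack attribute numbers into a list of 32-bit bitmap words.

    Attribute n is bit (n % 32) of word (n // 32).
    """
    words = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def requests_fs_locations(bitmap_words):
    """Return True if the bitmap asks for the fs_locations attribute."""
    word, bit = divmod(FATTR4_FS_LOCATIONS, 32)
    return len(bitmap_words) > word and bool(bitmap_words[word] & (1 << bit))


if __name__ == "__main__":
    bitmap = attr_bitmap([FATTR4_FS_LOCATIONS])
    print([hex(w) for w in bitmap], requests_fs_locations(bitmap))
```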

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-02-24 01:26:26

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

On 23/02/2022 18:49, Olga Kornievskaia wrote:
> I have posted a patch where you can mount with "notrunkdiscovery" and
> that should fix the problem with the Qnap server?

I have not seen it, unfortunately.

Care to copy me?

You have seen my patch that limits the
FS_LOCATIONS capability to NFS >= v4.2 and
I found this to be effective in making things
work again. Assuming that you check for
the mount parameter instead of the NFS version
to disable this feature, I would assume the
option to be effective. I'm happy to test
as soon as I get hold of the patch.

Thanks,

--
Kurt Garloff <[email protected]>
Cologne, Germany

2022-02-24 01:35:37

by Kurt Garloff

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

Hi Olga,

thanks for coming back!

On 23.02.22 15:22, Olga Kornievskaia wrote:
> Hi Kurt,
> I apologize for the late response. I have looked at the network trace.
> The problem stems from the broken server that claims to support
> fs_locations but then decides to never reply to the query.
>
> I can implement a mount option to say fs_locquery=off to handle mounts
> against the broken servers?
>
> However I would like to ask if the better path forward isn't to update
> to the knfsd where the problem is fixed?

Well, I have run self-compiled kernels on Qnap appliances before (to
work around Qnap's ext4 breakage when doing the case-independent
name lookup), but it was a painful and cumbersome process and I don't
want to repeat it. Appliances are not meant to be used with custom
kernels.
Even if I do: This does not help many many other users ... Unless we
convince Qnap to provide patches for old appliances, we'll experience
breakage.

On my end, I have applied the attached patch, restricting the use
of FS_LOCATIONS to servers that advertise NFS v4.2 or later.

In the patch, you'll also see clearing the bit before it gets set.
This was spotted by seth, see
https://bbs.archlinux.org/viewtopic.php?pid=2023983#p2023983
In latest upstream kernels you'd also need to clear
NFS_CAP_CASE_PRESERVING | NFS_CAP_CASE_INSENSITIVE
so I wonder whether we should not just nullify the caps
bit field prior to testing and selectively setting flags.
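
The two ideas above (clear the probed bits before setting them again,
and only keep FS_LOCATIONS for minor version 2 or later) can be modelled
in a few lines. This is a Python sketch of the logic only, not the
attached patch: the constant values, the CAP_UNRELATED bit and the
update_caps() helper are hypothetical stand-ins for the client's
NFS_CAP_* handling.

```python
# Hypothetical bit values; the real NFS_CAP_* constants differ.
CAP_FS_LOCATIONS = 1 << 0
CAP_CASE_PRESERVING = 1 << 1
CAP_CASE_INSENSITIVE = 1 << 2
CAP_UNRELATED = 1 << 3  # stands for any capability not re-probed here

PROBED_CAPS = CAP_FS_LOCATIONS | CAP_CASE_PRESERVING | CAP_CASE_INSENSITIVE


def update_caps(caps, server_reports, minorversion):
    """Recompute the probed capability bits of a caps bit field.

    caps:           bit field left over from an earlier probe
    server_reports: bits the server claims to support right now
    minorversion:   negotiated NFSv4 minor version
    """
    caps &= ~PROBED_CAPS                  # clear first so stale bits cannot survive
    caps |= server_reports & PROBED_CAPS  # then set only what was just reported
    if minorversion < 2:
        caps &= ~CAP_FS_LOCATIONS         # restrict FS_LOCATIONS to NFS >= v4.2
    return caps


if __name__ == "__main__":
    # A v4.1 server that claims fs_locations: the bit is masked out.
    print(bin(update_caps(0, CAP_FS_LOCATIONS | CAP_CASE_PRESERVING, 1)))
```

Clearing the whole probed set in one step is what "nullify, then
selectively set" amounts to: newly added flags only need to be listed
in one place.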

With this patch, I can mount NFS volumes from Qnap knfsd
again without any special workarounds (such as nfsvers=3 or the
to-be-implemented setting that you suggest). I have no idea
whether or not we leave a lot of features behind by restricting
FS_LOCATIONS on the client side to servers >= NFS v4.2.
But certainly better than breaking in a -stable kernel update,
even if the server might be to blame.

Best,

--
Kurt Garloff <[email protected]>
Cologne, Germany


Attachments:
nfs-restrict-fs-loc-to-nfs42.diff (1.29 kB)

2022-02-24 01:37:21

by Olga Kornievskaia

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

On Wed, Feb 23, 2022 at 12:31 PM Kurt Garloff <[email protected]> wrote:
>
> Hi Olga,
>
> thanks for coming back!
>
> On 23.02.22 15:22, Olga Kornievskaia wrote:
> > Hi Kurt,
> > I apologize for the late response. I have looked at the network trace.
> > The problem stems from the broken server that claims to support
> > fs_locations but then decides to never reply to the query.
> >
> > I can implement a mount option to say fs_locquery=off to handle mounts
> > against the broken servers?
> >

I have posted a patch where you can mount with "notrunkdiscovery" and
that should fix the problem with the Qnap server?
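
As a rough model of what such an option would do (a Python sketch, not
the posted patch): the client would skip the fs_locations query whenever
"notrunkdiscovery" is among the mount options, regardless of what the
server advertises. The helper names and the option parsing below are
assumptions for illustration only.

```python
def parse_mount_options(option_string):
    """Split a comma-separated -o string into a set of option tokens."""
    return {opt.strip() for opt in option_string.split(",") if opt.strip()}


def should_query_fs_locations(option_string, server_claims_fs_locations):
    """Decide whether to send the fs_locations query during mount.

    The query is skipped if the user passed "notrunkdiscovery", and also
    if the server does not claim to support fs_locations at all.
    """
    opts = parse_mount_options(option_string)
    if "notrunkdiscovery" in opts:
        return False
    return server_claims_fs_locations


if __name__ == "__main__":
    print(should_query_fs_locations("rw,notrunkdiscovery", True))
    print(should_query_fs_locations("rw", True))
```

Assuming the option is passed with -o like other NFS mount options, the
mount from the trace would become something like
mount -t nfs -o notrunkdiscovery 192.168.155.74:/Public /mnt/Public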

> > However I would like to ask if the better path forward isn't to update
> > to the knfsd where the problem is fixed?
>
> Well, I have ran self-compiled kernels on Qnap appliances before (to
> work around Qnap's ext4 breakage when doing the case-independent
> name lookup), but it was a painful and cumbersome process and I don't
> want to repeat it. Appliances are not meant to use with custom
> kernels.
> Even if I do: This does not help many many other users ... Unless we
> convince Qnap to provide patches for old appliances, we'll experience
> breakage.
>
> On my end, I have applied the attached patch, restricting the use
> of FS_LOCATIONS to servers that advertize NFS v4.2 or later.
>
> In the patch, you'll also see clearing the bit before it gets set.
> This was spotted by seth, see
> https://bbs.archlinux.org/viewtopic.php?pid=2023983#p2023983
> In latest upstream kernels you'd also need to clear
> NFS_CAP_CASE_PRESERVING | NFS_CAP_CASE_INSENSITIVE
> so I wonder whether we should not just nullify the caps
> bit field prior to testing and selectively setting flags.
>
> With this patch, I can mount NFS volumes from Qnap knfsd
> again without any special workarounds (such as nfsver=3 or the
> to-be-implemented setting that you suggest). I have no idea
> whether or not we leave a lot features behind by restricting
> FS_LOCATIONS on the client side to servers >= NFS v4.2.
> But certainly better than breaking in a -stable kernel update,
> even if the server might be to blame.
>
> Best,
>
> --
> Kurt Garloff <[email protected]>
> Cologne, Germany

2022-02-24 01:50:19

by Chuck Lever III

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)



> On Feb 23, 2022, at 5:00 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2022-02-23 at 18:06 +0000, Chuck Lever III wrote:
>>
>>
>>> On Feb 21, 2022, at 5:48 AM, Kurt Garloff <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> On 21.02.22 10:31, Kurt Garloff wrote:
>>>> Hi Olga,
>>>>
>>>> On 21.02.22 02:19, Kornievskaia, Olga wrote:
>>>>> [...]
>>>>> Is it possible for you to provide a network trace?
>>>>
>>>> Yes.
>>>>
>>>> Is tcpdump what you'd like to see? wireshark's dumpcap?
>>>> Any NFS specific tracing tools I should be using?
>>>>
>>>> One trace with a working kernel and one with the broken one?
>>>
>>> Comparing the good and the bad trace ...
>>>
>>> mount -t nfs 192.168.155.74:/Public /mnt/Public
>>> against Qnap 4.3.4.xxx NFS v4.1 server.
>>>
>>> Both do:
>>>
>>> Establish conn
>>> NFS NULL (ack)
>>> NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
>>> Teardown and reestablish
>>> NFS NULL (ack)
>>> NFS EXCHANGE_ID (4.1 -> ack)
>>> NFS EXCHANGE_ID (4.1 -> ack)
>>> NFS CREATE_SESSION (ack)
>>> NFS RECLAIM_COMPLETE (CB_NULL, ack)
>>> NFS SECINFO_NO_NAME (ack)
>>> NFS PUTROOTFH|GETATTR (ack)
>>> NFS GETATTR FH:0x62d40c52 (ack), 8 times
>>> NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
>>> NFS LOOKUP DH:0x62d40c52/Public (ack)
>>> NFS LOOKUP DH:0x62d40c52/Public (ack)
>>> NFS GETATTR FH:0x8ee88cee (ack), 3 times
>>>
>>>
>>> Now the differences start:
>>>
>>> The fixed NFS client repeatedly gets ack back, the broken NFS
>>> client gets
>>>
>>> NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp.
>>> backoff)
>>
>> Any idea why the server is not able to respond properly to
>> the GETATTR request? That seems like the root of the problem.
>>
>
> The GETATTR is a request for fs_locations in order to probe for
> alternative IP addresses.
>
> IIRC, some earlier implementations of knfsd had this response when the
> mountd daemon wasn't configured to expect a referral upcall for that
> location.

knfsd, or mountd? Is there known to be a server-side fix available?


--
Chuck Lever



2022-02-24 01:58:35

by Chuck Lever III

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)



> On Feb 21, 2022, at 5:48 AM, Kurt Garloff <[email protected]> wrote:
>
> Hi,
>
> On 21.02.22 10:31, Kurt Garloff wrote:
>> Hi Olga,
>>
>> On 21.02.22 02:19, Kornievskaia, Olga wrote:
>>> [...]
>>> Is it possible for you to provide a network trace?
>>
>> Yes.
>>
>> Is tcpdump what you'd like to see? wireshark's dumpcap?
>> Any NFS specific tracing tools I should be using?
>>
>> One trace with a working kernel and one with the broken one?
>
> Comparing the good and the bad trace ...
>
> mount -t nfs 192.168.155.74:/Public /mnt/Public
> against Qnap 4.3.4.xxx NFS v4.1 server.
>
> Both do:
>
> Establish conn
> NFS NULL (ack)
> NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
> Teardown and reestablish
> NFS NULL (ack)
> NFS EXCHANGE_ID (4.1 -> ack)
> NFS EXCHANGE_ID (4.1 -> ack)
> NFS CREATE_SESSION (ack)
> NFS RECLAIM_COMPLETE (CB_NULL, ack)
> NFS SECINFO_NO_NAME (ack)
> NFS PUTROOTFH|GETATTR (ack)
> NFS GETATTR FH:0x62d40c52 (ack), 8 times
> NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
> NFS LOOKUP DH:0x62d40c52/Public (ack)
> NFS LOOKUP DH:0x62d40c52/Public (ack)
> NFS GETATTR FH:0x8ee88cee (ack), 3 times
>
>
> Now the differences start:
>
> The fixed NFS client repeatedly gets ack back, the broken NFS client gets
>
> NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp. backoff)

Any idea why the server is not able to respond properly to
the GETATTR request? That seems like the root of the problem.

--
Chuck Lever



2022-02-24 09:14:46

by Greg KH

Subject: Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)

On Wed, Feb 23, 2022 at 11:24:41PM +0100, Kurt Garloff wrote:
> Hi Olga,
>
> On 23/02/2022 18:56, Olga Kornievskaia wrote:
> > On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <[email protected]> wrote:
> > > Hi Olga,
> > >
> > > any updates? Were you able to investigate the traces?
> > >
> > > Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
> > > though Qnap might have patched it),is not something that
> > > should happen with a -stable kernel update, even if the problem
> > > would be on the Qnap side, which would not be completely
> > > surprising.
> > >
> > > So I think we should revert this patch at least for -stable,
> > > unless we understand what's going on and have a better fix
> > > than a plain revert.
> > I haven't commented on your ask of requesting a revert in the stable
> > version. I'm not sure what the philosophy there. I don't see why we
> > can't ask for this feature to only be available from the kernel
> > version it has been accepted into and not before. If you think the
> > kernel version that you want to use will always be before this feature
> > was accepted, then asking folks responsible for "stable" kernels seems
> > like a good idea. At the time of inclusion to stable, I wasn't aware
> > of the broken legacy server implementations out there.
>
> I guess Greg would need to comment on the detailed policies
> for stable kernels.
> One of the goals for sure is to avoid regressions. If that causes
> bugs not to be fixable or features not to be available, than that's
> a price that might need to be accepted. A regression is just many many
> times worse than an unfixed issue, twice so for something that claims
> to be stable.

The policy for the stable kernel releases is the same as for Linus's
releases, "no user visible regressions are allowed".

There is no difference here, if something changes in one of Linus's
releases that breaks a working system, then it needs to be fixed. The
stable kernels are not unique here at all. Any user must be able to
always upgrade to a new kernel version without having to worry about
anything breaking.

So if there is a kernel change in Linus's tree that breaks existing
systems, it needs to be reverted or fixed to not do this.

thanks,

greg k-h