2021-01-08 11:44:32

by Kurt Garloff

Subject: NFS 4.2 client support broken on 5.10.5

Hi Neil, Anna, Trond,

While compiling a kernel, I suddenly started getting errors from objtool orc.
(This first occurs on init/main.o.)

I looked at all kinds of things before I noticed that this was neither a
toolchain issue (self-compiled gcc-10.2.1), nor caused by gcc plugins (I use
structleak and stackleak), nor an issue with objtool or libelf,
but that there was an -EIO error.

The kernel tree is on an NFS share, and I run a 5.10.5 client kernel
against the kernel NFS (4.2) server, running a 5.10.3 kernel.

The issue does NOT occur on a 5.10.3 client kernel, but is easily
reproducible on 5.10.5. Note that 5.10.5 on a local file system or
against an NFSv3 server does not show the issue.

Test program that reproduces this on the first pwrite64() is attached.
Note that the call to ftruncate() is required to make the problem happen.

I could go on bisecting this to a particular patch, but you'll
probably be able to see right away what's wrong.

Best,

--
Kurt Garloff <[email protected]>
Cologne, Germany


Attachments:
testpwrite.c (1.54 kB)

2021-01-08 12:03:16

by Trond Myklebust

Subject: Re: NFS 4.2 client support broken on 5.10.5

On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
> Hi Neil, Anna, Trond,
>
> While compiling a kernel, I suddenly started getting errors from objtool
> orc.
> (This first occurs on init/main.o.)
>
> I looked at all kinds of things before I noticed that this was neither a
> toolchain issue (self-compiled gcc-10.2.1), nor caused by gcc plugins (I use
> structleak and stackleak), nor an issue with objtool or libelf,
> but that there was an -EIO error.
>
> The kernel tree is on an NFS share, and I run a 5.10.5 client kernel
> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
>
> The issue does NOT occur on a 5.10.3 client kernel, but is easily
> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
> against an NFSv3 server does not show the issue.
>
> Test program that reproduces this on the first pwrite64() is
> attached.
> Note that the call to ftruncate() is required to make the problem
> happen.
>
> I could go on bisecting this to a particular patch, but you'll
> probably be able to see right away what's wrong.
>
> Best,
>

Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
.config? It really is not safe to enable READ_PLUS on 5.10 kernels
since that can cause random memory corruption.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-01-08 14:41:47

by Kurt Garloff

Subject: Re: NFS 4.2 client support broken on 5.10.5

Hi Trond,

On 08/01/2021 12:58, Trond Myklebust wrote:
> On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
>> Hi Neil, Anna, Trond,
>>
>> While compiling a kernel, I suddenly started getting errors from objtool
>> orc.
>> (This first occurs on init/main.o.)
>>
>> I looked at all kinds of things before I noticed that this was neither a
>> toolchain issue (self-compiled gcc-10.2.1), nor caused by gcc plugins (I use
>> structleak and stackleak), nor an issue with objtool or libelf,
>> but that there was an -EIO error.
>>
>> The kernel tree is on an NFS share, and I run a 5.10.5 client kernel
>> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
>>
>> The issue does NOT occur on a 5.10.3 client kernel, but is easily
>> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
>> against an NFSv3 server does not show the issue.
>>
>> Test program that reproduces this on the first pwrite64() is
>> attached.
>> Note that the call to ftruncate() is required to make the problem
>> happen.
>>
>> I could go on bisecting this to a particular patch, but you'll
>> probably be able to see right away what's wrong.
>>
> Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
> .config? It really is not safe to enable READ_PLUS on 5.10 kernels
> since that can cause random memory corruption.

OK, it is turned on in my kernel -- looks like I did not read the
warning in the config option help text carefully enough ...

I'll test what happens if I switch it off and report back.

Thanks for the quick response

--

Kurt Garloff <[email protected]>
Cologne, Germany


2021-01-08 15:52:09

by Kurt Garloff

Subject: Re: NFS 4.2 client support broken on 5.10.5

Hi Trond,

On 08.01.21 at 15:39, Kurt Garloff wrote:
> Hi Trond,
>
> On 08/01/2021 12:58, Trond Myklebust wrote:
>> On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
>>> [...]
>>> The kernel tree is on an NFS share, and I run a 5.10.5 client kernel
>>> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
>>>
>>> The issue does NOT occur on a 5.10.3 client kernel, but is easily
>>> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
>>> against an NFSv3 server does not show the issue.
>>>
>>> Test program that reproduces this on the first pwrite64() is
>>> attached.
>>> Note that the call to ftruncate() is required to make the problem
>>> happen.
>>>
>>> I could go on bisecting this to a particular patch, but you'll
>>> probably be able to see right away what's wrong.
>>>
>> Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
>> .config? It really is not safe to enable READ_PLUS on 5.10 kernels
>> since that can cause random memory corruption.
> OK, it is turned on in my kernel -- looks like I have not read the
> warning in the config option help text carefully enough ...
>
> I'll test what happens if I switch it off and report back.

OK, I compiled a kernel without support for READ_PLUS
and the test program magically succeeds.

So take my report as input to the developers that work
on making READ_PLUS work. Maybe they want to add
my little program to their CI suite.

Thanks,

--
Kurt Garloff <[email protected]>, Cologne, Germany