Hello,
I have just deployed a new system with Mellanox ConnectX-4 VPI EDR IB
cards and wanted to set up NFS over RDMA on it.
However, while mounting the FS over RDMA works fine, actually using it
results in the following messages absolutely hammering dmesg on both
client and server:
> https://gist.github.com/BtbN/9582e597b6581f552fa15982b0285b80#file-server-log
The spam only stops once I forcibly reboot the client. The filesystem
gets nowhere during all this. The retrans counter in nfsstat just keeps
going up; nothing actually gets done.
This is on Linux 5.4.54, using nfs-utils 2.4.3.
The mlx5 driver had enhanced-mode disabled in order to enable IPoIB
connected mode with an MTU of 65520.
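(For reference, a minimal sketch of how connected mode and that MTU get set; the interface name ib0 is an assumption:)

# put the IPoIB interface into connected mode, then raise its MTU
echo connected > /sys/class/net/ib0/mode
ip link set dev ib0 mtu 65520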
Normal NFS 4.2 over TCP works perfectly fine on this setup; it's only
when I mount via rdma that things go wrong.
Is this an issue on my end, or did I run into a bug somewhere here?
Any pointers, patches and solutions to test are welcome.
Thanks,
Timo Rothenpieler
Hi Timo-
> On Aug 3, 2020, at 11:05 AM, Timo Rothenpieler <[email protected]> wrote:
>
> [...]
>
> Is this an issue on my end, or did I run into a bug somewhere here?
> Any pointers, patches and solutions to test are welcome.
I haven't seen that failure mode here, so the best I can recommend is
to keep investigating. I've copied linux-rdma in case they have any
advice.
--
Chuck Lever
On Mon, Aug 03, 2020 at 12:24:21PM -0400, Chuck Lever wrote:
> Hi Timo-
>
> > On Aug 3, 2020, at 11:05 AM, Timo Rothenpieler <[email protected]> wrote:
> >
> > [...]
> > The mlx5 driver had enhanced-mode disabled in order to enable IPoIB connected mode with an MTU of 65520.
> > [...]
>
> I haven't seen that failure mode here, so the best I can recommend is
> to keep investigating. I've copied linux-rdma in case they have any
> advice.
The mention of IPoIB is slightly confusing in the context of NFS-over-RDMA.
Are you running NFS over IPoIB?
From a brief look at the CQE error syndrome (local length error), the client sends a wrong WQE.
Thanks
On 04.08.2020 11:36, Leon Romanovsky wrote:
> [...]
>
> The mention of IPoIB is slightly confusing in the context of NFS-over-RDMA.
> Are you running NFS over IPoIB?
As far as I'm aware, NFS over RDMA still needs an IP address and port to
target, so IPoIB is mandatory?
At least the admin guide in the kernel says so.
Right now I am actually running NFS over IPoIB (without RDMA) because of
the issue at hand, and would like to turn on RDMA for better performance.
> From a brief look at the CQE error syndrome (local length error), the client sends a wrong WQE.
Does that point at an issue in the kernel code, or something I did wrong?
The fstab entries for these mounts look like this:
10.110.10.200:/home /home nfs4
rw,rdma,port=20049,noatime,async,vers=4.2,_netdev 0 0
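(Equivalently, a minimal sketch of mounting by hand and checking the transport; the address and export are the ones from the fstab line above, the rest is generic and output formats may vary:)

mount -t nfs4 -o rdma,port=20049,vers=4.2 10.110.10.200:/home /home
# confirm which transport the mount is actually using
nfsstat -m | grep -A1 '/home'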
Is there anything more I can investigate? I tried turning connected mode
off and lowering the MTU in turn, but that did not have any effect.
On Tue, Aug 04, 2020 at 12:52:27PM +0200, Timo Rothenpieler wrote:
> [...]
>
> Right now I am actually running NFS over IPoIB (without RDMA) because of
> the issue at hand, and would like to turn on RDMA for better performance.
>
> [...]
>
> Is there anything more I can investigate? I tried turning connected mode
> off and lowering the MTU in turn, but that did not have any effect.
Chuck,
You probably know which traces Timo should enable on the client.
The fact that NFS over (not-enhanced) IPoIB works greatly reduces the
likelihood of driver/FW issues.
Thanks
> On Aug 4, 2020, at 8:25 AM, Leon Romanovsky <[email protected]> wrote:
>
> [...]
>
> Chuck,
>
> You probably know which traces Timo should enable on the client.
> The fact that NFS over (not-enhanced) IPoIB works greatly reduces the
> likelihood of driver/FW issues.
Timo, I tend to think this is not a configuration issue.
Do you know of a known working kernel?
--
Chuck Lever
On 04.08.2020 14:49, Chuck Lever wrote:
> Timo, I tend to think this is not a configuration issue.
> Do you know of a known working kernel?
>
This is a brand new system; it has never been running any kernel older
than 5.4, and downgrading it to 4.19 or something else while in operation
is unfortunately not easily possible. For a client it would definitely
not be out of the question, but I cannot easily downgrade the main NFS
server.
Also keep in mind that the dmesg spam happens on both server and client
simultaneously.
I'll see if I can borrow two of the nodes to turn into a temporary test
system for this.
The kernel for this system is self-built, not a distribution kernel.
Could this be a missing kernel config option or something?
> On Aug 4, 2020, at 9:08 AM, Timo Rothenpieler <[email protected]> wrote:
>
> On 04.08.2020 14:49, Chuck Lever wrote:
>> Timo, I tend to think this is not a configuration issue.
>> Do you know of a known working kernel?
>
> [...]
>
> Also keep in mind that the dmesg spam happens on both server and client simultaneously.
Let's start with the client only, since restarting it seems to clear the problem.
> I'll see if I can borrow two of the nodes to turn into a temporary test system for this.
>
> The kernel for this system is self-built, not a distribution kernel.
Would it be easy to try a kernel earlier in the 5.4.y stable series?
> Could this be a missing kernel config option or something?
Doubtful.
--
Chuck Lever
On 04.08.2020 15:12, Chuck Lever wrote:
>
> [...]
>
> Would it be easy to try a kernel earlier in the 5.4.y stable series?
Yes, that should be very straightforward, since I can just use the same
config.
Got any specific version in mind?
> On Aug 4, 2020, at 9:19 AM, Timo Rothenpieler <[email protected]> wrote:
>
> [...]
>
> Yes, that should be very straightforward, since I can just use the same config.
> Got any specific version in mind?
Start with an early one, like 5.4.16.
--
Chuck Lever
On 04.08.2020 15:24, Chuck Lever wrote:
> Start with an early one, like 5.4.16.
>
Still happening with 5.4.16 on the client. I'll see if I can get a 4.19
one going soon.
On Tue, Aug 04, 2020 at 09:12:55AM -0400, Chuck Lever wrote:
>
> [...]
>
> Let's start with the client only, since restarting it seems to clear the problem.
It is the client, because according to the server CQE errors it is a
Remote_Invalid_Request_Error, per "9.7.5.2.2 NAK CODES" in the IBTA spec.
Thanks
> On Aug 4, 2020, at 9:46 AM, Leon Romanovsky <[email protected]> wrote:
>
> [...]
>
> It is the client, because according to the server CQE errors it is a
> Remote_Invalid_Request_Error, per "9.7.5.2.2 NAK CODES" in the IBTA spec.
Thanks! OK, then let's use ftrace.
Timo, can you install trace-cmd on your client? Then:
1. # trace-cmd record -e rpcrdma -e sunrpc
2. Trigger the problem
3. Control-C the trace-cmd, and copy the trace.dat file to another system
4. reboot your client
Then send me your trace.dat. You don't have to cc the mailing lists.
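(A minimal sketch of that capture workflow, including an optional local look at the result before sending it; the grep pattern is only a guess at what the error events look like:)

trace-cmd record -e rpcrdma -e sunrpc
# ... reproduce the hang, then hit Ctrl-C ...
# optional: skim the capture for completion/transport errors before sending it
trace-cmd report -i trace.dat | grep -iE 'err|flush' | head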
--
Chuck Lever
> On Aug 4, 2020, at 9:53 AM, Chuck Lever <[email protected]> wrote:
>
> [...]
>
> Then send me your trace.dat. You don't have to cc the mailing lists.
I see a LOC_LEN_ERR on a Receive. Leon, doesn't that mean the server's
Send was too large?
Timo, what filesystem are you sharing on your NFS server? The thing that
comes to mind is https://bugzilla.kernel.org/show_bug.cgi?id=198053
--
Chuck Lever
On 04.08.2020 17:34, Chuck Lever wrote:
> I see a LOC_LEN_ERR on a Receive. Leon, doesn't that mean the server's
> Send was too large?
>
> Timo, what filesystem are you sharing on your NFS server? The thing that
> comes to mind is https://bugzilla.kernel.org/show_bug.cgi?id=198053
>
The filesystem on the server is indeed a zfs-on-linux (version 0.8.4),
just as in that bug report.
Should I try to apply the proposed fix you posted on that bug report to
the client (and the server)?
> On Aug 4, 2020, at 11:39 AM, Timo Rothenpieler <[email protected]> wrote:
>
> [...]
>
> The filesystem on the server is indeed a zfs-on-linux (version 0.8.4), just as in that bug report.
>
> Should I try to apply the proposed fix you posted on that bug report to the client (and the server)?
If you are hitting that bug, the server is the problem. The client
should work fine once the server is fixed. (I'm not happy about
the client's looping behavior either, but that will go away once
the server behaves).
I'm not hopeful that the fix applies cleanly to v4.19, but it
might. Another option would be upgrading your NFS server.
--
Chuck Lever
On 04.08.2020 17:46, Chuck Lever wrote:
>
> [...]
>
> I'm not hopeful that the fix applies cleanly to v4.19, but it
> might. Another option would be upgrading your NFS server.
It's running on 5.4.54 and the patch applies almost cleanly, with only
line offsets and a little fuzz:
> patching file fs/nfsd/nfs4xdr.c
> Hunk #1 succeeded at 3530 (offset 9 lines).
> Hunk #2 succeeded at 3556 (offset 9 lines).
> patching file include/linux/sunrpc/svc.h
> patching file include/linux/sunrpc/svc_rdma.h
> Hunk #2 succeeded at 172 (offset 1 line).
> Hunk #3 succeeded at 192 (offset 1 line).
> patching file include/linux/sunrpc/svc_xprt.h
> patching file net/sunrpc/svc.c
> Hunk #1 succeeded at 1635 (offset -2 lines).
> patching file net/sunrpc/svcsock.c
> Hunk #2 succeeded at 660 (offset 2 lines).
> Hunk #3 succeeded at 1181 (offset 4 lines).
> patching file net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> Hunk #1 succeeded at 193 (offset 2 lines).
> patching file net/sunrpc/xprtrdma/svc_rdma_rw.c
> Hunk #1 succeeded at 481 (offset -3 lines).
> Hunk #2 succeeded at 500 (offset -3 lines).
> Hunk #3 succeeded at 510 (offset -3 lines).
> Hunk #4 succeeded at 524 (offset -3 lines).
> Hunk #5 succeeded at 538 (offset -3 lines).
> Hunk #6 succeeded at 578 (offset -3 lines).
> patching file net/sunrpc/xprtrdma/svc_rdma_sendto.c
> Hunk #1 succeeded at 856 (offset -15 lines).
> Hunk #2 succeeded at 891 with fuzz 2 (offset -22 lines).
> patching file net/sunrpc/xprtrdma/svc_rdma_transport.c
> Hunk #1 succeeded at 81 (offset -1 lines).
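(For reference, output like the above comes from applying the patch against the tree roughly as follows; the patch file name is just a placeholder:)

cd linux-5.4.54
# check how the fix from bugzilla 198053 applies, then apply it for real
patch -p1 --dry-run < bug198053-fix.patch
patch -p1 < bug198053-fix.patch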
I will deploy the patch to both server and client and report back.
On Tue, Aug 04, 2020 at 11:34:05AM -0400, Chuck Lever wrote:
>
> [...]
>
> I see a LOC_LEN_ERR on a Receive. Leon, doesn't that mean the server's
> Send was too large?
1.
We have a local_length_error counter; it can help to check it on both the
server and the clients (a small sketch for watching it follows below).
[leonro@vm ~]$ cat /sys/class/infiniband/ibp0s9/ports/1/hw_counters/resp_local_length_error
0
resp_local_length_error - "Number of times responder detected local length errors."
2.
LOC_LEN_ERR matches what is reported in the CQE error on the client.
This is what is written in our HW document:
IB compliant completion with error syndrome
0x1: Local_Length_Error
3.
From IBTA, 11.6.2 COMPLETION RETURN STATUS
Local Length Error - Generated for a Work Request posted to the local
Send Queue when the sum of the Data Segment lengths exceeds the message
length for the channel adapter port. Generated for a Work Request posted
to the local Receive Queue when the sum of the Data Segment lengths is
too small to receive a valid incoming message or the length of the incoming
message is greater than the maximum message size supported by the HCA port
that received the message.
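(A minimal sketch for keeping an eye on those length-error counters on both nodes; the sysfs glob is an assumption and counter names can vary by device:)

# watch the length-error counters on every RDMA device/port of this node
watch -n 1 'grep -H . /sys/class/infiniband/*/ports/*/hw_counters/*length*'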
So if "1" works :), we will be able to distinguish whether the client
sends a too-large WR or receives a too-large message.
Thanks
On 04.08.2020 17:50, Timo Rothenpieler wrote:
> [...]
>
> It's running on 5.4.54 and the patch applies almost cleanly, with only
> line offsets and a little fuzz:
>
> I will deploy the patch to both server and client and report back.
Reporting success.
With the patch from that bug applied, no error spam is happening anymore.
Plus, the filesystem actually works, and it definitely got a whole lot
snappier than before, which is not all that unexpected.
Thank you so much for your help analyzing this and for the fix!
I hope it gets applied to mainline soon and reaches the 5.4 stable
backports eventually.
Until then, I will carry it as a local patch for these systems.
Thanks again,
Timo