2020-05-01 19:05:14

by Chuck Lever

Subject: [PATCH RFC] Frequent connection loss using krb5[ip] NFS mounts

Hi-

Over the past year or so I've observed that using sec=krb5i or
sec=krb5p with NFS/RDMA while testing my Linux NFS client and server
results in frequent connection loss. I've been able to pin this down
a bit.

The problem is that SUNRPC's calls into the kernel crypto functions
can sometimes reschedule. If that happens between the time the
server receives an RPC Call and the time it has unwrapped the Call's
GSS sequence number, the number is very likely to have fallen
outside the GSS context's sequence number window. When that happens,
the server is required to drop the Call, and therefore also the
connection.
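
To make the failure mode concrete, here is a toy model of RFC 2203's
sequence-window rule. This is not the server's gss_check_seq_num()
code, just the shape of the check; SEQ_WIN, the struct, and the names
are made up for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SEQ_WIN 128             /* window size the server advertises */

struct seq_state {
        uint32_t sd_max;        /* highest sequence number verified so far */
};

/* Returns true if the Call may proceed; false means "drop it silently". */
static bool check_seq_num(struct seq_state *s, uint32_t seq)
{
        if (s->sd_max >= SEQ_WIN && seq <= s->sd_max - SEQ_WIN)
                return false;           /* fell below the window */
        if (seq > s->sd_max)
                s->sd_max = seq;        /* the window slides forward */
        return true;            /* (the real code also catches replays) */
}

int main(void)
{
        struct seq_state s = { 0 };
        uint32_t stalled = 1;
        uint32_t seq;

        /* Call #1 stalls in crypto while later Calls are verified. */
        for (seq = 2; seq < 2 + SEQ_WIN + 10; seq++)
                check_seq_num(&s, seq);

        printf("stalled Call accepted: %s\n",
               check_seq_num(&s, stalled) ? "yes" : "no, dropped");
        return 0;
}

A reschedule during unwrap gives later Calls exactly that kind of head
start, and once enough of them have been verified the stalled Call can
only be dropped, taking the connection with it.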

I've tested this observation by applying the following naive patch to
both my client and server, and got the following results.

I. Is this an effective fix?

With sec=krb5p,proto=rdma, I ran a 12-thread git regression test
(cd git/; make -j12 test).

Without this patch on the server, over 3,000 connection loss events
are observed. With it, the test runs on a single connection. Clearly
the server needs to have some kind of mitigation in this area.
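
For reference, the naive approach is simply to stop telling the crypto
layer that it may sleep while it is working on our buffers. A sketch of
what I mean, loosely modeled on krb5_encrypt() in
net/sunrpc/auth_gss/gss_krb5_crypto.c (illustrative only; this is not
the patch verbatim, and the _nosleep name is made up):

#include <crypto/skcipher.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include <linux/sunrpc/gss_krb5.h>

static int krb5_encrypt_nosleep(struct crypto_sync_skcipher *tfm,
                                void *iv, void *in, void *out, int length)
{
        SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);
        u8 local_iv[GSS_KRB5_MAX_BLOCKSIZE] = { 0 };
        struct scatterlist sg[1];
        int ret;

        if (iv)
                memcpy(local_iv, iv, crypto_sync_skcipher_ivsize(tfm));

        memcpy(out, in, length);
        sg_init_one(sg, out, length);

        skcipher_request_set_sync_tfm(req, tfm);
        /*
         * Passing 0 here instead of CRYPTO_TFM_REQ_MAY_SLEEP means the
         * cipher implementation may not reschedule while this request
         * runs, so the GSS sequence number can't go stale underneath us.
         */
        skcipher_request_set_callback(req, 0, NULL, NULL);
        skcipher_request_set_crypt(req, sg, sg, length, local_iv);

        ret = crypto_skcipher_encrypt(req);
        skcipher_request_zero(req);
        return ret;
}

Whether forbidding the crypto layer to sleep is acceptable for every
cipher implementation we care about is part of what I'm asking here.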


II. Does the fix cause a performance regression?

Because this patch affects both the client and server paths, I
measured performance with the patch applied in each combination of
client and server, using both sec=krb5 (which checksums just the
RPC message header) and sec=krb5i (which checksums the whole RPC
message).

fio 8KiB 70% read, 30% write for 30 seconds, NFSv3 on RDMA.
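
For the record, the fio job was along these lines. The mount point,
job name, queue depth, and thread count shown here are placeholders
rather than the exact values behind the numbers below:

[global]
; directory points at the NFSv3 mount (proto=rdma, sec= as noted per run)
directory=/mnt/nfs
rw=randrw
rwmixread=70
bs=8k
runtime=30
time_based=1
ioengine=libaio
direct=1

[krb5-mix]
size=1g
iodepth=16
numjobs=4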

-- krb5 --

unpatched client and server:
Connect count: 3
read: IOPS=84.3k, 50.00th=[ 1467], 99.99th=[10028]
write: IOPS=36.1k, 50.00th=[ 1565], 99.99th=[20579]

patched client, unpatched server:
Connect count: 2
read: IOPS=75.4k, 50.00th=[ 1647], 99.99th=[ 7111]
write: IOPS=32.3k, 50.00th=[ 1745], 99.99th=[ 7439]

unpatched client, patched server:
Connect count: 1
read: IOPS=84.1k, 50.00th=[ 1467], 99.99th=[ 8717]
write: IOPS=36.1k, 50.00th=[ 1582], 99.99th=[ 9241]

patched client and server:
Connect count: 1
read: IOPS=74.9k, 50.00th=[ 1663], 99.99th=[ 7046]
write: IOPS=31.0k, 50.00th=[ 1762], 99.99th=[ 7242]

-- krb5i --

unpatched client and server:
Connect count: 6
read: IOPS=35.8k, 50.00th=[ 3228], 99.99th=[49546]
write: IOPS=15.4k, 50.00th=[ 3294], 99.99th=[50594]

patched client, unpatched server:
Connect count: 5
read: IOPS=36.3k, 50.00th=[ 3228], 99.99th=[14877]
write: IOPS=15.5k, 50.00th=[ 3294], 99.99th=[15139]

unpatched client, patched server:
Connect count: 3
read: IOPS=35.7k, 50.00th=[ 3228], 99.99th=[15926]
write: IOPS=15.2k, 50.00th=[ 3294], 99.99th=[15926]

patched client and server:
Connect count: 3
read: IOPS=36.3k, 50.00th=[ 3195], 99.99th=[15139]
write: IOPS=15.5k, 50.00th=[ 3261], 99.99th=[15270]


The good news:
Both results show that I/O tail latency improves significantly when
either the client or the server has this patch applied.

The bad news:
The krb5 performance result shows an IOPS regression when the client
has this patch applied.


So now I'm trying to come up with a solution that prevents the
rescheduling/connection-loss issue without also causing a
performance regression.

Any thoughts/comments/advice appreciated.

---

Chuck Lever (1):
SUNRPC: crypto calls should never schedule

--
Chuck Lever


2020-05-04 19:10:58

by Chuck Lever

Subject: Re: [PATCH RFC] Frequent connection loss using krb5[ip] NFS mounts



> On May 4, 2020, at 3:07 PM, [email protected] wrote:
>
> On Fri, May 01, 2020 at 03:04:00PM -0400, Chuck Lever wrote:
>> Over the past year or so I've observed that using sec=krb5i or
>> sec=krb5p with NFS/RDMA while testing my Linux NFS client and server
>> results in frequent connection loss. I've been able to pin this down
>> a bit.
>>
>> The problem is that SUNRPC's calls into the kernel crypto functions
>> can sometimes reschedule. If that happens between the time the
>> server receives an RPC Call and the time it has unwrapped the Call's
>> GSS sequence number, the number is very likely to have fallen
>> outside the GSS context's sequence number window. When that happens,
>> the server is required to drop the Call, and therefore also the
>> connection.
>
> Does it help to increase GSS_SEQ_WIN? I think you could even increase
> it by a couple of orders of magnitude without getting into higher-order
> allocations.

Increasing GSS_SEQ_WIN reduces the frequency of connection loss events,
but does not close the race window. It's a workaround -- the server
scheduler can delay incoming requests arbitrarily, so there will
always be a race opportunity here unless incoming requests are heavily
serialized.
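
To put a number on it: assuming GSS_SEQ_WIN is still 128, a Call whose
unwrap gets preempted only has to sit out while 128 later Calls have
their sequence numbers verified before it must be dropped. Raising the
window to, say, 8192 just raises that threshold; a long enough
scheduling delay under that 12-thread test still gets past it.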


> --b.
>
>>
>> I've tested this observation by applying the following naive patch to
>> both my client and server, and got the following results.
>>
>> I. Is this an effective fix?
>>
>> With sec=krb5p,proto=rdma, I ran a 12-thread git regression test
>> (cd git/; make -j12 test).
>>
>> Without this patch on the server, over 3,000 connection loss events
>> are observed. With it, the test runs on a single connection. Clearly
>> the server needs to have some kind of mitigation in this area.
>>
>>
>> II. Does the fix cause a performance regression?
>>
>> Because this patch affects both the client and server paths, I
>> measured performance with the patch applied in each combination of
>> client and server, using both sec=krb5 (which checksums just the
>> RPC message header) and sec=krb5i (which checksums the whole RPC
>> message).
>>
>> fio 8KiB 70% read, 30% write for 30 seconds, NFSv3 on RDMA.
>>
>> -- krb5 --
>>
>> unpatched client and server:
>> Connect count: 3
>> read: IOPS=84.3k, 50.00th=[ 1467], 99.99th=[10028]
>> write: IOPS=36.1k, 50.00th=[ 1565], 99.99th=[20579]
>>
>> patched client, unpatched server:
>> Connect count: 2
>> read: IOPS=75.4k, 50.00th=[ 1647], 99.99th=[ 7111]
>> write: IOPS=32.3k, 50.00th=[ 1745], 99.99th=[ 7439]
>>
>> unpatched client, patched server:
>> Connect count: 1
>> read: IOPS=84.1k, 50.00th=[ 1467], 99.99th=[ 8717]
>> write: IOPS=36.1k, 50.00th=[ 1582], 99.99th=[ 9241]
>>
>> patched client and server:
>> Connect count: 1
>> read: IOPS=74.9k, 50.00th=[ 1663], 99.99th=[ 7046]
>> write: IOPS=31.0k, 50.00th=[ 1762], 99.99th=[ 7242]
>>
>> -- krb5i --
>>
>> unpatched client and server:
>> Connect count: 6
>> read: IOPS=35.8k, 50.00th=[ 3228], 99.99th=[49546]
>> write: IOPS=15.4k, 50.00th=[ 3294], 99.99th=[50594]
>>
>> patched client, unpatched server:
>> Connect count: 5
>> read: IOPS=36.3k, 50.00th=[ 3228], 99.99th=[14877]
>> write: IOPS=15.5k, 50.00th=[ 3294], 99.99th=[15139]
>>
>> unpatched client, patched server:
>> Connect count: 3
>> read: IOPS=35.7k, 50.00th=[ 3228], 99.99th=[15926]
>> write: IOPS=15.2k, 50.00th=[ 3294], 99.99th=[15926]
>>
>> patched client and server:
>> Connect count: 3
>> read: IOPS=36.3k, 50.00th=[ 3195], 99.99th=[15139]
>> write: IOPS=15.5k, 50.00th=[ 3261], 99.99th=[15270]
>>
>>
>> The good news:
>> Both results show that I/O tail latency improves significantly when
>> either the client or the server has this patch applied.
>>
>> The bad news:
>> The krb5 performance result shows an IOPS regression when the client
>> has this patch applied.
>>
>>
>> So now I'm trying to come up with a solution that prevents the
>> rescheduling/connection-loss issue without also causing a
>> performance regression.
>>
>> Any thoughts/comments/advice appreciated.
>>
>> ---
>>
>> Chuck Lever (1):
>> SUNRPC: crypto calls should never schedule
>>
>> --
>> Chuck Lever

--
Chuck Lever