2018-11-30 17:08:51

by Chuck Lever III

Subject: kerberos with v4.20-rc4 heads up

Hi-

I'm testing krb5/krb5i/krb5p with stock v4.20-rc4 and NFS/TCP.
The workload is synthetic:

/home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r8k -t16 -c

The client is a 12-core Xeon system.


I'm seeing all kinds of symptoms:

- memory leaks: bvec and enc_pages are leaking

- EBADMSG is reported to user space

- data corruption

- connect deadlocks resulting in a mount hang

- invalid soft IRQ receive buffer warnings


--
Chuck Lever





2018-11-30 18:40:19

by Trond Myklebust

Subject: Re: kerberos with v4.20-rc4 heads up

On Fri, 2018-11-30 at 12:08 -0500, Chuck Lever wrote:
> Hi-
>
> I'm testing krb5/krb5i/krb5p with stock v4.20-rc4 and NFS/TCP.
> The workload is synthetic:
>
> /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r8k -t16 -c
>
> The client is a 12-core Xeon system.
>
>
> I'm seeing all kinds of symptoms:
>
> - memory leaks: bvec and enc_pages are leaking
>
> - EBADMSG is reported to user space
>
> - data corruption
>
> - connect deadlocks resulting in a mount hang
>
> - invalid soft IRQ receive buffer warnings
>

Does the following patch help?

8<----------------------------------------------
From 8ff4cd9f0f6912e14f657371b6b7eecf6d2091ee Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Fri, 30 Nov 2018 12:48:47 -0500
Subject: [PATCH] SUNRPC: call_connect_status() must handle tasks that got
transmitted

If a task failed to get the write lock in the call to xprt_connect(), then
it will be queued on xprt->sending. In that case, it is possible for it
to get transmitted before the call to call_connect_status(), in which
case it needs to be handled by call_transmit_status() instead.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index ae3b8145da35..e35d642558e7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1915,6 +1915,13 @@ call_connect_status(struct rpc_task *task)
 	struct rpc_clnt *clnt = task->tk_client;
 	int status = task->tk_status;

+	/* Check if the task was already transmitted */
+	if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) {
+		xprt_end_transmit(task);
+		task->tk_action = call_transmit_status;
+		return;
+	}
+
 	dprint_status(task);

 	trace_rpc_connect_status(task);
--
2.19.2

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
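
The hand-off described in the patch can be pictured with a small user-space model. This is only a sketch under stated assumptions: every `sketch_*` name below is hypothetical and stands in for the kernel's `rpc_task` machinery, the `need_xmit` flag models the `RPC_TASK_NEED_XMIT` bit, and the release of the transport write lock (`xprt_end_transmit()` in the patch) is left out entirely. It is not the SUNRPC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rpc_task state (not the real types). */
struct sketch_task;
typedef void (*sketch_action)(struct sketch_task *);

struct sketch_task {
	bool need_xmit;        /* models the RPC_TASK_NEED_XMIT bit in tk_runstate */
	sketch_action action;  /* models tk_action, the next state to run */
	int connect_handled;   /* counts runs of the connect-status path */
	int transmit_handled;  /* counts runs of the transmit-status path */
};

/* Models call_transmit_status(): the request is already on the wire. */
static void sketch_transmit_status(struct sketch_task *task)
{
	task->transmit_handled++;
	task->action = NULL;   /* the sketch stops here; the kernel continues */
}

/* Models call_connect_status() with the check added by the patch. */
static void sketch_connect_status(struct sketch_task *task)
{
	if (!task->need_xmit) {
		/* Already transmitted while queued: hand off instead. */
		task->action = sketch_transmit_status;
		return;
	}
	task->connect_handled++;
	task->action = NULL;   /* the sketch stops here; the kernel continues */
}

/* Models another task that holds the write lock sending our queued request. */
static void sketch_transmit_by_lock_holder(struct sketch_task *task)
{
	task->need_xmit = false;
}

/* A task that is about to process the result of its connect attempt. */
static struct sketch_task sketch_task_init(void)
{
	struct sketch_task task = {
		.need_xmit = true,
		.action = sketch_connect_status,
	};
	return task;
}

/* Run the state machine until no next action is set. */
static void sketch_run(struct sketch_task *task)
{
	assert(task != NULL);
	while (task->action != NULL)
		task->action(task);
}
```

Calling `sketch_run()` on a fresh task takes the connect-status path. Calling `sketch_transmit_by_lock_holder()` first makes the same task skip connect-status handling and finish in the transmit-status path, which is the case the patch is meant to cover.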


2018-11-30 20:36:51

by Chuck Lever III

Subject: Re: kerberos with v4.20-rc4 heads up



> On Nov 30, 2018, at 1:40 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2018-11-30 at 12:08 -0500, Chuck Lever wrote:
>> Hi-
>>
>> I'm testing krb5/krb5i/krb5p with stock v4.20-rc4 and NFS/TCP.
>> The workload is synthetic:
>>
>> /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r8k -t16 -c
>>
>> The client is a 12-core Xeon system.
>>
>>
>> I'm seeing all kinds of symptoms:
>>
>> - memory leaks: bvec and enc_pages are leaking
>>
>> - EBADMSG is reported to user space
>>
>> - data corruption
>>
>> - connect deadlocks resulting in a mount hang
>>
>> - invalid soft IRQ receive buffer warnings
>>
>
> Does the following patch help?

With this patch applied, I can still reproduce at least the
soft IRQ warnings and the connect deadlock. There may be
more than one bug.

I've found and fixed the enc_pages leak. Patch forthcoming.


> 8<----------------------------------------------
> From 8ff4cd9f0f6912e14f657371b6b7eecf6d2091ee Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Fri, 30 Nov 2018 12:48:47 -0500
> Subject: [PATCH] SUNRPC: call_connect_status() must handle tasks that got
> transmitted
>
> If a task failed to get the write lock in the call to xprt_connect(), then
> it will be queued on xprt->sending. In that case, it is possible for it
> to get transmitted before the call to call_connect_status(), in which
> case it needs to be handled by call_transmit_status() instead.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> net/sunrpc/clnt.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index ae3b8145da35..e35d642558e7 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1915,6 +1915,13 @@ call_connect_status(struct rpc_task *task)
>  	struct rpc_clnt *clnt = task->tk_client;
>  	int status = task->tk_status;
>
> +	/* Check if the task was already transmitted */
> +	if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) {
> +		xprt_end_transmit(task);
> +		task->tk_action = call_transmit_status;
> +		return;
> +	}
> +
>  	dprint_status(task);
>
>  	trace_rpc_connect_status(task);
> --
> 2.19.2
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]

--
Chuck Lever