2019-03-12 14:30:02

by Olga Kornievskaia

[permalink] [raw]
Subject: oops on a sunrpc tracepoint

Hi Chuck,

Is this a known issue?

[ 5087.845704] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000004
[ 5087.848463] #PF error: [normal kernel read fault]
[ 5087.849852] PGD 0 P4D 0
[ 5087.850618] Oops: 0000 [#1] SMP PTI
[ 5087.851577] CPU: 0 PID: 5964 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[ 5087.853388] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 5087.860280] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
[ 5087.862149] Code: 00 00 00 48 89 ee 48 89 e7 e8 7d 69 65 ef 48 85
c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8
00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00
0f ca
[ 5087.866380] RSP: 0018:ffffc03c43297cf8 EFLAGS: 00010286
[ 5087.867797] RAX: ffff9ef936d134a0 RBX: 0000000000000003 RCX: 0000000000000000
[ 5087.869456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc03c43297cf8
[ 5087.871092] RBP: ffff9ef92f5b28f0 R08: 0000000000000246 R09: 0000000000000000
[ 5087.872767] R10: fffff11380b706c0 R11: 0000000000000000 R12: ffff9ef9338bda00
[ 5087.874659] R13: 0000000000000004 R14: ffff9ef932e813f0 R15: ffff9ef932e81000
[ 5087.876538] FS: 0000000000000000(0000) GS:ffff9ef93bc00000(0000)
knlGS:0000000000000000
[ 5087.879404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5087.881202] CR2: 0000000000000004 CR3: 000000000f0f2002 CR4: 00000000001606f0
[ 5087.883331] Call Trace:
[ 5087.885770] xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
[ 5087.887341] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[ 5087.888710] call_bc_transmit+0xd5/0x170 [sunrpc]
[ 5087.889880] __rpc_execute+0x7e/0x3f0 [sunrpc]
[ 5087.891022] rpc_run_bc_task+0x78/0xd0 [sunrpc]
[ 5087.892097] bc_svc_process+0x281/0x340 [sunrpc]
[ 5087.895456] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[ 5087.897018] ? remove_wait_queue+0x60/0x60
[ 5087.898038] kthread+0xf5/0x130
[ 5087.899531] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[ 5087.900912] ? kthread_bind+0x10/0x10
[ 5087.902438] ret_from_fork+0x1f/0x30


2019-03-12 14:33:27

by Chuck Lever III

[permalink] [raw]
Subject: Re: oops on a sunrpc tracepoint



> On Mar 12, 2019, at 10:29 AM, Olga Kornievskaia <[email protected]> wrote:
>
> Hi Chuck,
>
> Is this a known issue?

Nope.


> [ 5087.845704] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000004
> [ 5087.848463] #PF error: [normal kernel read fault]
> [ 5087.849852] PGD 0 P4D 0
> [ 5087.850618] Oops: 0000 [#1] SMP PTI
> [ 5087.851577] CPU: 0 PID: 5964 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
> [ 5087.853388] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [ 5087.860280] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
> [ 5087.862149] Code: 00 00 00 48 89 ee 48 89 e7 e8 7d 69 65 ef 48 85
> c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8
> 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00
> 0f ca
> [ 5087.866380] RSP: 0018:ffffc03c43297cf8 EFLAGS: 00010286
> [ 5087.867797] RAX: ffff9ef936d134a0 RBX: 0000000000000003 RCX: 0000000000000000
> [ 5087.869456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc03c43297cf8
> [ 5087.871092] RBP: ffff9ef92f5b28f0 R08: 0000000000000246 R09: 0000000000000000
> [ 5087.872767] R10: fffff11380b706c0 R11: 0000000000000000 R12: ffff9ef9338bda00
> [ 5087.874659] R13: 0000000000000004 R14: ffff9ef932e813f0 R15: ffff9ef932e81000
> [ 5087.876538] FS: 0000000000000000(0000) GS:ffff9ef93bc00000(0000)
> knlGS:0000000000000000
> [ 5087.879404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5087.881202] CR2: 0000000000000004 CR3: 000000000f0f2002 CR4: 00000000001606f0
> [ 5087.883331] Call Trace:
> [ 5087.885770] xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
> [ 5087.887341] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
> [ 5087.888710] call_bc_transmit+0xd5/0x170 [sunrpc]
> [ 5087.889880] __rpc_execute+0x7e/0x3f0 [sunrpc]
> [ 5087.891022] rpc_run_bc_task+0x78/0xd0 [sunrpc]
> [ 5087.892097] bc_svc_process+0x281/0x340 [sunrpc]
> [ 5087.895456] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
> [ 5087.897018] ? remove_wait_queue+0x60/0x60
> [ 5087.898038] kthread+0xf5/0x130
> [ 5087.899531] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
> [ 5087.900912] ? kthread_bind+0x10/0x10
> [ 5087.902438] ret_from_fork+0x1f/0x30

--
Chuck Lever




2019-03-12 14:36:26

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: oops on a sunrpc tracepoint

On Tue, Mar 12, 2019 at 10:33 AM Chuck Lever <[email protected]> wrote:
>
>
>
> > On Mar 12, 2019, at 10:29 AM, Olga Kornievskaia <[email protected]> wrote:
> >
> > Hi Chuck,
> >
> > Is this a known issue?
>
> Nope.

Ok.
Looks like anything to do with callback channeling trying to use the
trace_event_raw_event_xprt_transmit....
Here's another one:

localhost login: [ 467.462259] BUG: unable to handle kernel NULL
pointer dereference at 0000000000000004
[ 467.464411] #PF error: [normal kernel read fault]
[ 467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD 728c0067 PMD 0
[ 467.466980] Oops: 0000 [#1] SMP PTI
[ 467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[ 467.469393] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 467.471840] RIP: 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
[ 467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48 89
e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92 a8
00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10 41
8b 94
[ 467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
[ 467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX: 0000000000000000
[ 467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffabe7434fbcd0
[ 467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09: 0000000000000000
[ 467.483591] R10: 0000000070000000 R11: 0000000000000000 R12: ffff997277dfcc00
[ 467.485226] R13: 0000000000000000 R14: 0000000000000000 R15: ffff99722fecdca8
[ 467.486830] FS: 0000000000000000(0000) GS:ffff99727bc00000(0000)
knlGS:0000000000000000
[ 467.488596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4: 00000000001606f0
[ 467.491559] Call Trace:
[ 467.492128] xprt_transmit+0x303/0x3f0 [sunrpc]
[ 467.493143] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[ 467.494328] call_bc_transmit+0x49/0x170 [sunrpc]
[ 467.495379] __rpc_execute+0x7e/0x3f0 [sunrpc]
[ 467.496451] rpc_run_bc_task+0x78/0xd0 [sunrpc]
[ 467.497467] bc_svc_process+0x281/0x340 [sunrpc]
[ 467.498507] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[ 467.499751] ? remove_wait_queue+0x60/0x60
[ 467.500686] kthread+0xf5/0x130
[ 467.501438] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[ 467.502640] ? kthread_bind+0x10/0x10
[ 467.503454] ret_from_fork+0x1f/0x30


>
>
> > [ 5087.845704] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000004
> > [ 5087.848463] #PF error: [normal kernel read fault]
> > [ 5087.849852] PGD 0 P4D 0
> > [ 5087.850618] Oops: 0000 [#1] SMP PTI
> > [ 5087.851577] CPU: 0 PID: 5964 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
> > [ 5087.853388] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> > [ 5087.860280] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
> > [ 5087.862149] Code: 00 00 00 48 89 ee 48 89 e7 e8 7d 69 65 ef 48 85
> > c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8
> > 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00
> > 0f ca
> > [ 5087.866380] RSP: 0018:ffffc03c43297cf8 EFLAGS: 00010286
> > [ 5087.867797] RAX: ffff9ef936d134a0 RBX: 0000000000000003 RCX: 0000000000000000
> > [ 5087.869456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc03c43297cf8
> > [ 5087.871092] RBP: ffff9ef92f5b28f0 R08: 0000000000000246 R09: 0000000000000000
> > [ 5087.872767] R10: fffff11380b706c0 R11: 0000000000000000 R12: ffff9ef9338bda00
> > [ 5087.874659] R13: 0000000000000004 R14: ffff9ef932e813f0 R15: ffff9ef932e81000
> > [ 5087.876538] FS: 0000000000000000(0000) GS:ffff9ef93bc00000(0000)
> > knlGS:0000000000000000
> > [ 5087.879404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5087.881202] CR2: 0000000000000004 CR3: 000000000f0f2002 CR4: 00000000001606f0
> > [ 5087.883331] Call Trace:
> > [ 5087.885770] xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
> > [ 5087.887341] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
> > [ 5087.888710] call_bc_transmit+0xd5/0x170 [sunrpc]
> > [ 5087.889880] __rpc_execute+0x7e/0x3f0 [sunrpc]
> > [ 5087.891022] rpc_run_bc_task+0x78/0xd0 [sunrpc]
> > [ 5087.892097] bc_svc_process+0x281/0x340 [sunrpc]
> > [ 5087.895456] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
> > [ 5087.897018] ? remove_wait_queue+0x60/0x60
> > [ 5087.898038] kthread+0xf5/0x130
> > [ 5087.899531] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
> > [ 5087.900912] ? kthread_bind+0x10/0x10
> > [ 5087.902438] ret_from_fork+0x1f/0x30
>
> --
> Chuck Lever
>
>
>

2019-03-12 14:47:23

by Chuck Lever III

[permalink] [raw]
Subject: Re: oops on a sunrpc tracepoint



> On Mar 12, 2019, at 10:36 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Mar 12, 2019 at 10:33 AM Chuck Lever <[email protected]> wrote:
>>
>>
>>
>>> On Mar 12, 2019, at 10:29 AM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> Hi Chuck,
>>>
>>> Is this a known issue?
>>
>> Nope.
>
> Ok.
> Looks like anything to do with callback channeling trying to use the
> trace_event_raw_event_xprt_transmit....

Callback rqst's are missing either an rq_task or rq_task->tk_client
(I can't remember at the moment). It would be nicer for the forward
channel infrastructure if such rqst's were complete...

Otherwise, trace_xprt_transmit and trace_xprt_enq_xmit need to check
if task and task->tk_client are present before dereferencing them.


> Here's another one:
>
> localhost login: [ 467.462259] BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000004
> [ 467.464411] #PF error: [normal kernel read fault]
> [ 467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD 728c0067 PMD 0
> [ 467.466980] Oops: 0000 [#1] SMP PTI
> [ 467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
> [ 467.469393] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [ 467.471840] RIP: 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
> [ 467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48 89
> e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92 a8
> 00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10 41
> 8b 94
> [ 467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
> [ 467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX: 0000000000000000
> [ 467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffabe7434fbcd0
> [ 467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09: 0000000000000000
> [ 467.483591] R10: 0000000070000000 R11: 0000000000000000 R12: ffff997277dfcc00
> [ 467.485226] R13: 0000000000000000 R14: 0000000000000000 R15: ffff99722fecdca8
> [ 467.486830] FS: 0000000000000000(0000) GS:ffff99727bc00000(0000)
> knlGS:0000000000000000
> [ 467.488596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4: 00000000001606f0
> [ 467.491559] Call Trace:
> [ 467.492128] xprt_transmit+0x303/0x3f0 [sunrpc]
> [ 467.493143] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
> [ 467.494328] call_bc_transmit+0x49/0x170 [sunrpc]
> [ 467.495379] __rpc_execute+0x7e/0x3f0 [sunrpc]
> [ 467.496451] rpc_run_bc_task+0x78/0xd0 [sunrpc]
> [ 467.497467] bc_svc_process+0x281/0x340 [sunrpc]
> [ 467.498507] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
> [ 467.499751] ? remove_wait_queue+0x60/0x60
> [ 467.500686] kthread+0xf5/0x130
> [ 467.501438] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
> [ 467.502640] ? kthread_bind+0x10/0x10
> [ 467.503454] ret_from_fork+0x1f/0x30
>
>
>>
>>
>>> [ 5087.845704] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000004
>>> [ 5087.848463] #PF error: [normal kernel read fault]
>>> [ 5087.849852] PGD 0 P4D 0
>>> [ 5087.850618] Oops: 0000 [#1] SMP PTI
>>> [ 5087.851577] CPU: 0 PID: 5964 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
>>> [ 5087.853388] Hardware name: VMware, Inc. VMware Virtual
>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
>>> [ 5087.860280] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
>>> [ 5087.862149] Code: 00 00 00 48 89 ee 48 89 e7 e8 7d 69 65 ef 48 85
>>> c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8
>>> 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00
>>> 0f ca
>>> [ 5087.866380] RSP: 0018:ffffc03c43297cf8 EFLAGS: 00010286
>>> [ 5087.867797] RAX: ffff9ef936d134a0 RBX: 0000000000000003 RCX: 0000000000000000
>>> [ 5087.869456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc03c43297cf8
>>> [ 5087.871092] RBP: ffff9ef92f5b28f0 R08: 0000000000000246 R09: 0000000000000000
>>> [ 5087.872767] R10: fffff11380b706c0 R11: 0000000000000000 R12: ffff9ef9338bda00
>>> [ 5087.874659] R13: 0000000000000004 R14: ffff9ef932e813f0 R15: ffff9ef932e81000
>>> [ 5087.876538] FS: 0000000000000000(0000) GS:ffff9ef93bc00000(0000)
>>> knlGS:0000000000000000
>>> [ 5087.879404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 5087.881202] CR2: 0000000000000004 CR3: 000000000f0f2002 CR4: 00000000001606f0
>>> [ 5087.883331] Call Trace:
>>> [ 5087.885770] xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
>>> [ 5087.887341] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
>>> [ 5087.888710] call_bc_transmit+0xd5/0x170 [sunrpc]
>>> [ 5087.889880] __rpc_execute+0x7e/0x3f0 [sunrpc]
>>> [ 5087.891022] rpc_run_bc_task+0x78/0xd0 [sunrpc]
>>> [ 5087.892097] bc_svc_process+0x281/0x340 [sunrpc]
>>> [ 5087.895456] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
>>> [ 5087.897018] ? remove_wait_queue+0x60/0x60
>>> [ 5087.898038] kthread+0xf5/0x130
>>> [ 5087.899531] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
>>> [ 5087.900912] ? kthread_bind+0x10/0x10
>>> [ 5087.902438] ret_from_fork+0x1f/0x30
>>
>> --
>> Chuck Lever
>>
>>
>>

--
Chuck Lever




2019-03-12 18:41:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: oops on a sunrpc tracepoint

On Tue, 2019-03-12 at 10:47 -0400, Chuck Lever wrote:
> > On Mar 12, 2019, at 10:36 AM, Olga Kornievskaia <[email protected]>
> > wrote:
> >
> > On Tue, Mar 12, 2019 at 10:33 AM Chuck Lever <
> > [email protected]> wrote:
> > >
> > >
> > > > On Mar 12, 2019, at 10:29 AM, Olga Kornievskaia <[email protected]
> > > > > wrote:
> > > >
> > > > Hi Chuck,
> > > >
> > > > Is this a known issue?
> > >
> > > Nope.
> >
> > Ok.
> > Looks like anything to do with callback channeling trying to use
> > the
> > trace_event_raw_event_xprt_transmit....
>
> Callback rqst's are missing either an rq_task or rq_task->tk_client
> (I can't remember at the moment). It would be nicer for the forward
> channel infrastructure if such rqst's were complete...
>
> Otherwise, trace_xprt_transmit and trace_xprt_enq_xmit need to check
> if task and task->tk_client are present before dereferencing them.

Back channel tasks have no associated task->tk_client. They are owned
by a server task, not by a client.

In either case, the transport code should avoid concerning itself with
the client structure. That's a higher layer object.

> > Here's another one:
> >
> > localhost login: [ 467.462259] BUG: unable to handle kernel NULL
> > pointer dereference at 0000000000000004
> > [ 467.464411] #PF error: [normal kernel read fault]
> > [ 467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD
> > 728c0067 PMD 0
> > [ 467.466980] Oops: 0000 [#1] SMP PTI
> > [ 467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted
> > 5.0.0-rc7+ #1
> > [ 467.469393] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> > [ 467.471840] RIP:
> > 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
> > [ 467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48
> > 89
> > e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92
> > a8
> > 00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10
> > 41
> > 8b 94
> > [ 467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
> > [ 467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX:
> > 0000000000000000
> > [ 467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > ffffabe7434fbcd0
> > [ 467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09:
> > 0000000000000000
> > [ 467.483591] R10: 0000000070000000 R11: 0000000000000000 R12:
> > ffff997277dfcc00
> > [ 467.485226] R13: 0000000000000000 R14: 0000000000000000 R15:
> > ffff99722fecdca8
> > [ 467.486830] FS: 0000000000000000(0000)
> > GS:ffff99727bc00000(0000)
> > knlGS:0000000000000000
> > [ 467.488596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4:
> > 00000000001606f0
> > [ 467.491559] Call Trace:
> > [ 467.492128] xprt_transmit+0x303/0x3f0 [sunrpc]
> > [ 467.493143] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
> > [ 467.494328] call_bc_transmit+0x49/0x170 [sunrpc]
> > [ 467.495379] __rpc_execute+0x7e/0x3f0 [sunrpc]
> > [ 467.496451] rpc_run_bc_task+0x78/0xd0 [sunrpc]
> > [ 467.497467] bc_svc_process+0x281/0x340 [sunrpc]
> > [ 467.498507] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
> > [ 467.499751] ? remove_wait_queue+0x60/0x60
> > [ 467.500686] kthread+0xf5/0x130
> > [ 467.501438] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
> > [ 467.502640] ? kthread_bind+0x10/0x10
> > [ 467.503454] ret_from_fork+0x1f/0x30
> >
> >
> > >
> > > > [ 5087.845704] BUG: unable to handle kernel NULL pointer
> > > > dereference
> > > > at 0000000000000004
> > > > [ 5087.848463] #PF error: [normal kernel read fault]
> > > > [ 5087.849852] PGD 0 P4D 0
> > > > [ 5087.850618] Oops: 0000 [#1] SMP PTI
> > > > [ 5087.851577] CPU: 0 PID: 5964 Comm: NFSv4 callback Not
> > > > tainted 5.0.0-rc7+ #1
> > > > [ 5087.853388] Hardware name: VMware, Inc. VMware Virtual
> > > > Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> > > > [ 5087.860280] RIP:
> > > > 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
> > > > [ 5087.862149] Code: 00 00 00 48 89 ee 48 89 e7 e8 7d 69 65 ef
> > > > 48 85
> > > > c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94
> > > > 24 a8
> > > > 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8
> > > > 00 00 00
> > > > 0f ca
> > > > [ 5087.866380] RSP: 0018:ffffc03c43297cf8 EFLAGS: 00010286
> > > > [ 5087.867797] RAX: ffff9ef936d134a0 RBX: 0000000000000003 RCX:
> > > > 0000000000000000
> > > > [ 5087.869456] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > > ffffc03c43297cf8
> > > > [ 5087.871092] RBP: ffff9ef92f5b28f0 R08: 0000000000000246 R09:
> > > > 0000000000000000
> > > > [ 5087.872767] R10: fffff11380b706c0 R11: 0000000000000000 R12:
> > > > ffff9ef9338bda00
> > > > [ 5087.874659] R13: 0000000000000004 R14: ffff9ef932e813f0 R15:
> > > > ffff9ef932e81000
> > > > [ 5087.876538] FS: 0000000000000000(0000)
> > > > GS:ffff9ef93bc00000(0000)
> > > > knlGS:0000000000000000
> > > > [ 5087.879404] CS: 0010 DS: 0000 ES: 0000 CR0:
> > > > 0000000080050033
> > > > [ 5087.881202] CR2: 0000000000000004 CR3: 000000000f0f2002 CR4:
> > > > 00000000001606f0
> > > > [ 5087.883331] Call Trace:
> > > > [ 5087.885770] xprt_request_enqueue_transmit+0x2b6/0x4a0
> > > > [sunrpc]
> > > > [ 5087.887341] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
> > > > [ 5087.888710] call_bc_transmit+0xd5/0x170 [sunrpc]
> > > > [ 5087.889880] __rpc_execute+0x7e/0x3f0 [sunrpc]
> > > > [ 5087.891022] rpc_run_bc_task+0x78/0xd0 [sunrpc]
> > > > [ 5087.892097] bc_svc_process+0x281/0x340 [sunrpc]
> > > > [ 5087.895456] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
> > > > [ 5087.897018] ? remove_wait_queue+0x60/0x60
> > > > [ 5087.898038] kthread+0xf5/0x130
> > > > [ 5087.899531] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
> > > > [ 5087.900912] ? kthread_bind+0x10/0x10
> > > > [ 5087.902438] ret_from_fork+0x1f/0x30
> > >
> > > --
> > > Chuck Lever
> > >
> > >
> > >
>
> --
> Chuck Lever
>
>
>
--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
http://www.hammer.space


2019-03-12 18:54:31

by Chuck Lever III

[permalink] [raw]
Subject: Re: oops on a sunrpc tracepoint



> On Mar 12, 2019, at 2:41 PM, Trond Myklebust <[email protected]> wrote:
>
> On Tue, 2019-03-12 at 10:47 -0400, Chuck Lever wrote:
>>> On Mar 12, 2019, at 10:36 AM, Olga Kornievskaia <[email protected]>
>>> wrote:
>>>
>>> On Tue, Mar 12, 2019 at 10:33 AM Chuck Lever <
>>> [email protected]> wrote:
>>>>
>>>>
>>>>> On Mar 12, 2019, at 10:29 AM, Olga Kornievskaia <[email protected]
>>>>>> wrote:
>>>>>
>>>>> Hi Chuck,
>>>>>
>>>>> Is this a known issue?
>>>>
>>>> Nope.
>>>
>>> Ok.
>>> Looks like anything to do with callback channeling trying to use
>>> the
>>> trace_event_raw_event_xprt_transmit....
>>
>> Callback rqst's are missing either an rq_task or rq_task->tk_client
>> (I can't remember at the moment). It would be nicer for the forward
>> channel infrastructure if such rqst's were complete...
>>
>> Otherwise, trace_xprt_transmit and trace_xprt_enq_xmit need to check
>> if task and task->tk_client are present before dereferencing them.
>
> Back channel tasks have no associated task->tk_client. They are owned
> by a server task, not by a client.

Understood, I'm simply suggesting that there could be a "phony"
client for callbacks so that other code doesn't have to check to
see if tk_client is populated before dereferencing it.


> In either case, the transport code should avoid concerning itself with
> the client structure. That's a higher layer object.

Right. The only reason the transport is concerning itself with
tk_client is the task:%u@%u identifier in trace points.

I would think that, for instance, someone adding an ad hoc kprobe
would want to avoid a fault here, but would have to know that
callback rqst's passing through the forward channel code do not
have a tk_client.


--
Chuck Lever