2022-08-11 01:04:28

by Olga Kornievskaia

Subject: help with nfsd kernel oops

Hi folks (Chuck/Jeff specifically),

We've had this outstanding kernel oops that happens (infrequently) in
copy_offload testing (stack trace at the end). I've been trying to
debug it for a while, adding printks and such. I can give a hand-wavy
explanation of what I'm seeing, but I need help (a) nailing down
exactly what the problem is and (b) getting a hint on how to address it.

OK, so here is what happens. Test case: the source file is opened and
locked (likewise for the destination file), a COPY_NOTIFY is sent to
the source server, the copy is done (src to dst), the source file is
unlocked (likewise the destination file), and the files are closed.
COPY/COPY_NOTIFY use a lock stateid.

When unlocking happens, it triggers LOCKU and FREE_STATEID. The
copy_notify stateid is associated with the lock stateid on the server.
When the last reference on the lock stateid is put, nfs4_put_stid()
also calls nfs4_free_cpntf_statelist(), which takes care of cleaning
up the copy_notify stateid.
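
For reference, the path I mean is roughly the following (paraphrased
from memory, not quoted verbatim from 5.19, so details may differ
slightly):

void nfs4_put_stid(struct nfs4_stid *s)
{
	struct nfs4_client *clp = s->sc_client;

	/* not the last reference: nothing is torn down yet */
	if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock))
		return;

	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
	/* this is what normally reaps the copy_notify stateids */
	nfs4_free_cpntf_statelist(clp->net, s);
	spin_unlock(&clp->cl_lock);
	s->sc_free(s);
}

So as long as teardown goes through nfs4_put_stid(), the copy_notify
children are reaped together with their parent.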

In the laundry thread there is a failsafe: if for some reason the
copy_notify state was not cleaned up and has expired, it gets deleted
there.

However, in the failing case, where everything should be cleaned up
the way it's supposed to be, I instead see a call to
put_ol_stateid_locked() (before FREE_STATEID is processed), which
cleans up the parent but doesn't touch the copy_notify stateids. The
laundry thread then runs, walks the copy_notify list (since it's not
empty), and tries to free the entries, but that leads to this oops
(since part of the memory, the parent stateid, was already freed via
put_ol_stateid_locked()).
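
To make the list corruption concrete, here is a tiny standalone
(non-nfsd) sketch of the pattern; the names (toy_parent, toy_cpntf,
etc.) are made up for illustration and are not the real nfsd ones:

#include <stdio.h>
#include <stdlib.h>

/* toy doubly-linked list, same idea as the kernel's list_head */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* stands in for the lock stateid */
struct toy_parent {
	struct list_head cp_list;	/* copy_notify children hang here */
};

/* stands in for a copy_notify stateid */
struct toy_cpntf {
	struct list_head entry;		/* first member, so the cast below is valid */
	int id;
};

/* mirrors the proposed fix: unlink and free the children, then the parent */
static void free_parent_and_children(struct toy_parent *p)
{
	while (p->cp_list.next != &p->cp_list) {
		struct toy_cpntf *c = (struct toy_cpntf *)p->cp_list.next;

		p->cp_list.next = c->entry.next;
		c->entry.next->prev = &p->cp_list;
		free(c);
	}
	free(p);
}

int main(void)
{
	struct toy_parent *p = calloc(1, sizeof(*p));
	struct toy_cpntf *c = calloc(1, sizeof(*c));

	list_init(&p->cp_list);
	c->id = 1;
	list_add_tail(&c->entry, &p->cp_list);

	/*
	 * If 'p' were freed here while 'c' is still linked, a later walk
	 * of the copy_notify list (what the laundry thread does) would
	 * chase pointers into freed memory -- the __list_del_entry_valid
	 * BUG in the stack trace at the end.  Draining first avoids that.
	 */
	free_parent_and_children(p);
	printf("cleaned up without leaving stale list entries\n");
	return 0;
}

In the real case the "walk" is the laundry thread's sweep of the
copy_notify state, and the "parent" is the lock stateid that
put_ol_stateid_locked() already released.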

Perhaps the fix is to add a call to nfs4_free_cpntf_statelist() to
put_ol_stateid_locked(), which I tried and it seems to work. But I'm
not confident about it.

Thoughts?

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fa67ecd5fe63..b988d3c4e5fd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1408,6 +1408,7 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
 	}
 
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
+	nfs4_free_cpntf_statelist(clp->net, s);
 	list_add(&stp->st_locks, reaplist);
 }



[ 338.681529] ------------[ cut here ]------------
[ 338.683090] kernel BUG at lib/list_debug.c:53!
[ 338.684372] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 338.685977] CPU: 1 PID: 493 Comm: kworker/u256:27 Tainted: G B 5.19.0-rc6+ #104
[ 338.688266] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 338.691019] Workqueue: nfsd4 laundromat_main [nfsd]
[ 338.692224] RIP: 0010:__list_del_entry_valid.cold.3+0x3d/0x53
[ 338.693626] Code: 0b 4c 89 e1 4c 89 ee 48 c7 c7 e0 1a e3 8f e8 5b 60 fe ff 0f 0b 48 89 e9 4c 89 ea 48 89 de 48 c7 c7 60 1a e3 8f e8 44 60 fe ff <0f> 0b 48 89 ea 48 89 de 48 c7 c7 00 1a e3 8f e8 30 60 fe ff 0f 0b
[ 338.697651] RSP: 0018:ffff88800d03fc68 EFLAGS: 00010286
[ 338.698762] RAX: 000000000000006d RBX: ffff888028a14798 RCX: 0000000000000000
[ 338.700442] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff917e9240
[ 338.702257] RBP: ffff88801bb0ae90 R08: ffffed100a795f0e R09: ffffed100a795f0e
[ 338.704016] R10: ffff888053caf86b R11: ffffed100a795f0d R12: ffff88801bb0ae90
[ 338.705703] R13: d9c0013300000a39 R14: 000000000000005a R15: ffff88801b9f5800
[ 338.707510] FS: 0000000000000000(0000) GS:ffff888053c80000(0000) knlGS:0000000000000000
[ 338.709319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 338.710715] CR2: 00005640baab74d0 CR3: 0000000017574005 CR4: 00000000001706e0
[ 338.712282] Call Trace:
[ 338.712898] <TASK>
[ 338.713430] _free_cpntf_state_locked+0x6b/0x120 [nfsd]
[ 338.714806] nfs4_laundromat+0x8ef/0xf30 [nfsd]
[ 338.716013] ? dequeue_entity+0x18b/0x6c0
[ 338.716970] ? release_lock_stateid+0x60/0x60 [nfsd]
[ 338.718169] ? _raw_spin_unlock+0x15/0x30
[ 338.719064] ? __switch_to+0x2fa/0x690
[ 338.719879] ? __schedule+0x67d/0xf20
[ 338.720678] laundromat_main+0x15/0x40 [nfsd]
[ 338.721760] process_one_work+0x3b4/0x6b0
[ 338.722629] worker_thread+0x5a/0x640
[ 338.723425] ? process_one_work+0x6b0/0x6b0
[ 338.724324] kthread+0x162/0x190
[ 338.725030] ? kthread_complete_and_exit+0x20/0x20
[ 338.726074] ret_from_fork+0x22/0x30
[ 338.726856] </TASK>


2022-08-11 10:48:06

by Jeff Layton

Subject: Re: help with nfsd kernel oops

On Wed, 2022-08-10 at 20:50 -0400, Olga Kornievskaia wrote:

Does the kernel you're testing have Dai's patch that he posted on August
1st?

[PATCH v2] NFSD: fix use-after-free on source server when doing inter-server copy

This sounds very similar to the problem he was fixing.
--
Jeff Layton <[email protected]>

2022-08-11 13:23:57

by Olga Kornievskaia

Subject: Re: help with nfsd kernel oops

On Thu, Aug 11, 2022 at 6:45 AM Jeff Layton <[email protected]> wrote:
>
> Does the kernel you're testing have Dai's patch that he posted on August
> 1st?
>
> [PATCH v2] NFSD: fix use-after-free on source server when doing inter-server copy
>
> This sounds very similar to the problem he was fixing.

I'm testing upstream (5.19-rc6 based, from Chuck's tree) for both the
source and destination servers.


2022-08-11 13:27:43

by Olga Kornievskaia

Subject: Re: help with nfsd kernel oops

On Thu, Aug 11, 2022 at 9:09 AM Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, Aug 11, 2022 at 6:45 AM Jeff Layton <[email protected]> wrote:
> >
> > Does the kernel you're testing have Dai's patch that he posted on August
> > 1st?
> >
> > [PATCH v2] NFSD: fix use-after-free on source server when doing inter-server copy
> >
> > This sounds very similar to the problem he was fixing.
>
> I'm testing upstream (5.19-rc6 based, from Chuck's tree) for both the
> source and destination servers.

Oh, I missed that thread entirely (will be reading it now). I'll try
Dai's patch but I think our approaches differ.


2022-08-12 01:40:11

by Dai Ngo

Subject: Re: help with nfsd kernel oops

On 8/10/22 5:50 PM, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fa67ecd5fe63..b988d3c4e5fd 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1408,6 +1408,7 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
>  	}
> 
>  	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
> +	nfs4_free_cpntf_statelist(clp->net, s);
>  	list_add(&stp->st_locks, reaplist);
>  }
>
>
In the use-after-free scenario, the copy_state is inserted on the
sc_cp_list of the lock state.

So when put_ol_stateid_locked is called from nfsd4_close_open_stateid,
with your proposed patch, nfs4_free_cpntf_statelist does not remove
any copy_state since 's' is the open state.

Also put_ol_stateid_locked can return without calling idr_remove
and nfs4_free_cpntf_statelist (your proposed fix).
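
To spell that out, put_ol_stateid_locked looks roughly like this
(paraphrased, not verbatim 5.19 source); the early return is taken
whenever this is not the last reference, so your added call would not
run on that path:

static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
				  struct list_head *reaplist)
{
	struct nfs4_stid *s = &stp->st_stid;
	struct nfs4_client *clp = s->sc_client;

	if (!refcount_dec_and_test(&s->sc_count)) {
		wake_up_all(&close_wq);
		return;		/* your added call would be skipped here */
	}

	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
	nfs4_free_cpntf_statelist(clp->net, s);	/* proposed addition */
	list_add(&stp->st_locks, reaplist);
}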

-Dai


2022-08-12 14:31:36

by Olga Kornievskaia

Subject: Re: help with nfsd kernel oops

On Thu, Aug 11, 2022 at 9:34 PM <[email protected]> wrote:
>
> In the use-after-free scenario, the copy_state is inserted on the
> sc_cp_list of the lock state.
>
> So when put_ol_stateid_locked is called from nfsd4_close_open_stateid,
> with your proposed patch, nfs4_free_cpntf_statelist does not remove
> any copy_state since 's' is the open state.

I don't know about all the cases, but I'm 100% certain that "s" there is
a lock stateid (with the Linux client, using nfstest_ssc), since I've
tested it and it works, and I have heavy debugging added to knfsd that
tracks all the pointers of the copy_notify stateids and their parents.

> Also put_ol_stateid_locked can return without calling idr_remove
> and nfs4_free_cpntf_statelist (your proposed fix).
>
> -Dai

2022-08-12 19:52:55

by Dai Ngo

Subject: Re: help with nfsd kernel oops


On 8/12/22 7:23 AM, Olga Kornievskaia wrote:
> On Thu, Aug 11, 2022 at 9:34 PM <[email protected]> wrote:
>> In the use-after-free scenario, the copy_state is inserted on the
>> sc_cp_list of the lock state.
>>
>> So when put_ol_stateid_locked is called from nfsd4_close_open_stateid,
>> with your proposed patch, nfs4_free_cpntf_statelist does not remove
>> any copy_state since 's' is the open state.
> I don't know about all the cases, but I'm 100% certain that "s" there is
> a lock stateid (with the Linux client, using nfstest_ssc), since I've
> tested it and it works, and I have heavy debugging added to knfsd that
> tracks all the pointers of the copy_notify stateids and their parents.

The fix for this problem requires the copy state to be freed when
nfsd4_close_open_stateid is called. With your patch, the copy state
is freed from:

nfsd4_close_open_stateid
  unhash_open_stateid
    release_open_stateid_locks
      put_ol_stateid_locked
        nfs4_free_cpntf_statelist

With your patch, nfs4_free_cpntf_statelist is also called for v4.0
clients, which is unnecessary.

In addition, there are 5 places that call put_ol_stateid_locked:

release_open_stateid_locks
release_open_stateid
release_openowner
nfsd4_close_open_stateid
nfsd4_release_lockowner

Only the call to put_ol_stateid_locked from release_open_stateid_locks
needs it; for the other 4 it is unnecessary overhead.

My patch frees the copy state only from nfsd4_close_open_stateid, and
only for v4.x clients.

-Dai


