2015-03-15 12:56:15

by Christoph Hellwig

Subject: nfsd use after free in 4.0-rc

generic/011 1s ...[ 154.375068] general protection fault: 0000 [#1] SMP
[ 154.376050] Modules linked in:
[ 154.376785] CPU: 2 PID: 3818 Comm: nfsd Not tainted 4.0.0-rc3+ #150
[ 154.377891] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 154.377891] task: ffff88007b294410 ti: ffff88007a910000 task.ti: ffff88007a910000
[ 154.377891] RIP: 0010:[<ffffffff81102050>] [<ffffffff81102050>] __lock_acquire+0x140/0x1e20
[ 154.377891] RSP: 0018:ffff88007a9139e8 EFLAGS: 00010002
[ 154.377891] RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6f03 RCX: 0000000000000000
[ 154.377891] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6f1b
[ 154.377891] RBP: ffff88007a913ac8 R08: 0000000000000001 R09: 0000000000000000
[ 154.377891] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007b294410
[ 154.377891] R13: 6b6b6b6b6b6b6f1b R14: 0000000000000000 R15: 0000000000000000
[ 154.377891] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 154.377891] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 154.377891] CR2: 00007ffff85d1fec CR3: 0000000076ebb000 CR4: 00000000000007e0
[ 154.377891] Stack:
[ 154.377891] ffff88007b294410 ffffffff824c0a20 ffff88007b294c08 0000000000000002
[ 154.377891] ffff88007a913af8 ffffffff0000032c ffff880000000000 0000000000000000
[ 154.377891] ffff88007a913b18 0000000000000046 ffff88007b294c00 ffffffff0000001a
[ 154.377891] Call Trace:
[ 154.377891] [<ffffffff811042ff>] lock_acquire+0x9f/0x120
[ 154.377891] [<ffffffff813c603e>] ? nfsd4_process_open2+0x1de/0x1010
[ 154.377891] [<ffffffff810fff5c>] ? lockdep_init_map+0xbc/0x520
[ 154.397191] [<ffffffff81e3fcec>] _raw_spin_lock+0x2c/0x40
[ 154.397191] [<ffffffff813c603e>] ? nfsd4_process_open2+0x1de/0x1010
[ 154.397191] [<ffffffff81e40446>] ? _raw_spin_unlock+0x26/0x30
[ 154.397191] [<ffffffff813c603e>] nfsd4_process_open2+0x1de/0x1010
[ 154.397191] [<ffffffff813c5e60>] ? nfsd4_process_open1+0x3d0/0x3d0
[ 154.397191] [<ffffffff811d79f3>] ? inode_permission+0x13/0x50
[ 154.397191] [<ffffffff813aa462>] ? nfsd_permission+0x72/0x130
[ 154.397191] [<ffffffff813a744a>] ? fh_verify+0x14a/0x540
[ 154.397191] [<ffffffff813b6fa0>] nfsd4_open+0x370/0x780
[ 154.397191] [<ffffffff813b6c30>] ? nfsd4_link+0xf0/0xf0
[ 154.397191] [<ffffffff813b782c>] nfsd4_proc_compound+0x47c/0x680
[ 154.397191] [<ffffffff813a4711>] nfsd_dispatch+0xa1/0x1b0
[ 154.397191] [<ffffffff81d5864a>] svc_process_common+0x2da/0x570
[ 154.397191] [<ffffffff81d58ca6>] svc_process+0x176/0x1e0
[ 154.397191] [<ffffffff813a3fe7>] nfsd+0x157/0x1d0
[ 154.397191] [<ffffffff813a3e90>] ? nfsd_destroy+0xc0/0xc0
[ 154.397191] [<ffffffff813a3e90>] ? nfsd_destroy+0xc0/0xc0
[ 154.397191] [<ffffffff810dda0f>] kthread+0xdf/0x100
[ 154.397191] [<ffffffff810dd930>] ? __init_kthread_worker+0x70/0x70
[ 154.397191] [<ffffffff81e40918>] ret_from_fork+0x58/0x90
[ 154.397191] [<ffffffff810dd930>] ? __init_kthread_worker+0x70/0x70
[ 154.397191] Code: 85 db 75 53 0f 1f 80 00 00 00 00 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 84 00 00 00 00 00 <49> 81 7d 00 c0 58 75 82 b8 00 00 00 00 44 0f 44 c0 41 83 fe 01
[ 154.397191] RIP [<ffffffff81102050>] __lock_acquire+0x140/0x1e20
[ 154.397191] RSP <ffff88007a9139e8>
[ 154.397191] ---[ end trace ce8f0fa2103c18f2 ]---
[ 165.320204] Slab corruption (Tainted: G D ): nfsd4_openowners start=ffff88007b3fa8b0, len=528
[ 165.321157] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
[ 165.321660] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
[ 165.322281] 030: 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b lkkkkkkkkkkkkkkk
[ 165.323172] Prev obj: start=ffff88007b3fa688, len=528
[ 165.323743] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
[ 165.324365] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
[ 165.325035] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 165.325925] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 165.326809] Next obj: start=ffff88007b3faad8, len=528
[ 165.327366] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
[ 165.327916] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
[ 165.328572] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 165.329439] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk


2015-03-15 22:08:15

by Jeff Layton

Subject: Re: nfsd use after free in 4.0-rc

On Sun, 15 Mar 2015 05:56:14 -0700
Christoph Hellwig <[email protected]> wrote:

> generic/011 1s ...[ 154.375068] general protection fault: 0000 [#1] SMP
> [ 154.376050] Modules linked in:
> [ 154.376785] CPU: 2 PID: 3818 Comm: nfsd Not tainted 4.0.0-rc3+ #150
> [ 154.377891] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 154.377891] task: ffff88007b294410 ti: ffff88007a910000 task.ti: ffff88007a910000
> [ 154.377891] RIP: 0010:[<ffffffff81102050>] [<ffffffff81102050>] __lock_acquire+0x140/0x1e20
> [ 154.377891] RSP: 0018:ffff88007a9139e8 EFLAGS: 00010002
> [ 154.377891] RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6f03 RCX: 0000000000000000
> [ 154.377891] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6f1b
> [ 154.377891] RBP: ffff88007a913ac8 R08: 0000000000000001 R09: 0000000000000000
> [ 154.377891] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007b294410
> [ 154.377891] R13: 6b6b6b6b6b6b6f1b R14: 0000000000000000 R15: 0000000000000000
> [ 154.377891] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> [ 154.377891] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 154.377891] CR2: 00007ffff85d1fec CR3: 0000000076ebb000 CR4: 00000000000007e0
> [ 154.377891] Stack:
> [ 154.377891] ffff88007b294410 ffffffff824c0a20 ffff88007b294c08 0000000000000002
> [ 154.377891] ffff88007a913af8 ffffffff0000032c ffff880000000000 0000000000000000
> [ 154.377891] ffff88007a913b18 0000000000000046 ffff88007b294c00 ffffffff0000001a
> [ 154.377891] Call Trace:
> [ 154.377891] [<ffffffff811042ff>] lock_acquire+0x9f/0x120
> [ 154.377891] [<ffffffff813c603e>] ? nfsd4_process_open2+0x1de/0x1010
> [ 154.377891] [<ffffffff810fff5c>] ? lockdep_init_map+0xbc/0x520
> [ 154.397191] [<ffffffff81e3fcec>] _raw_spin_lock+0x2c/0x40
> [ 154.397191] [<ffffffff813c603e>] ? nfsd4_process_open2+0x1de/0x1010
> [ 154.397191] [<ffffffff81e40446>] ? _raw_spin_unlock+0x26/0x30
> [ 154.397191] [<ffffffff813c603e>] nfsd4_process_open2+0x1de/0x1010

Could you run gdb against nfsd.ko and do a:

list *(nfsd4_process_open2+0x1de)

It'd be interesting to see where it crashed. My suspicion would be
that it's trying to lock cl->cl_lock, but I can't tell for sure (or from
where).

> [ 154.397191] [<ffffffff813c5e60>] ? nfsd4_process_open1+0x3d0/0x3d0
> [ 154.397191] [<ffffffff811d79f3>] ? inode_permission+0x13/0x50
> [ 154.397191] [<ffffffff813aa462>] ? nfsd_permission+0x72/0x130
> [ 154.397191] [<ffffffff813a744a>] ? fh_verify+0x14a/0x540
> [ 154.397191] [<ffffffff813b6fa0>] nfsd4_open+0x370/0x780
> [ 154.397191] [<ffffffff813b6c30>] ? nfsd4_link+0xf0/0xf0
> [ 154.397191] [<ffffffff813b782c>] nfsd4_proc_compound+0x47c/0x680
> [ 154.397191] [<ffffffff813a4711>] nfsd_dispatch+0xa1/0x1b0
> [ 154.397191] [<ffffffff81d5864a>] svc_process_common+0x2da/0x570
> [ 154.397191] [<ffffffff81d58ca6>] svc_process+0x176/0x1e0
> [ 154.397191] [<ffffffff813a3fe7>] nfsd+0x157/0x1d0
> [ 154.397191] [<ffffffff813a3e90>] ? nfsd_destroy+0xc0/0xc0
> [ 154.397191] [<ffffffff813a3e90>] ? nfsd_destroy+0xc0/0xc0
> [ 154.397191] [<ffffffff810dda0f>] kthread+0xdf/0x100
> [ 154.397191] [<ffffffff810dd930>] ? __init_kthread_worker+0x70/0x70
> [ 154.397191] [<ffffffff81e40918>] ret_from_fork+0x58/0x90
> [ 154.397191] [<ffffffff810dd930>] ? __init_kthread_worker+0x70/0x70
> [ 154.397191] Code: 85 db 75 53 0f 1f 80 00 00 00 00 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 84 00 00 00 00 00 <49> 81 7d 00 c0 58 75 82 b8 00 00 00 00 44 0f 44 c0 41 83 fe 01
> [ 154.397191] RIP [<ffffffff81102050>] __lock_acquire+0x140/0x1e20
> [ 154.397191] RSP <ffff88007a9139e8>
> [ 154.397191] ---[ end trace ce8f0fa2103c18f2 ]---
> [ 165.320204] Slab corruption (Tainted: G D ): nfsd4_openowners start=ffff88007b3fa8b0, len=528
> [ 165.321157] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> [ 165.321660] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
> [ 165.322281] 030: 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b lkkkkkkkkkkkkkkk

Certainly looks like a use-after-free.

> [ 165.323172] Prev obj: start=ffff88007b3fa688, len=528
> [ 165.323743] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> [ 165.324365] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
> [ 165.325035] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 165.325925] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 165.326809] Next obj: start=ffff88007b3faad8, len=528
> [ 165.327366] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> [ 165.327916] Last user: [<ffffffff813c0a43>](nfs4_free_openowner+0x13/0x20)
> [ 165.328572] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 165.329439] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk


--
Jeff Layton <[email protected]>

2015-03-16 11:46:48

by Christoph Hellwig

Subject: Re: nfsd use after free in 4.0-rc

On Sun, Mar 15, 2015 at 06:08:11PM -0400, Jeff Layton wrote:
> Could you run gdb against nfsd.ko and do a:
>
> list *(nfsd4_process_open2+0x1de)
>
> It'd be interesting to see where it crashed. My suspicion would be
> that it's trying to lock cl->cl_lock, but I can't tell for sure (or from
> where).

That's deep inside the spinlock assembly code, but if I go back far
enough I get here:

(gdb) l *(nfsd4_process_open2+0x1c6)
0xffffffff813c6026 is in nfsd4_process_open2
(../fs/nfsd/nfs4state.c:3238).
3233 stp->st_stateowner = nfs4_get_stateowner(&oo->oo_owner);
3234 get_nfs4_file(fp);
3235 stp->st_stid.sc_file = fp;
3236 stp->st_access_bmap = 0;
3237 stp->st_deny_bmap = 0;
3238 stp->st_openstp = NULL;
3239 spin_lock(&oo->oo_owner.so_client->cl_lock);
3240 list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
3241 spin_lock(&fp->fi_lock);
3242 list_add(&stp->st_perfile, &fp->fi_stateids);

2015-03-16 12:20:08

by Jeff Layton

Subject: Re: nfsd use after free in 4.0-rc

On Mon, 16 Mar 2015 04:46:48 -0700
Christoph Hellwig <[email protected]> wrote:

> On Sun, Mar 15, 2015 at 06:08:11PM -0400, Jeff Layton wrote:
> > Could you run gdb against nfsd.ko and do a:
> >
> > list *(nfsd4_process_open2+0x1de)
> >
> > It'd be interesting to see where it crashed. My suspicion would be
> > that it's trying to lock cl->cl_lock, but I can't tell for sure (or from
> > where).
>
> That's deep inside the spinlock assembly code, but if I go back far
> enough I get here:
>
> (gdb) l *(nfsd4_process_open2+0x1c6)
> 0xffffffff813c6026 is in nfsd4_process_open2
> (../fs/nfsd/nfs4state.c:3238).
> 3233 stp->st_stateowner = nfs4_get_stateowner(&oo->oo_owner);
> 3234 get_nfs4_file(fp);
> 3235 stp->st_stid.sc_file = fp;
> 3236 stp->st_access_bmap = 0;
> 3237 stp->st_deny_bmap = 0;
> 3238 stp->st_openstp = NULL;
> 3239 spin_lock(&oo->oo_owner.so_client->cl_lock);

Thanks.

Yep, makes sense. oo is probably corrupt, so once you try to
dereference the so_client pointer embedded within, it barfs. I was also
able to reproduce some list corruption in the openowner slabcache
yesterday evening using that test.

I just tried a v3.19 kernel on the server and could reproduce this
there with generic/011 as well, so this looks like a preexisting bug of
some sort. Perhaps the recent client changes to allow parallel opens
are helping to expose it?

In any case, it looks likely to be an openstate refcount imbalance,
where we're putting the last reference while there are still users of
it. The openowner references are mostly owned by the stateids that
are associated with them. It might be interesting to turn up poisoning
of the stateid_slab as well...
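
A minimal sketch of what "turning up poisoning" for just that cache could
look like, assuming the 4.0-era kmem_cache_create() call for stateid_slab
in fs/nfsd/nfs4state.c (an illustration only, not a patch from this
thread):

	/*
	 * Debug aid sketch: red-zone and poison every nfsd4_stateids
	 * object so a use-after-free of a stateid shows up as 0x6b
	 * garbage, the same way the openowner corruption above did.
	 */
	stateid_slab = kmem_cache_create("nfsd4_stateids",
			sizeof(struct nfs4_ol_stateid), 0,
			SLAB_RED_ZONE | SLAB_POISON, NULL);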

> 3240 list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> 3241 spin_lock(&fp->fi_lock);
> 3242 list_add(&stp->st_perfile, &fp->fi_stateids);


--
Jeff Layton <[email protected]>

2015-03-16 12:27:31

by Christoph Hellwig

Subject: Re: nfsd use after free in 4.0-rc

On Mon, Mar 16, 2015 at 08:20:04AM -0400, Jeff Layton wrote:
> I just tried a v3.19 kernel on the server and could reproduce this
> there with generic/011 as well, so this looks like a preexisting bug of
> some sort. Perhaps the recent client changes to allow parallel opens
> are helping to expose it?

That sounds like a good explanation, as I've never seen this before
those changes were merged.

2015-03-16 15:58:46

by J. Bruce Fields

Subject: Re: nfsd use after free in 4.0-rc

On Mon, Mar 16, 2015 at 04:46:48AM -0700, Christoph Hellwig wrote:
> On Sun, Mar 15, 2015 at 06:08:11PM -0400, Jeff Layton wrote:
> > Could you run gdb against nfsd.ko and do a:
> >
> > list *(nfsd4_process_open2+0x1de)
> >
> > It'd be interesting to see where it crashed. My suspicion would be
> > that it's trying to lock cl->cl_lock, but I can't tell for sure (or from
> > where).
>
> That's deep inside the spinlock assembly code, but if I go back far
> enough I get here:
>
> (gdb) l *(nfsd4_process_open2+0x1c6)
> 0xffffffff813c6026 is in nfsd4_process_open2
> (../fs/nfsd/nfs4state.c:3238).
> 3233 stp->st_stateowner = nfs4_get_stateowner(&oo->oo_owner);
> 3234 get_nfs4_file(fp);
> 3235 stp->st_stid.sc_file = fp;
> 3236 stp->st_access_bmap = 0;
> 3237 stp->st_deny_bmap = 0;
> 3238 stp->st_openstp = NULL;
> 3239 spin_lock(&oo->oo_owner.so_client->cl_lock);
> 3240 list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> 3241 spin_lock(&fp->fi_lock);
> 3242 list_add(&stp->st_perfile, &fp->fi_stateids);

I assume you're testing only NFS v4.1?

--b.

2015-03-16 16:19:55

by J. Bruce Fields

Subject: Re: nfsd use after free in 4.0-rc

On Mon, Mar 16, 2015 at 05:27:31AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 16, 2015 at 08:20:04AM -0400, Jeff Layton wrote:
> > I just tried a v3.19 kernel on the server and could reproduce this
> > there with generic/011 as well, so this looks like a preexisting bug of
> > some sort. Perhaps the recent client changes to allow parallel opens
> > are helping to expose it?
>
> That sounds like a good explanation, as I've never seen this before
> those changes were merged.

Possibly unrelated, but I'm suspicious of the release_open_stateid() on
nfs4_get_vfs_file() failure: I believe a) we're not holding any locks
over those two calls, and b) the stateid is globally visible ever since
init_open_stateid. So by the time we call release_open_stateid(),
another open could have found that stateid and be trying to work with
it, right? Maybe I'm missing something.

nfs4_get_vfs_file() is where an actual vfs open can happen, so it's
doing a lot of work, and can sleep--so it seems a particularly likely
point for someone else to race with us.
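
For reference, the window being described looks roughly like this
(paraphrased from the 4.0-era nfsd4_process_open2(); not a verbatim
excerpt, and the error handling is trimmed):

	/*
	 * init_open_stateid() hashes the new stateid, so other nfsd
	 * threads can find it from this point on.
	 */
	init_open_stateid(stp, fp, open);

	/* may do a real vfs open and can sleep -- a wide race window */
	status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
	if (status) {
		/* another open may already have found and be using stp */
		release_open_stateid(stp);
		goto out;
	}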

--b.

2015-03-16 16:53:18

by Jeff Layton

Subject: Re: nfsd use after free in 4.0-rc

On Mon, 16 Mar 2015 12:19:55 -0400
[email protected] (J. Bruce Fields) wrote:

> On Mon, Mar 16, 2015 at 05:27:31AM -0700, Christoph Hellwig wrote:
> > On Mon, Mar 16, 2015 at 08:20:04AM -0400, Jeff Layton wrote:
> > > I just tried a v3.19 kernel on the server and could reproduce this
> > > there with generic/011 as well, so this looks like a preexisting bug of
> > > some sort. Perhaps the recent client changes to allow parallel opens
> > > are helping to expose it?
> >
> > That sounds like a good explanation, as I've never seen this before
> > those changes were merged.
>
> Possibly unrelated, but I'm suspicious of the release_open_stateid() on
> nfs4_get_vfs_file() failure: I believe a) we're not holding any locks
> over those two calls, and b) the stateid is globally visible ever since
> init_open_stateid. So by the time we call release_open_stateid(),
> another open could have found that stateid and be trying to work with
> it, right? Maybe I'm missing something.
>
> nfs4_get_vfs_file() is where an actual vfs open can happen, so it's
> doing a lot of work, and can sleep--so it seems a particularly likely
> point for someone else to race with us.
>

In theory it's possible for that stateid to be found and used that way.
It really shouldn't happen though because it's a brand new stateid and
the client should have no idea what it actually is.

If that does happen though, it should be safe. release_open_stateid does
actually respect the refcount. It basically takes the lock, unhashes it
and then does a put on the stateid. If the refcount goes to 0, then it
puts it onto the reaplist to be freed later.

So if that were to happen we might end up with a stale stateid, but it
really shouldn't, and it certainly ought not cause any real refcounting
problems.
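
Roughly, the teardown being described (function names follow the 4.0-era
nfs4state.c, but this is a paraphrase rather than the actual body of
release_open_stateid()):

	spin_lock(&clp->cl_lock);
	unhash_open_stateid(stp, &reaplist);	/* drop it from the lists */
	put_ol_stateid_locked(stp, &reaplist);	/* sc_count hits 0 -> queue it */
	spin_unlock(&clp->cl_lock);
	free_ol_stateid_reaplist(&reaplist);	/* actual free happens here, later */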
--
Jeff Layton <[email protected]>

2015-03-16 17:10:06

by J. Bruce Fields

Subject: Re: nfsd use after free in 4.0-rc

On Mon, Mar 16, 2015 at 12:53:14PM -0400, Jeff Layton wrote:
> On Mon, 16 Mar 2015 12:19:55 -0400
> [email protected] (J. Bruce Fields) wrote:
>
> > On Mon, Mar 16, 2015 at 05:27:31AM -0700, Christoph Hellwig wrote:
> > > On Mon, Mar 16, 2015 at 08:20:04AM -0400, Jeff Layton wrote:
> > > > I just tried a v3.19 kernel on the server and could reproduce this
> > > > there with generic/011 as well, so this looks like a preexisting bug of
> > > > some sort. Perhaps the recent client changes to allow parallel opens
> > > > are helping to expose it?
> > >
> > > That sounds like a good explanation, as I've never seen this before
> > > those changes were merged.
> >
> > Possibly unrelated, but I'm suspicious of the release_open_stateid() on
> > nfs4_get_vfs_file() failure: I believe a) we're not holding any locks
> > over those two calls, and b) the stateid is globally visible ever since
> > init_open_stateid. So by the time we call release_open_stateid(),
> > another open could have found that stateid and be trying to work with
> > it, right? Maybe I'm missing something.
> >
> > nfs4_get_vfs_file() is where an actual vfs open can happen, so it's
> > doing a lot of work, and can sleep--so it seems a particularly likely
> > point for someone else to race with us.
> >
>
> In theory it's possible for that stateid to be found and used that way.
> It really shouldn't happen though because it's a brand new stateid and
> the client should have no idea what it actually is.

nfsd4_find_existing_open() looks like it will still find such a stateid.
I haven't tried to trace through what happens if it finds a stateid that
then immediately has release_open_stateid called on it.

--b.

>
> If that does happen though, it should be safe. release_open_stateid does
> actually respect the refcount. It basically takes the lock, unhashes it
> and then does a put on the stateid. If the refcount goes to 0, then it
> puts it onto the reaplist to be freed later.
>
> So if that were to happen we might end up with a stale stateid, but it
> really shouldn't, and it certainly ought not cause any real refcounting
> problems.
> --
> Jeff Layton <[email protected]>

2015-03-16 17:37:43

by Jeff Layton

Subject: Re: nfsd use after free in 4.0-rc

On Mon, 16 Mar 2015 13:10:05 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Mar 16, 2015 at 12:53:14PM -0400, Jeff Layton wrote:
> > On Mon, 16 Mar 2015 12:19:55 -0400
> > [email protected] (J. Bruce Fields) wrote:
> >
> > > On Mon, Mar 16, 2015 at 05:27:31AM -0700, Christoph Hellwig wrote:
> > > > On Mon, Mar 16, 2015 at 08:20:04AM -0400, Jeff Layton wrote:
> > > > > I just tried a v3.19 kernel on the server and could reproduce this
> > > > > there with generic/011 as well, so this looks like a preexisting bug of
> > > > > some sort. Perhaps the recent client changes to allow parallel opens
> > > > > are helping to expose it?
> > > >
> > > > That sounds like a good explanation, as I've never seen this before
> > > > those changes were merged.
> > >
> > > Possibly unrelated, but I'm suspicious of the release_open_stateid() on
> > > nfs4_get_vfs_file() failure: I believe a) we're not holding any locks
> > > over those two calls, and b) the stateid is globally visible ever since
> > > init_open_stateid. So by the time we call release_open_stateid(),
> > > another open could have found that stateid and be trying to work with
> > > it, right? Maybe I'm missing something.
> > >
> > > nfs4_get_vfs_file() is where an actual vfs open can happen, so it's
> > > doing a lot of work, and can sleep--so it seems a particularly likely
> > > point for someone else to race with us.
> > >
> >
> > In theory it's possible for that stateid to be found and used that way.
> > It really shouldn't happen though because it's a brand new stateid and
> > the client should have no idea what it actually is.
>
> nfsd4_find_existing_open() looks like it will still find such a stateid.
> I haven't tried to trace through what happens if it finds a stateid that
> then immediately has release_open_stateid called on it.
>
> --b.
>

Oh, good point. I guess what matters is whether you bump the refcount
before it has already dropped to 0 or not. If you get it before,
then it should be pretty harmless (though the client will probably end
up in stateid recovery). If after, then it's likely to get torn down
out from under you.

Probably we ought to make that atomic_inc an atomic_inc_not_zero, and
simply return NULL if that returns false. Either that or make sure we
take the cl_lock before bumping the refcount.
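
As a sketch, the suggested change would sit in the lookup loop of
nfsd4_find_existing_open(), along these lines (the surrounding loop and
field names are assumptions based on the 4.0-era code, not a tested
patch):

	/* inside the list_for_each_entry() walk of fp->fi_stateids */
	if (local->st_stateowner == &oo->oo_owner) {
		/* only take a reference if the stateid is still live */
		if (atomic_inc_not_zero(&local->st_stid.sc_count))
			ret = local;
		break;
	}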

Still, the problem seems to be in the openowner slab, not in the
stateid slab so far. It's possible that that is the bug, but I think
we'll need to figure out how that is happening if so.

> >
> > If that does happen though, it should be safe. release_open_stateid
> > does actually respect the refcount. It basically takes the lock,
> > unhashes it and then does a put on the stateid. If the refcount
> > goes to 0, then it puts it onto the reaplist to be freed later.
> >
> > So if that were to happen we might end up with a stale stateid, but
> > it really shouldn't, and it certainly ought not cause any real
> > refcounting problems.
> > --
> > Jeff Layton <[email protected]>


--
Jeff Layton <[email protected]>

2015-03-16 18:28:11

by Christoph Hellwig

Subject: Re: nfsd use after free in 4.0-rc

On Mon, Mar 16, 2015 at 11:58:45AM -0400, J. Bruce Fields wrote:
> > 3240 list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> > 3241 spin_lock(&fp->fi_lock);
> > 3242 list_add(&stp->st_perfile, &fp->fi_stateids);
>
> I assume you're testing only NFS v4.1?

Exactly. I'm testing with a version of this patch applied to force 4.1:

http://git.infradead.org/users/hch/pnfs.git/commitdiff/72ef9b95aaed593ac061bb380bc27ced4fd67b4b

2015-03-21 14:07:03

by Jeff Layton

Subject: Re: nfsd use after free in 4.0-rc

On Mon, 16 Mar 2015 11:28:10 -0700
Christoph Hellwig <[email protected]> wrote:

> On Mon, Mar 16, 2015 at 11:58:45AM -0400, J. Bruce Fields wrote:
> > > 3240 list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> > > 3241 spin_lock(&fp->fi_lock);
> > > 3242 list_add(&stp->st_perfile, &fp->fi_stateids);
> >
> > I assume you're testing only NFS v4.1?
>
> Exactly. I'm testing with a version of this patch applied to force 4.1:
>
> http://git.infradead.org/users/hch/pnfs.git/commitdiff/72ef9b95aaed593ac061bb380bc27ced4fd67b4b

I've been using 4.2, fwiw...

So far, this bug has turned out to be pretty elusive. I've looked all
over the so_count handling and I don't see anywhere that we've got a
refcount imbalance. I must be missing something, but it all looks right
AFAICT.

I've also instrumented the code to look for 0->1 transitions on the
so_count, and that hasn't fired. I also looked to see whether Bruce's
hunch about the nfsd4_find_existing_open thing might be a problem with
the sc_count going 0->1 since we're taking a reference there w/o
holding the cl_lock. I haven't seen that happen either.

Mostly when I see this bug without memory poisoning enabled, it
manifests itself as list corruption in one of the stateowner lists
during nfsd4_close. Curiously, the attached patch seems to make that
problem go away, but the generic/011 test seems to fail most of the
time with this in the log:

Cannot chdir out of pid directory: Stale file handle

...but if I turn up poisoning of the nfsd4_openowners slab then I get
an oops similar to what HCH is seeing.
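
(The attached patch isn't reproduced here, but going by its name the idea
is presumably something like the following hypothetical reconstruction:
scribble over the owner string just before it is freed, so any later user
of a stale openowner trips over obviously bad data. The placement and
surrounding code are assumptions.)

	/* hypothetical debug aid, not the actual attached diff */
	memset(sop->so_owner.data, 0x6b, sop->so_owner.len); /* slab poison byte */
	kfree(sop->so_owner.data);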

--
Jeff Layton <[email protected]>


Attachments:
(No filename) (1.60 kB)
0001-nfsd-poison-so_owner.data-before-freeing-it.patch (0.00 B)