Date: Mon, 16 Mar 2015 08:20:04 -0400
From: Jeff Layton <jeff.layton@primarydata.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfsd use after free in 4.0-rc
Message-ID: <20150316082004.348e39af@tlielax.poochiereds.net>
In-Reply-To: <20150316114648.GA7432@infradead.org>
References: <20150315125614.GA766@infradead.org>
	<20150315180811.02847842@tlielax.poochiereds.net>
	<20150316114648.GA7432@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-nfs-owner@vger.kernel.org

On Mon, 16 Mar 2015 04:46:48 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Sun, Mar 15, 2015 at 06:08:11PM -0400, Jeff Layton wrote:
> > Could you run gdb against nfsd.ko and do a:
> > 
> >     list *(nfsd4_process_open2+0x1de)
> > 
> > It'd be interesting to see where it crashed. My suspicion would be
> > trying to lock the cl->cl_lock, but I can't tell for sure (and from
> > where).
> 
> That's deep inside the spinlock assembly code, but if I go back far
> enough I get here:
> 
> (gdb) l *(nfsd4_process_open2+0x1c6)
> 0xffffffff813c6026 is in nfsd4_process_open2
> (../fs/nfsd/nfs4state.c:3238).
> 3233		stp->st_stateowner = nfs4_get_stateowner(&oo->oo_owner);
> 3234		get_nfs4_file(fp);
> 3235		stp->st_stid.sc_file = fp;
> 3236		stp->st_access_bmap = 0;
> 3237		stp->st_deny_bmap = 0;
> 3238		stp->st_openstp = NULL;
> 3239		spin_lock(&oo->oo_owner.so_client->cl_lock);

Thanks.

Yep, makes sense. oo is probably corrupt, so once you try to
dereference the so_client pointer embedded within, it barfs. I was also
able to reproduce some list corruption in the openowner slabcache
yesterday evening using that test.

I just tried a v3.19 kernel on the server and could reproduce this
there with generic/011 as well, so this looks like a preexisting bug of
some sort. Perhaps the recent client changes to allow parallel opens
are helping to expose it?

In any case, it looks likely to be an openstate refcount imbalance,
where we're putting the last reference while there are still users of
it. The openowner references are mostly owned by the stateids that
are associated with them. It might be interesting to turn on poisoning
of the stateid_slab as well...
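
To make the suspected failure mode concrete, here is a minimal
userspace sketch of a refcount imbalance. It is not the nfsd code:
struct toy_owner, toy_get(), toy_put() and toy_imbalance_demo() are
hypothetical names, and the object sets a "freed" flag instead of
really being freed so that the demo stays well defined. It only shows
how one stray put can release an owner that a stateid still points at.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an openowner; not the real nfsd struct. */
struct toy_owner {
    int refcount;
    bool freed;     /* set instead of calling free() */
};

static void toy_get(struct toy_owner *o)
{
    o->refcount++;
}

/* Drop one reference; mark the object "freed" when the last one goes. */
static void toy_put(struct toy_owner *o)
{
    if (--o->refcount == 0)
        o->freed = true;
}

/* Returns true if the owner was released while a stateid still held it. */
static bool toy_imbalance_demo(void)
{
    struct toy_owner oo = { .refcount = 1, .freed = false }; /* creator's ref */

    toy_get(&oo);               /* stateid takes its reference */
    assert(oo.refcount == 2);

    toy_put(&oo);               /* creator drops its reference: 1 left */
    toy_put(&oo);               /* stray extra put: count hits 0 */

    /* The stateid never dropped its reference, yet the owner is gone. */
    return oo.freed;
}
```

In the real code the second put would hand the memory back to the
slab, and the next user of the stale pointer (such as the spin_lock()
call at line 3239 above) would be touching freed memory.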

> 3240		list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> 3241		spin_lock(&fp->fi_lock);
> 3242		list_add(&stp->st_perfile, &fp->fi_stateids);


-- 
Jeff Layton <jeff.layton@primarydata.com>