Return-Path:
Received: from mail-yk0-f178.google.com ([209.85.160.178]:35543 "EHLO
	mail-yk0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750822AbbCPMUI (ORCPT );
	Mon, 16 Mar 2015 08:20:08 -0400
Received: by ykfs63 with SMTP id s63so16735499ykf.2 for ;
	Mon, 16 Mar 2015 05:20:07 -0700 (PDT)
Date: Mon, 16 Mar 2015 08:20:04 -0400
From: Jeff Layton
To: Christoph Hellwig
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfsd use after free in 4.0-rc
Message-ID: <20150316082004.348e39af@tlielax.poochiereds.net>
In-Reply-To: <20150316114648.GA7432@infradead.org>
References: <20150315125614.GA766@infradead.org>
	<20150315180811.02847842@tlielax.poochiereds.net>
	<20150316114648.GA7432@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Mon, 16 Mar 2015 04:46:48 -0700
Christoph Hellwig wrote:

> On Sun, Mar 15, 2015 at 06:08:11PM -0400, Jeff Layton wrote:
> > Could you run gdb against nfsd.ko and do a:
> >
> > list *(nfsd4_process_open2+0x1de)
> >
> > I'd be interesting to see where it crashed. My suspicion would be
> > trying to lock the cl->cl_lock, but I can't tell for sure (and from
> > where).
>
> That's deep inside the spinlock assembly code, but if I got back far
> enough I get here:
>
> (gdb) l *(nfsd4_process_open2+0x1c6)
> 0xffffffff813c6026 is in nfsd4_process_open2
> (../fs/nfsd/nfs4state.c:3238).
> 3233		stp->st_stateowner = nfs4_get_stateowner(&oo->oo_owner);
> 3234		get_nfs4_file(fp);
> 3235		stp->st_stid.sc_file = fp;
> 3236		stp->st_access_bmap = 0;
> 3237		stp->st_deny_bmap = 0;
> 3238		stp->st_openstp = NULL;
> 3239		spin_lock(&oo->oo_owner.so_client->cl_lock);

Thanks. Yep, makes sense. oo is probably corrupt, so once you try to
dereference the so_client pointer embedded within, it barfs. I was also
able to reproduce some list corruption in the openowner slabcache
yesterday evening too using that test.
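
For illustration, the faulting dereference can be modelled in user
space. The sketch below is not nfsd code: the fake_* types and helpers
are hypothetical stand-ins, and it assumes slab poisoning is enabled, so
that a freed object is overwritten with the kernel's POISON_FREE byte
(0x6b). It only inspects the stale pointer value and never dereferences
it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte that slab poisoning writes over freed objects (POISON_FREE). */
#define POISON_FREE_BYTE 0x6b

/* The raw-bits comparison below assumes pointers fit exactly in uintptr_t. */
_Static_assert(sizeof(uintptr_t) == sizeof(void *), "pointer size mismatch");

/* Hypothetical, heavily simplified stand-ins for the nfsd structures. */
struct fake_client { int cl_lock; };
struct fake_stateowner { struct fake_client *so_client; };
struct fake_openowner { struct fake_stateowner oo_owner; };

/*
 * Model "free with poisoning": overwrite the object in place instead of
 * releasing it, so the stale contents can be inspected safely.
 */
static void fake_poisoned_free(struct fake_openowner *oo)
{
	assert(oo != NULL);
	memset(oo, POISON_FREE_BYTE, sizeof(*oo));
}

/* Read the embedded so_client pointer as raw bits; never dereference it. */
static uintptr_t stale_client_bits(const struct fake_openowner *oo)
{
	uintptr_t bits;

	memcpy(&bits, &oo->oo_owner.so_client, sizeof(bits));
	return bits;
}

/* Returns 1 if every byte of the pointer value is the poison byte. */
static int looks_poisoned(uintptr_t bits)
{
	uintptr_t pattern;

	memset(&pattern, POISON_FREE_BYTE, sizeof(pattern));
	return bits == pattern;
}
```

For a live owner, looks_poisoned(stale_client_bits(&oo)) is false; after
fake_poisoned_free(&oo) it is true. In the kernel, taking cl_lock
through such a pointer would touch an invalid address, which would be
consistent with a crash inside the spinlock code.
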
I just tried a v3.19 kernel on the server and could reproduce this there
with generic/011 as well, so this looks like a preexisting bug of some
sort. Perhaps the recent client changes to allow parallel opens are
helping to expose it?

In any case, it looks likely to be an openstate refcount imbalance,
where we're putting the last reference while there are still users of
it. The openowner references are mostly owned by the stateids that are
associated with them. It might be interesting to turn up poisoning of
the stateid_slab as well...

> 3240		list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> 3241		spin_lock(&fp->fi_lock);
> 3242		list_add(&stp->st_perfile, &fp->fi_stateids);

-- 
Jeff Layton