Return-Path:
Received: from mail-ww0-f44.google.com ([74.125.82.44]:50734 "EHLO mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752857Ab1ANVxQ convert rfc822-to-8bit (ORCPT ); Fri, 14 Jan 2011 16:53:16 -0500
In-Reply-To: <4E70BD9B-DB23-42EC-B28D-998E16FEC189@oracle.com>
References: <4E70BD9B-DB23-42EC-B28D-998E16FEC189@oracle.com>
Date: Sat, 15 Jan 2011 08:48:24 +1100
Message-ID:
Subject: Re: NFS dio aio bug
From: Nick Piggin
To: Chuck Lever
Cc: Trond Myklebust, linux-fsdevel, Linux NFS Mailing List
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On Sat, Jan 15, 2011 at 3:08 AM, Chuck Lever wrote:
> Hi Nick-
>
> On Jan 13, 2011, at 8:29 PM, Nick Piggin wrote:
>
>> Hi Trond,
>>
>> I'm getting use after frees in aio code in NFS
>
> Can you describe how to reproduce this?

It was with the aio-dio stress code from xfstests, #207 I think or 208.
Running it for a short time and then ^C ing it would tend to trigger it.
I'll have to get you more details after I come back from travelling in a
week's time.

>
>> [ 2703.396766] Call Trace:
>> [ 2703.396858]  [] ? native_sched_clock+0x27/0x80
>> [ 2703.396959]  [] ? put_lock_stats+0xe/0x40
>> [ 2703.397058]  [] ? lock_release_holdtime+0xa8/0x140
>> [ 2703.397159]  [] lock_acquire+0x95/0x1b0
>> [ 2703.397260]  [] ? aio_put_req+0x2b/0x60
>> [ 2703.397361]  [] ? get_parent_ip+0x11/0x50
>> [ 2703.397464]  [] _raw_spin_lock_irq+0x41/0x80
>> [ 2703.397564]  [] ? aio_put_req+0x2b/0x60
>> [ 2703.397662]  [] aio_put_req+0x2b/0x60
>> [ 2703.397761]  [] do_io_submit+0x2be/0x7c0
>> [ 2703.397895]  [] sys_io_submit+0xb/0x10
>> [ 2703.397995]  [] system_call_fastpath+0x16/0x1b
>>
>> Adding some tracing, it is due to nfs completing the request then
>> returning something other than -EIOCBQUEUED, so aio.c
>> also completes the request.
>
> Is this with reads, writes, or both?
> Are the I/O requests smaller than, equal to, or larger than rsize or wsize?
>
> We have a related bug report: hitting the BUG at fs/aio.c:552 (OEL5) and
> similar for more recent kernels.  Looks like dreq refcounting is faulty somehow.

I only saw it with writes. The request was being completed in nfs direct
write path when I added some tracing. It was very easy to reproduce, I
just didn't have time to bisect it, but I can do that when I get back if
you don't have it solved by then.
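
For illustration only, the double completion described in the quoted analysis (the filesystem completes the request but does not return -EIOCBQUEUED, so the submit path completes it again) can be modelled in a few lines of userspace Python. This is a toy sketch, not the kernel code: `Request`, `submit`, `well_behaved_op` and `buggy_op` are hypothetical names, and the two-reference scheme is an assumed simplification of how the submit path and the completion each hold a reference. Only the `EIOCBQUEUED` value (529) is taken from the kernel headers.

```python
EIOCBQUEUED = 529  # value from the kernel's include/linux/errno.h


class UseAfterFree(Exception):
    """Raised when the model sees a put on an already freed request."""


class Request:
    """Toy stand-in for an aio request (hypothetical, not struct kiocb)."""

    def __init__(self):
        # Assumed simplification: one reference for the submitter,
        # one reference that is dropped when the request completes.
        self.refs = 2
        self.freed = False
        self.completions = 0
        self.result = None

    def put(self):
        """Drop one reference; free the request when the count hits zero."""
        if self.freed:
            raise UseAfterFree("put on a freed request")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def complete(self, result):
        """Record a completion and drop the completion reference."""
        self.completions += 1
        self.result = result
        self.put()


def submit(req, op):
    """Model of the submit path: run the op, then complete the request
    unless the op reports that it has taken over completion."""
    ret = op(req)
    if ret != -EIOCBQUEUED:
        req.complete(ret)  # submitter completes on the op's behalf
    req.put()              # drop the submitter's own reference


def well_behaved_op(req):
    """Completes the request itself and says so via -EIOCBQUEUED."""
    req.complete(4096)
    return -EIOCBQUEUED


def buggy_op(req):
    """Completes the request itself but returns a byte count instead."""
    req.complete(4096)
    return 4096
```

In this model `submit(Request(), well_behaved_op)` drops exactly two references, while `submit(Request(), buggy_op)` completes the request twice and raises `UseAfterFree` on the submitter's final put, which is the same shape as the `aio_put_req` frame under `do_io_submit` in the trace.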