Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933297Ab1C3WBn (ORCPT ); Wed, 30 Mar 2011 18:01:43 -0400 Received: from mga14.intel.com ([143.182.124.37]:40747 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933263Ab1C3VGa (ORCPT ); Wed, 30 Mar 2011 17:06:30 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.63,270,1299484800"; d="scan'208";a="411276975" From: Andi Kleen References: <20110330203.501921634@firstfloor.org> In-Reply-To: <20110330203.501921634@firstfloor.org> To: chuck.lever@oracle.com, Trond.Myklebust@netapp.com, gregkh@suse.de, ak@linux.intel.com, linux-kernel@vger.kernel.org, stable@kernel.org, tim.bird@am.sony.com Subject: [PATCH] [51/275] NFS: Fix "kernel BUG at fs/aio.c:554!" Message-Id: <20110330210447.623F53E1A05@tassilo.jf.intel.com> Date: Wed, 30 Mar 2011 14:04:47 -0700 (PDT) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3469 Lines: 109 2.6.35-longterm review patch. If anyone has any objections, please let me know. ------------------ From: Chuck Lever commit 839f7ad6932d95f4d5ae7267b95c574714ff3d5b upstream. Nick Piggin reports: > I'm getting use after frees in aio code in NFS > > [ 2703.396766] Call Trace: > [ 2703.396858] [] ? native_sched_clock+0x27/0x80 > [ 2703.396959] [] ? put_lock_stats+0xe/0x40 > [ 2703.397058] [] ? lock_release_holdtime+0xa8/0x140 > [ 2703.397159] [] lock_acquire+0x95/0x1b0 > [ 2703.397260] [] ? aio_put_req+0x2b/0x60 > [ 2703.397361] [] ? get_parent_ip+0x11/0x50 > [ 2703.397464] [] _raw_spin_lock_irq+0x41/0x80 > [ 2703.397564] [] ? aio_put_req+0x2b/0x60 > [ 2703.397662] [] aio_put_req+0x2b/0x60 > [ 2703.397761] [] do_io_submit+0x2be/0x7c0 > [ 2703.397895] [] sys_io_submit+0xb/0x10 > [ 2703.397995] [] system_call_fastpath+0x16/0x1b > > Adding some tracing, it is due to nfs completing the request then > returning something other than -EIOCBQUEUED, so aio.c > also completes the request. To address this, prevent the NFS direct I/O engine from completing async iocbs when the forward path returns an error without starting any I/O. This fix appears to survive ^C during both "xfstest no. 208" and "fsx -Z." It's likely this bug has existed for a very long while, as we are seeing very similar symptoms in OEL 5. Copying stable. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen --- fs/nfs/direct.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) Index: linux-2.6.35.y/fs/nfs/direct.c =================================================================== --- linux-2.6.35.y.orig/fs/nfs/direct.c 2011-03-29 22:51:49.201464226 -0700 +++ linux-2.6.35.y/fs/nfs/direct.c 2011-03-29 23:02:59.231319840 -0700 @@ -402,15 +402,18 @@ pos += vec->iov_len; } + /* + * If no bytes were started, return the error, and let the + * generic layer handle the completion. + */ + if (requested_bytes == 0) { + nfs_direct_req_release(dreq); + return result < 0 ? result : -EIO; + } + if (put_dreq(dreq)) nfs_direct_complete(dreq); - - if (requested_bytes != 0) - return 0; - - if (result < 0) - return result; - return -EIO; + return 0; } static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov, @@ -830,15 +833,18 @@ pos += vec->iov_len; } + /* + * If no bytes were started, return the error, and let the + * generic layer handle the completion. + */ + if (requested_bytes == 0) { + nfs_direct_req_release(dreq); + return result < 0 ? result : -EIO; + } + if (put_dreq(dreq)) nfs_direct_write_complete(dreq, dreq->inode); - - if (requested_bytes != 0) - return 0; - - if (result < 0) - return result; - return -EIO; + return 0; } static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/