Received: by 2002:ac0:aed5:0:0:0:0:0 with SMTP id t21csp4529558imb; Wed, 6 Mar 2019 16:04:35 -0800 (PST) X-Google-Smtp-Source: APXvYqzlw0ARpe7Molx0rqDQjG11vvylcxQ4h1uCyaUcxZhlYC7+h1yyCybDU8Y9qygmFQtpFFJJ X-Received: by 2002:a65:4547:: with SMTP id x7mr1221318pgr.350.1551917075416; Wed, 06 Mar 2019 16:04:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1551917075; cv=none; d=google.com; s=arc-20160816; b=ueiIeGZ6euVR/nW7WkPXk7qf4B4HL1L4OXYssjtYcDYMzvoEnNetOsYoJeoITXcGFg GvaO9FZIzo/DvmnA18e7RftuVB/GYHUzWKKJYUKjgNBrsPCtEO5FXG8n0uK+6HeLGA0d VL0ptC2ApsV4Qjhx9Y882pVL1A8NJsK9VKu6YwUfkPKuieVUg91lPlIpbB1wt5LQFNJD SphFViLNtdtx//ORjR8RLC6oC1fEC+PaZih9ZsY41Q0vrmQqEg3I9V3xRfefkbYhkD8j NNYCESIT0T6oVu0Ea+wQ0maTCUcZfR9oO0kckPr33uMS5qnlbuoK18zKptX4n4u/8vtq FCwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=S3z+LJ0vcJ9BTqisZ6/LN4ebFH8Dqhd+AwJotMNVCuc=; b=dHvmcn/IGzVlEZjndRhK1kuLdsAJiESUUreP6Bfq8kRZQNudFsb3Ud1NntmcjOScak NPLxgZCthknppBbSxg17YVN0iF3ifXpYRN3nv987JeixrnQHaeXIbYhl3atywrjTbH9X mHi1kxETAvmRLMY19VnDZ8CnM45AW7JXB7n8jvwwXMumyrOJCTbpUZCv5lEAFOZxOA6z U98EdYG+V6N1G7Z08grSOAFNR3BHdr21gFdgrmxr5PShkmdgsZQcY/k/WCUwpK0UPgTn JgfkqcEhKXJU1HSaWcGFaAPN1vf33WmRcdXD/PcmZG6lUthpKxFUVe7FRJ/F1mxSukCg 8bJg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 72si2778030ple.250.2019.03.06.16.04.20; Wed, 06 Mar 2019 16:04:35 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726720AbfCGAD4 (ORCPT + 99 others); Wed, 6 Mar 2019 19:03:56 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:39968 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726226AbfCGADa (ORCPT ); Wed, 6 Mar 2019 19:03:30 -0500 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92 #3 (Red Hat Linux)) id 1h1gVJ-00086w-Bj; Thu, 07 Mar 2019 00:03:17 +0000 From: Al Viro To: Linus Torvalds Cc: Eric Dumazet , David Miller , Jason Baron , kgraul@linux.ibm.com, ktkhai@virtuozzo.com, kyeongdon.kim@lge.com, Linux List Kernel Mailing , Netdev , pabeni@redhat.com, syzkaller-bugs@googlegroups.com, xiyou.wangcong@gmail.com, Christoph Hellwig , zhengbin , bcrl@kvack.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, houtao1@huawei.com, yi.zhang@huawei.com Subject: [PATCH 4/8] aio_poll(): get rid of weird refcounting Date: Thu, 7 Mar 2019 00:03:12 +0000 Message-Id: <20190307000316.31133-4-viro@ZenIV.linux.org.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190307000316.31133-1-viro@ZenIV.linux.org.uk> References: <20190307000316.31133-1-viro@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Al Viro The only reason for taking the extra ref to iocb is that we want to access it after vfs_poll() and an early wakeup could have it already completed by the time vfs_poll() returns. That's very easy to avoid, though - we need to know which lock to grab and, having grabbed it, we need to check if an early wakeup has already happened. So let's just copy the reference to waitqueue into aio_poll_table and instead of having the "woken" flag in iocb, move it to aio_poll() stack frame and put its address into iocb (and make sure it's cleared, so aio_poll_wake() won't step onto it after aio_poll() exits). That's enough to get rid of the refcount. In async case freeing is done by aio_poll_complete() (and aio_poll() will only touch the iocb past the vfs_poll() if it's guaranteed that aio_poll_complete() has not happened yet), in all other cases we make sure that wakeups hadn't and won't happen and deal with disposal of iocb ourselves. Signed-off-by: Al Viro --- fs/aio.c | 55 +++++++++++++++++++++++++++---------------------------- 1 file changed, 27 insertions(+), 28 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 22b288997441..ee062253e303 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -180,8 +180,8 @@ struct fsync_iocb { struct poll_iocb { struct file *file; struct wait_queue_head *head; + bool *taken; __poll_t events; - bool woken; bool cancelled; struct wait_queue_entry wait; struct work_struct work; @@ -209,8 +209,6 @@ struct aio_kiocb { struct list_head ki_list; /* the aio core uses this * for cancellation */ - refcount_t ki_refcnt; - /* * If the aio_resfd field of the userspace iocb is not zero, * this is the underlying eventfd context to deliver events to. @@ -1034,7 +1032,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) percpu_ref_get(&ctx->reqs); req->ki_ctx = ctx; INIT_LIST_HEAD(&req->ki_list); - refcount_set(&req->ki_refcnt, 0); req->ki_eventfd = NULL; return req; } @@ -1069,13 +1066,10 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id) static inline void iocb_put(struct aio_kiocb *iocb) { - if (refcount_read(&iocb->ki_refcnt) == 0 || - refcount_dec_and_test(&iocb->ki_refcnt)) { - if (iocb->ki_filp) - fput(iocb->ki_filp); - percpu_ref_put(&iocb->ki_ctx->reqs); - kmem_cache_free(kiocb_cachep, iocb); - } + if (iocb->ki_filp) + fput(iocb->ki_filp); + percpu_ref_put(&iocb->ki_ctx->reqs); + kmem_cache_free(kiocb_cachep, iocb); } static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, @@ -1672,8 +1666,10 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, if (mask && !(mask & req->events)) return 0; - req->woken = true; - + if (unlikely(req->taken)) { + *req->taken = true; + req->taken = NULL; + } if (mask) { /* * Try to complete the iocb inline if we can. Use @@ -1698,6 +1694,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, struct aio_poll_table { struct poll_table_struct pt; + struct wait_queue_head *head; struct aio_kiocb *iocb; int error; }; @@ -1715,7 +1712,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head, } pt->error = 0; - pt->iocb->poll.head = head; + pt->head = pt->iocb->poll.head = head; add_wait_queue(head, &pt->iocb->poll.wait); } @@ -1738,7 +1735,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP; req->head = NULL; - req->woken = false; + req->taken = &async; req->cancelled = false; apt.pt._qproc = aio_poll_queue_proc; @@ -1750,36 +1747,38 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) INIT_LIST_HEAD(&req->wait.entry); init_waitqueue_func_entry(&req->wait, aio_poll_wake); - /* one for removal from waitqueue, one for this function */ - refcount_set(&aiocb->ki_refcnt, 2); - - mask = vfs_poll(req->file, &apt.pt) & req->events; - if (unlikely(!req->head)) { + mask = req->events; + mask &= vfs_poll(req->file, &apt.pt); + /* + * Careful: we might've been put into waitqueue *and* already + * woken up before vfs_poll() returns. The caller is holding + * a reference to file, so it couldn't have been killed under + * us, no matter what. However, in case of early wakeup, @req + * might be already gone by now. + */ + if (unlikely(!apt.head)) { /* we did not manage to set up a waitqueue, done */ goto out; } - spin_lock_irq(&ctx->ctx_lock); - spin_lock(&req->head->lock); - if (req->woken) { /* already taken up by aio_poll_wake() */ - async = true; + spin_lock(&apt.head->lock); + if (async) { /* already taken up by aio_poll_wake() */ apt.error = 0; } else if (!mask && !apt.error) { /* actually waiting for an event */ list_add_tail(&aiocb->ki_list, &ctx->active_reqs); aiocb->ki_cancel = aio_poll_cancel; + req->taken = NULL; async = true; } else { /* if we get an error or a mask we are done */ WARN_ON_ONCE(list_empty(&req->wait.entry)); list_del_init(&req->wait.entry); /* no wakeup in the future either; aiocb is ours to dispose of */ } - spin_unlock(&req->head->lock); + spin_unlock(&apt.head->lock); spin_unlock_irq(&ctx->ctx_lock); - out: - if (async && !apt.error) + if (!async && !apt.error) aio_poll_complete(aiocb, mask); - iocb_put(aiocb); return apt.error; } -- 2.11.0