Received: by 2002:ac0:aed5:0:0:0:0:0 with SMTP id t21csp2111706imb; Sun, 3 Mar 2019 18:37:17 -0800 (PST) X-Google-Smtp-Source: APXvYqwZxnMHdSptlK9SSk9Tx9LCuup7bo/AdW7bSF5rthi6nzcwatDDv7g+AMFXO0VveTYmZ0h9 X-Received: by 2002:a62:8c:: with SMTP id 134mr3835201pfa.27.1551667037619; Sun, 03 Mar 2019 18:37:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1551667037; cv=none; d=google.com; s=arc-20160816; b=mcpOnd/bU3wyu3wU+yttKRCybkJXVDIKXyiPkOGiSNLcEnk240o1esZPETtAOEcDoI 9ydmhFgze81WW3l1KhTDlN8PehFXo0xPV6Le3+GEgJkjkF2RID+1iibLsYPy97XF2VAh 9tso73o/ryX0HZOs0hhudckW1cdVyExV7yqOQsVAJtPr9XQrHZcb8Zi1IwxAx2ZxF7aQ 9c4TEzpGO+pCVt+Pvl9wyRsRkX2+F37FiH/slEI2BOiL07YrKFphOIfUO4cUp+yV4Miw 7PqdpQQok2QT4JXeSUBioLojw56OTOeo//y3e6K4PcDf+eOYvMYGPy87vIwFHRLzk6QA GBDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=bG7RwcQ1wAcJicDFNnOyzsUeO9G/rxL+U5OEvP8MWTI=; b=QmEDynDh/HuDmgYIctJebxZC0tNEDFEqUxtKuFC7n8x2DrnFNoHsiWb4tOtIQjBPqs 4hCEoNNE9sn4OW7lufYC7GS54OKVrsCfQYEtKy8NHuLUUXbCazuFxF0nIdgFhpLbcn5h /nahruLCflQcNf+3SKpmqm5PgBrZecTB7N3lOJbmAwlyC0LRHSHo8CeQLlznQURcYdBf 8/hu08mWlj7gQIbBDUl0cR0sCqQmMVRIh9GYt/x1aDaBsBQ4zVS0fANYVLxmHbKg1Vwa q08mo1mcWciJTAZDqziQo/T0DNDoW60zlYwBnzdmLMeQuFsMpZTIJaNRIXvSITFKd7uo oFjg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q189si4189530pga.544.2019.03.03.18.37.02; Sun, 03 Mar 2019 18:37:17 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726216AbfCDCgf (ORCPT + 99 others); Sun, 3 Mar 2019 21:36:35 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:43660 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725933AbfCDCgf (ORCPT ); Sun, 3 Mar 2019 21:36:35 -0500 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92 #3 (Red Hat Linux)) id 1h0dSk-00032K-A5; Mon, 04 Mar 2019 02:36:18 +0000 Date: Mon, 4 Mar 2019 02:36:18 +0000 From: Al Viro To: Linus Torvalds Cc: Eric Dumazet , David Miller , Jason Baron , kgraul@linux.ibm.com, ktkhai@virtuozzo.com, kyeongdon.kim@lge.com, Linux List Kernel Mailing , Netdev , pabeni@redhat.com, syzkaller-bugs@googlegroups.com, xiyou.wangcong@gmail.com, Christoph Hellwig Subject: Re: [PATCH] aio: prevent the final fput() in the middle of vfs_poll() (Re: KASAN: use-after-free Read in unix_dgram_poll) Message-ID: <20190304023618.GS2217@ZenIV.linux.org.uk> References: <000000000000f39c7b05832e0219@google.com> <20190303135502.GP2217@ZenIV.linux.org.uk> <20190303151846.GQ2217@ZenIV.linux.org.uk> <20190303203011.GR2217@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Mar 03, 2019 at 02:23:33PM -0800, Linus Torvalds wrote: OK, having dug through the archives, the reasons were not strong. So that part is OK... > @@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_kiocb *iocb) > { > if (refcount_read(&iocb->ki_refcnt) == 0 || > refcount_dec_and_test(&iocb->ki_refcnt)) { > + if (iocb->ki_filp) > + fput(iocb->ki_filp); > percpu_ref_put(&iocb->ki_ctx->reqs); > kmem_cache_free(kiocb_cachep, iocb); > } That reminds me - refcount_t here is rather ridiculous; what we have is * everything other than aio_poll: ki_refcnt is 0 all along * aio_poll: originally 0, then set to 2, then two iocb_put() are done (either both synchronous to aio_poll(), or one sync and one async). That's not a refcount at all. It's a flag, set only for aio_poll ones. And that test in iocb_put() is "if flag is set, clear it and bugger off". What's worse, AFAICS the callers in aio_poll() are buggered (see below) > static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask) > { > - struct file *file = iocb->poll.file; > - > aio_complete(iocb, mangle_poll(mask), 0); > - fput(file); > } No reasons to keep that function at all now... > - if (unlikely(apt.error)) { > - fput(req->file); > + if (unlikely(apt.error)) > return apt.error; > - } > > if (mask) > aio_poll_complete(aiocb, mask); Looking at that thing... How does it manage to avoid leaks when we try to use it on e.g. /dev/tty, which has poll_wait(file, &tty->read_wait, wait); poll_wait(file, &tty->write_wait, wait); in n_tty_poll()? AFAICS, we'll succeed adding it to the first queue, then have aio_poll_queue_proc() fail and set apt.error to -EINVAL. Suppose we are looking for EPOLLIN and there's nothing ready to read. We'll go mask = vfs_poll(req->file, &apt.pt) & req->events; mask is 0. if (unlikely(!req->head)) { nope - it's &tty->read_wait, not NULL /* we did not manage to set up a waitqueue, done */ goto out; } spin_lock_irq(&ctx->ctx_lock); spin_lock(&req->head->lock); if (req->woken) { nope - no wakeups so far /* wake_up context handles the rest */ mask = 0; apt.error = 0; } else if (mask || apt.error) { apt.error is non-zero /* if we get an error or a mask we are done */ WARN_ON_ONCE(list_empty(&req->wait.entry)); list_del_init(&req->wait.entry); OK, removed from queue } else { /* actually waiting for an event */ list_add_tail(&aiocb->ki_list, &ctx->active_reqs); aiocb->ki_cancel = aio_poll_cancel; } spin_unlock(&req->head->lock); spin_unlock_irq(&ctx->ctx_lock); out: if (unlikely(apt.error)) { fput(req->file); return apt.error; ... and we return -EINVAL to __io_submit_one(), where we hit /* * If ret is 0, we'd either done aio_complete() ourselves or have * arranged for that to be done asynchronously. Anything non-zero * means that we need to destroy req ourselves. */ if (ret) goto out_put_req; return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: put_reqs_available(ctx, 1); return ret; and all knowledge of req is lost. But we'd done only *one* iocb_put() in that case, and ->ki_refcnt had been set to 2 by aio_poll(). How could it avoid a leak? The same goes for "->poll() returns something without bothering to call poll_wait()" case, actually... IOW, I would rather have aio_poll() (along with your fput-a-bit-later change) do this - out: if (mask && !apt.error) aio_complete(aiocb, mangle_poll(mask), 0); iocb_put(aiocb); return apt.error; Comments?