Date: Fri, 8 Mar 2019 03:36:50 +0000
From: Al Viro
To: Linus Torvalds
Cc: Eric Dumazet, David Miller, Jason Baron, kgraul@linux.ibm.com,
    ktkhai@virtuozzo.com, kyeongdon kim, Linux List Kernel Mailing, Netdev,
    pabeni@redhat.com, syzkaller-bugs@googlegroups.com, Cong Wang,
    Christoph Hellwig, zhengbin, bcrl@kvack.org, linux-fsdevel,
    linux-aio@kvack.org, houtao1@huawei.com, yi.zhang@huawei.com
Subject: Re: [PATCH 1/8] aio: make sure file is pinned
Message-ID: <20190308033650.GD2217@ZenIV.linux.org.uk>
References: <20190307000316.31133-1-viro@ZenIV.linux.org.uk>
    <20190307004159.GY2217@ZenIV.linux.org.uk>
    <20190307004828.GZ2217@ZenIV.linux.org.uk>
    <20190307012036.GA2217@ZenIV.linux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 06, 2019 at 05:30:21PM
-0800, Linus Torvalds wrote:
> On Wed, Mar 6, 2019 at 5:20 PM Al Viro wrote:
> >
> > I'll try to massage that series on top of your patch; I still hate the
> > post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
> > and I'll have something to post.
>
> No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.
>
> And yeah, I think the post-vfs_poll() logic cannot possibly be
> necessary. My gut feel is that *if* we have the refcounting right,
> then we should be able to just let the wakeup come in at any later
> point, and ordering shouldn't matter all that much, and we shouldn't
> even need any locking.
>
> I'd like to think that it can be done with something like "just 'or'
> in the mask atomically" (so that we don't care about ordering between
> the synchronous vfs_poll() and the async poll wakeup), together with
> "when refcount goes to zero, finish the thing off and complete it" (so
> that we don't care who finishes first).
>
> No "woken" logic, no "who fired first" logic, no BS. Just make the
> operations work regardless of ordering.
>
> And maybe it can't be done. But the current model seems just so hacky
> that it can't be the right model.

Umm...  It is kinda-sorta doable; we do need something vaguely similar
to ->woken ("should we add it to the list of cancellables, or is the async
reference already gone?"), but other than that it seems to be feasible.
See vfs.git#work.aio; the crucial bits are in these commits:
    keep io_event in aio_kiocb
    get rid of aio_complete() res/res2 arguments
    move aio_complete() to final iocb_put(), try to fix aio_poll() logics
The first two are preparations, the last is where the fixes (hopefully)
happen.  The logics in aio_poll() after vfs_poll():
    * we might want to steal the async reference (e.g. due to event returned
from the very beginning, or due to attempt to put on more than one waitqueue,
which makes results unreliable).
That's _NOT_ possible if the thing had been put on a waitqueue, but
currently isn't there.  It might be either due to early wakeup having done
everything or the same having scheduled aio_poll_complete_work().  In either
case, the best we can do is to ignore the return value of vfs_poll() and, in
case of error, mark the sucker cancelled.  We *can't* return an error in
that case.
    * if we want and can steal the async reference, rip it from waitqueue;
otherwise, put it on the "cancellable" list, unless it's already gone or
unless we are simulating the cancel ourselves.
    * if vfs_poll() has reported something we want and we have successfully
stolen the iocb, put it there, have the reference we'd taken over dropped
and return 0.

Comments?
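
PS: purely as an illustration of the model quoted at the top ("just 'or' in the mask atomically" plus "when refcount goes to zero, finish the thing off and complete it"), here is a userspace C11 sketch.  It is NOT the code in vfs.git#work.aio and it does not use any kernel API; struct req, req_init(), req_report() and req_put() are made-up names, and the two-reference setup (one for the submitter, one for the async wakeup) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request object; not struct aio_kiocb or struct poll_iocb. */
struct req {
	atomic_int refs;      /* assumed: one ref for submitter, one for wakeup */
	atomic_uint events;   /* ready mask, accumulated with atomic OR */
	bool completed;       /* written only by whoever drops the last ref */
	unsigned int result;  /* mask seen at completion time */
};

static void req_init(struct req *r)
{
	atomic_init(&r->refs, 2);
	atomic_init(&r->events, 0);
	r->completed = false;
	r->result = 0;
}

/* Either side reports what it saw; order between callers does not matter. */
static void req_report(struct req *r, unsigned int mask)
{
	atomic_fetch_or(&r->events, mask);
}

/*
 * Drop one reference.  Whoever drops the last one completes the request.
 * Returns true if this call did the completion.
 */
static bool req_put(struct req *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	assert(old > 0);              /* catch an unbalanced put */
	if (old != 1)
		return false;
	r->result = atomic_load(&r->events);
	r->completed = true;
	return true;
}
```

With this, req_report(r, 0x1); req_put(r); followed by req_report(r, 0x4); req_put(r); leaves r->result == 0x5, and so does the same pair of calls in the opposite order; only the second req_put() completes.  What the sketch leaves out is exactly the part discussed above: deciding whether the request still has to go on the list of cancellables or the async reference is already gone, which is why something vaguely similar to ->woken is still needed.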