Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp946353ybp; Wed, 9 Oct 2019 06:42:20 -0700 (PDT) X-Google-Smtp-Source: APXvYqx5hg+FZ3F05kR8Z8/1IAZMDQoLVxBQ/ddgAYoRL/FPfONwzbSVU+zWpLcE7M9eKP4mKS26 X-Received: by 2002:a17:906:cd11:: with SMTP id oz17mr2904194ejb.71.1570628540633; Wed, 09 Oct 2019 06:42:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1570628540; cv=none; d=google.com; s=arc-20160816; b=ykMQERETApEVvIUDjnw2NnqWk8Ix66K6ChTs6bLlHygLtGdMWKFMBFuuqPa48MOZHg YdPbAcSA6iVZ/FjuvkkYaDFyfiT/ONSW3T6Qpsw0VHjjN2FoyJpUujLpdvricW6GwQY0 zoL4HmKgr3hlFhnvNSAQwDWo3p9DjldRmebCnqGpufstJOTFD3l94J8DaF+8AUfMglKY zeg3E5eqB9lqn+4nOR3ZlGr/3Jlbbn6VkSTWdOuvOk0F9uFbcSbGVMVzwiQKgceK46a4 ldZ83tQ+geAIy3wwTeVooWyM2FFmwROB/5YqYb29/JeHH0ObVvpKMH3HQSAEaIM3kxl7 OaGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=MMFiJmeBxVWTY0YVFJrHwTLFQHtZ7hFP8mD0Ylvgcqw=; b=I7dhd7ddSa2GfxXoZklM1/BkuuWL8kM47Ts9u3cj7FaU8iZbY+RgMGSUccPWch5eBz qjlt06xkc2iVYKcIdiQeeZRINCt0qJyEkhE4sqyuk9ZFhczNYtF8orK/fmey6W1sUjuZ hg5JItHsJQAgjyDN4mBoDGcyCqnyaEmLDRMRmv5nrEGH8KLXp2pM3dd5ohVfqDCJXUKl JzzXc9e4g2oTTMMHwiz1sCiCZZvPL/mxtEmHV3Ny3VWxZ2XkHnX9y1OO05CxhRS5nRPl HiBV3sQFO2alPu7cQs/aNLx4c+YgA8M/kKHYRoUJ/jaKZq4rQ3mYFjRa/fiICnifl++p Etgg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f17si1472231eda.232.2019.10.09.06.41.50; Wed, 09 Oct 2019 06:42:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731345AbfJINl0 (ORCPT + 99 others); Wed, 9 Oct 2019 09:41:26 -0400 Received: from mx2.suse.de ([195.135.220.15]:58886 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731243AbfJINl0 (ORCPT ); Wed, 9 Oct 2019 09:41:26 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 15A07B186; Wed, 9 Oct 2019 13:41:24 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 727191E4851; Wed, 9 Oct 2019 15:41:23 +0200 (CEST) Date: Wed, 9 Oct 2019 15:41:23 +0200 From: Jan Kara To: Christoph Hellwig Cc: Jan Kara , Matthew Bobrowski , tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, david@fromorbit.com, darrick.wong@oracle.com Subject: Re: [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap infrastructure Message-ID: <20191009134123.GE5050@quack2.suse.cz> References: <9ef408b4079d438c0e6071b862c56fc8b65c3451.1570100361.git.mbobrowski@mbobrowski.org> <20191008151238.GK5078@quack2.suse.cz> <20191009071145.GB32281@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191009071145.GB32281@infradead.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed 09-10-19 00:11:45, Christoph Hellwig wrote: > On Tue, Oct 08, 2019 at 05:12:38PM +0200, Jan Kara wrote: > > Seeing how difficult it is when a filesystem wants to complete the iocb > > synchronously (regardless whether it is async or sync) and have all the > > information in one place for further processing, I think it would be the > > easiest to provide iomap_dio_rw_wait() that forces waiting for the iocb to > > complete *and* returns the appropriate return value instead of pretty > > useless EIOCBQUEUED. It is actually pretty trivial (patch attached). With > > this we can then just call iomap_dio_rw_sync() for the inode extension case > > with ->end_io doing just the unwritten extent processing and then call > > ext4_handle_inode_extension() from ext4_direct_write_iter() where we would > > have all the information we need. > > > > Christoph, Darrick, what do you think about extending iomap like in the > > attached patch (plus sample use in XFS)? > > I vaguely remember suggesting something like this but Brian or Dave > convinced me it wasn't a good idea. This will require a trip to the > xfs or fsdevel archives from when the inode_dio_wait was added in XFS. I think you refer to this [1] message from Brian: It's not quite that simple.. FWIW, the discussion (between Dave and I) for how best to solve this started offline prior to sending the patch and pretty much started with the idea of changing the async I/O to sync as you suggest here. I backed off from that because it's too subtle given the semantics between the higher level aio code and lower level dio code for async I/O. By that I mean either can be responsible for calling the ->ki_complete() callback in the iocb on I/O completion. IOW, if we receive an async direct I/O, clear ->ki_complete() as you describe above and submit it, the dio code will wait on I/O and return the size of the I/O on successful completion. It will not have called ->ki_complete(), however. Rather, the >0 return value indicates that aio_rw_done() must call ->ki_complete() after xfs_file_write_iter() returns, but we would have already cleared the function pointer. I think it is technically possible to use this technique by clearing and restoring ->ki_complete(), but in general we've visited this "change the I/O type" approach twice now and we've (collectively) got it wrong both times (the first error in thinking was that XFS would need to call ->ki_complete()). IMO, this demonstrates that it's not worth the complexity to insert ourselves into this dependency chain when we can accomplish the same thing with a simple dio wait call. --- Which is fair enough. I've been looking at the same and arrived at similar conclusion ;) (BTW, funnily enough ocfs2 seems to do this dance with clearing and restoring ki_complete). But what I propose is something different - just wait for IO in iomap_dio_rw() which avoids the need to clear & restore ->ki_complete() while at the same time while providing fully-sync IO experience to the caller. So Brians objection doesn't apply here. > But if we decide it actully works this time around please don't add the > __ variant but just add the parameter to iomap_dio_rw directly. Yeah, I was undecided on this one as well. Will change this and post the patches to fsdevel & xfs lists. Honza [1] https://lore.kernel.org/linux-xfs/20190325135124.GD52167@bfoster/ -- Jan Kara SUSE Labs, CR