Date: Thu, 29 Aug 2019 21:20:50 +1000
From: Matthew Bobrowski
To: Christoph Hellwig
Cc: Jan Kara, "Theodore Y. Ts'o", "Darrick J. Wong", Ritesh Harjani,
    adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, aneesh.kumar@linux.ibm.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure
Message-ID: <20190829112048.GA2486@poseidon.bobrowski.net>
In-Reply-To: <20190829063608.GA17426@infradead.org>

Awesome, and thank you *all* for your very valuable
input.

On Wed, Aug 28, 2019 at 11:36:08PM -0700, Christoph Hellwig wrote:
> On Wed, Aug 28, 2019 at 08:02:15PM +0200, Jan Kara wrote:
> > > The original reason why we created the DIO_STATE_UNWRITTEN flag was a
> > > fast path, where the common case is writing blocks to an existing
> > > location in a file where the blocks are already allocated, and marked
> > > as written. So consulting the on-disk extent tree to determine
> > > whether unwritten extents need to be converted and/or split is
> > > certainly doable. However, it's expensive for the common case. So
> > > having a hint whether we need to schedule a workqueue to possibly
> > > convert an unwritten region is helpful. If we can just free the bio
> > > and exit the I/O completion handler without having to take shared
> > > locks to examine the on-disk extent tree, so much the better.
> >
> > Yes, but for determining whether extent conversion on IO completion is
> > needed we now use IOMAP_DIO_UNWRITTEN flag iomap infrastructure provides to
> > us. So we don't have to track this internally in ext4 anymore.
>
> Exactly. As mentioned before the ioend to track unwritten thing was
> in XFS by the time ext4 copied the ioend approach. but we actually got
> rid of that long before the iomap conversion. Maybe to make everything
> easier to understand and bisect you might want to get rid of the ioend
> for direct I/O in ext4 as a prep path as well.
>
> The relevant commit is: 273dda76f757 ("xfs: don't use ioends for direct
> write completions")

Ah ha! So we can conclude that there's no need to muck around with hairy
ioends, nor any need to denote whether there are unwritten extents held
against the inode using a tricky state flag, for that matter.

> > > To be honest, i'm not 100% sure what would happen if we removed that
> > > restriction; it might be that things would work just fine (just slower
> > > in some workloads), or whether there is some hidden dependency that
> > > would explode.
> > > I suspect we'd have to try the experiment to be sure.
> >
> > As far as I remember the concern was that extent split may need block
> > allocation and we may not have enough free blocks to do it. These days we
> > have some blocks reserved in the filesystem to accomodate unexpected extent
> > splits so this shouldn't happen anymore so the only real concern is the
> > wasted performance due to unnecessary extent merge & split. Kind of a
> > stress test for this would be to fire of lots of sequential AIO DIO
> > requests against a hole in a file.
>
> Well, you can always add a don't merge flag to the actual allocation.
> You might still get a merge for pathological case (fallocate adjacent
> to a dio write just submitted), but if the merging is such a performance
> over head here is easy ways to avoid it for the common case.

After I've posted the next version of this patch series, I will attempt
some stress testing to see what the performance hit could be.
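
For the record, here is how I read the completion-side fast path agreed
on above, as a toy userspace model rather than real ext4 or iomap code.
Everything prefixed demo_/DEMO_ is made up for illustration;
DEMO_DIO_UNWRITTEN only stands in for the IOMAP_DIO_UNWRITTEN hint, and
the handler signature is an assumption, not the kernel's.

```c
/*
 * Toy model of a direct I/O completion handler that relies on a
 * per-request "unwritten" hint instead of per-inode state.
 * All names here are hypothetical; this is not kernel code.
 */

/* Stand-in for the IOMAP_DIO_UNWRITTEN completion flag. */
#define DEMO_DIO_UNWRITTEN (1u << 0)

/* Toy inode that only records how much slow-path work was done. */
struct demo_inode {
	unsigned int conversions;   /* number of unwritten->written conversions */
	long long converted_bytes;  /* total bytes covered by those conversions */
};

/*
 * Slow path. In a real filesystem this is where the extent tree
 * would be consulted and the extent converted (and possibly split).
 */
int demo_convert_unwritten(struct demo_inode *inode, long long offset,
			   long long size)
{
	(void)offset; /* the model does not track extent positions */
	inode->conversions++;
	inode->converted_bytes += size;
	return 0;
}

/*
 * Completion handler. A failed I/O is reported as-is. A write into
 * already-written blocks (no hint set) returns immediately without
 * touching the inode. Only flagged completions take the slow path.
 */
int demo_dio_end_io(struct demo_inode *inode, long long offset,
		    long long size, int error, unsigned int flags)
{
	if (error)
		return error;
	if (!(flags & DEMO_DIO_UNWRITTEN))
		return 0; /* fast path: nothing to convert */
	return demo_convert_unwritten(inode, offset, size);
}
```

The point of the model is only that the hint travels with the request,
so the common overwrite case never needs to look at inode state.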