From: "Darrick J. Wong" Subject: Re: [PATCH v7.1] block: Coordinate flush requests Date: Wed, 19 Jan 2011 00:58:25 -0800 Message-ID: <20110119085825.GG27381@tux1.beaverton.ibm.com> References: <20110113025646.GB27381@tux1.beaverton.ibm.com> <20110113104240.GA30719@htj.dyndns.org> <20110113185946.GD27381@tux1.beaverton.ibm.com> <20110115133344.GB27123@htj.dyndns.org> Reply-To: djwong@us.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jens Axboe , "Theodore Ts'o" , Neil Brown , Andreas Dilger , Jan Kara , Mike Snitzer , linux-kernel , Keith Mannthey , Mingming Cao , linux-ext4@vger.kernel.org, Ric Wheeler , Christoph Hellwig , Josef Bacik To: Tejun Heo Return-path: Content-Disposition: inline In-Reply-To: <20110115133344.GB27123@htj.dyndns.org> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Sat, Jan 15, 2011 at 02:33:44PM +0100, Tejun Heo wrote: > Hello, > > On Thu, Jan 13, 2011 at 10:59:46AM -0800, Darrick J. Wong wrote: > > > But wouldn't it be better to implement this directly in the flush > > > machinary instead of as blkdev_issue_flush() wrapper? We have all the > > > information at the request queue level so we can easily detect whether > > > flushes can be merged or not and whether something is issued by > > > blkdev_issue_flush() or by directly submitting bio wouldn't matter at > > > all. Am I missing something? > > > > Could you clarify where exactly you meant by "in the flush > > machinery"? I think what you're suggesting is that > > blk_flush_complete_seq could scan the pending flush list to find a > > run of consecutive pure flush requests, submit the last one, and set > > things up so that during the completion of the flush, all the > > requests in that run are returned with the real flush's return code. > > Yeah, something along that line. > > > If that's what you meant, then yes, it could be done that way. > > However, I have a few reasons for choosing the blkdev_issue_flush > > approach. First, as far as I could tell in the kernel, all the code > > paths that involve upper layers issuing pure flushes go through > > blkdev_issue_flush, so it seems like a convenient place to capture > > all the incoming pure flushes. > > That means it _can_ be done there but doesn't mean it's the best spot. > > > Other parts of the kernel issue writes with the flush flag set, but > > we want those to go through the regular queuing mechanisms anyway. > > Second, with the proposed patch, any upper-layer code that has a > > requirement for a pure flush to be issued without coordination can > > do so by submitting the bio directly (or blkdev_issue_flush_now can > > be exported). > > I don't think anyone would need such capability but even if someone > does that should be implemented as a separate explicit > DONT_MERGE_THIS_FLUSH flag, not by exploting subtle inconsistencies on > the wrapper interface. > > > Third, baking the coordination into the flush machinery makes that > > machinery more complicated to understand and maintain. > > And the third point is completely nuts. That's like saying putting > things related together makes the concentrated code more complex, so > let's scatter things all over the place. No, it's not making the code > more complex. It's putting the complexity where it belongs. It > decreases maintenance overhead by increasing consistency. Not only > that it even results in visibly more consistent and sane _behavior_ > and the said complex machinary is less than 300 lines long. > > > So in short, I went with the blkdev_issue_flush approach because the > > code is easier to understand, and it's not losing any pure flushes > > despite casting a narrower net. > > No, not at all. You're adding an unobvious logic into a wrapper > interface creating incosistent behavior and creating artificial > distinctions between pure and non-pure flushes and flushes issued by > bio and the wrapper interface. Come on. Think about it again. You > can't be seriously standing by the above rationales. Since you're the primary author of that file anyway, I'll defer to your experience. I can squeeze in a few test runs of whatever patches you propose to the mailing list. --D