Date: Wed, 25 Sep 2019 18:00:34 +1000
From: Dave Chinner
To: Linus Torvalds
Cc: Konstantin Khlebnikov, Tejun Heo, linux-fsdevel, Linux-MM,
	Linux Kernel Mailing List, Jens Axboe, Michal Hocko, Mel Gorman,
	Johannes Weiner
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
Message-ID: <20190925080034.GD804@dread.disaster.area>
References: <156896493723.4334.13340481207144634918.stgit@buzz>
	<875f3b55-4fe1-e2c3-5bee-ca79e4668e72@yandex-team.ru>
	<20190923145242.GF2233839@devbig004.ftw2.facebook.com>
	<20190924073940.GM6636@dread.disaster.area>
On Tue, Sep 24, 2019 at 12:08:04PM -0700, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner wrote:
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>
> Our dirty_background stuff is very questionable, but it exists (and
> has those insane defaults) because of various legacy reasons.

That's not what I was asking about. The context is in the previous
lines you didn't quote:

> > > > Is the faster speed reproducible? I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:

i.e. I'm asking about the reasons for the performance differential,
not asking for an explanation of what writebehind is.

If the performance differential really is caused by writeback starting
sooner, then winding down dirty_background_bytes should produce
exactly the same performance because it will start writeback -much
faster-. If it doesn't, then the assertion that the difference is
caused by earlier writeout is questionable and the code may not
actually be doing what is claimed....

Basically, I'm asking for proof that the explanation is correct.

> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>
> The thing is, that also accounts for dirty shared mmap pages.
> And it
> really will kill some benchmarks that people take very very seriously.

Yes, I know that. I'm not suggesting that we do this, [snip]

> Anyway, the end result of all this is that we have that
> balance_dirty_pages() that is pretty darn complex and I suspect very
> few people understand everything that goes on in that function.

I'd agree with you there - most of the groundwork for the
balance_dirty_pages() IO throttling feedback loop was based on
concepts I developed to solve dirty page writeback thrashing problems
on Irix back in 2003. The code we have in Linux was written by
Fengguang Wu with help from a lot of people, but the underlying
concepts of delegating IO to dedicated writeback threads that
calculate and track page cleaning rates (BDI writeback rates), and
then throttling the incoming page dirtying rate to the page cleaning
rate, all came out of my head....

So, much as it may surprise you, I am one of the few people who do
actually understand how that whole complex mass of accounting and
feedback is supposed to work. :)

> Now, whether write-behind really _does_ help that, or whether it's
> just yet another tweak and complication, I can't actually say.

Neither can I at this point - I lack the data, and that's why I was
asking whether there was a perf difference with the existing limits
wound right down. Knowing whether the performance difference is simply
a result of starting writeback IO sooner tells me an awful lot about
what other behaviour is happening as a result of the changes in this
patch.

> But I
> don't think 'dirty_background_bytes' is really an argument against
> write-behind, it's just one knob on the very complex dirty handling
> we have.

I never said it was - I'm just trying to determine whether a one-line
explanation is true or not.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
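[Editorial footnote, not part of the original mail: the experiment proposed in the thread is just the two sysctl writes quoted above, wrapped here in a small sketch for convenience. The 100MB/200MB values come from the mail itself; the APPLY guard and the print-only fallback are additions, since writing these sysctls requires root.]

```shell
#!/bin/sh
# Wind the global writeback thresholds down so that background
# writeback starts at ~100MB of dirty pages and dirtiers are throttled
# hard at ~200MB - i.e. roughly the write-behind window under
# discussion. Only applies the settings when APPLY=1; otherwise it
# just prints the values it would set.
BG=$((100 * 1000 * 1000))
DIRTY=$((200 * 1000 * 1000))

if [ "${APPLY:-0}" = "1" ]; then
    echo "$BG"    > /proc/sys/vm/dirty_background_bytes
    echo "$DIRTY" > /proc/sys/vm/dirty_bytes
else
    echo "dirty_background_bytes=$BG"
    echo "dirty_bytes=$DIRTY"
fi
```

If the "writeback simply starts earlier" explanation is complete, re-running the write-behind benchmark with these settings should reproduce most of the reported speedup; if it doesn't, something else in the patch is responsible.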