Date: Wed, 25 Sep 2019 08:54:09 -0400
From: "Theodore Y. Ts'o"
To: Dave Chinner
Cc: Konstantin Khlebnikov, Tejun Heo, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jens Axboe,
    Michal Hocko, Mel Gorman, Johannes Weiner, Linus Torvalds
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
Message-ID: <20190925125409.GD18094@mit.edu>
In-Reply-To: <20190925071854.GC804@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 25, 2019 at 05:18:54PM +1000, Dave Chinner wrote:
> > > And, really such strict writebehind behaviour is going to cause all
> > > sorts of unintended problems with filesystems because there will be
> > > adverse interactions with delayed allocation. We need a substantial
> > > amount of dirty data to be cached for writeback for fragmentation
> > > minimisation algorithms to be able to do their job....
> >
> > I think most sequentially written files never change after close.
>
> There are lots of apps that write zeros to initialise and allocate
> space, then go write real data to them. Database WAL files are
> commonly initialised like this...

Fortunately, most of the time enterprise database files are
initialized through an fd which is then kept open. And it's only a
single file. So that's a heuristic that's not too bad to handle so
long as it's only triggered when there are no open file descriptors on
said inode. If something is still keeping the file open, then we do
need to be very careful about writebehind.

That being said, with databases, they are going to be calling
fdatasync(2) and fsync(2) all the time, so it's unlikely writebehind
is going to be that much of an issue, so long as the max writebehind
knob isn't set too insanely low. It's been over ten years since I
last looked at this, and so things may very well have changed, but one
enterprise database I looked at would fallocate 32M, and then write
32M of zeros to make sure blocks were marked as initialized, so that
further random writes wouldn't cause metadata updates.

Now, there *are* applications which log to files via append, and in
the worst case, they don't actually keep an fd open. Examples of this
would include scripts that call logger(1) very often. But in general,
taking into account whether or not there is still an fd holding the
inode open to influence how aggressively we do writeback does make
sense.

Finally, we should remember that this will impact battery life on
laptops. Perhaps not so much now that most laptops have SSDs instead
of HDDs, but aggressive writebehind does certainly have tradeoffs, and
what makes sense for an NVMe-attached SSD is going to be very
different for a $2 USB thumb drive picked up at the checkout aisle of
Staples....

                                        - Ted
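
For readers who have not seen the preallocate-then-zero pattern
mentioned above, here is a minimal userspace sketch in Python. It is
not taken from any particular database. The function name
preallocate_and_zero, the 1 MiB chunk size, and the trailing
fdatasync are assumptions made for illustration; only the 32M figure
comes from the message. It assumes a Linux system, where
os.posix_fallocate and os.fdatasync are available.

```python
import os

CHUNK = 1 << 20  # hypothetical choice: write zeros 1 MiB at a time


def preallocate_and_zero(path: str, size: int = 32 << 20) -> None:
    """Reserve `size` bytes for `path`, then overwrite them with zeros.

    Illustrative only: mirrors the "fallocate 32M, then write 32M of
    zeros" behaviour described in the message.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Step 1: ask the filesystem to reserve the space up front.
        os.posix_fallocate(fd, 0, size)

        # Step 2: write real zeros over the reserved range so the
        # blocks are marked as initialized.
        zeros = bytes(CHUNK)
        written = 0
        while written < size:
            n = min(CHUNK, size - written)
            written += os.pwrite(fd, zeros[:n], written)

        # Step 3 (assumption, not stated in the message): push the
        # zeroed data to stable storage before the file is used.
        os.fdatasync(fd)
    finally:
        os.close(fd)
```

An application following this pattern would typically keep its own
descriptor open afterwards and issue random writes into the already
initialized range, which is the case where the open-fd heuristic
would hold writebehind back.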