Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp658586ybe; Wed, 11 Sep 2019 02:41:36 -0700 (PDT) X-Google-Smtp-Source: APXvYqz2vAmajiGkRvPTCg1UOmYnF0eSIL15kPWOUyaJRcJXHA1D5fC5yfhoj//ELN5meSwsXg5Y X-Received: by 2002:a17:906:3187:: with SMTP id 7mr28276609ejy.238.1568194896319; Wed, 11 Sep 2019 02:41:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568194896; cv=none; d=google.com; s=arc-20160816; b=0JVcq7VG8voGVIi2PW/ke7GIFSgNzxVaKK3jBbockCihaRXyTJMFF0c8mwM69UExVp xHLp62H4NhFkGacR4lbrP7tzfHYifFraffLqzbgndA/In1g8RL8j6kUmHxpIYNrgSgHZ VDYJ25qqB+kfWpqVb/vPBmsOkX5wbD9YJE62KtAOePgiCVgyCS5ITbX1aBxBihEXOuiP TUu0MgOVYwTOMN24KAendFnQ4mJu7fjs7BbU80UoCssjaywmZ8LyDXmDY7zHfG0H6f5/ gg3TUGbp94vHVnSrvVehj9dcTnxTys7ZkFjJNkdRIF2JRJpAw61rZU08gytjMvqeiZrX fIwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature:dkim-signature; bh=mvzKeifSnTSILTn0qraq+rN4CryGTOsIEYJQEQ6OnYg=; b=G4uwOJ6wzCb7MHNVBjwhPMvQLTTyohFxBQY69lA0BFCJMGlxVJptNnvuO/LmFMHa67 ifaxPyrZcWPYW0Amw4nSBMEuq8Wnil7bmBqfICar8q+KAI5RyAFPh4LI0ce1XFyshLwa SdX1bBhr5aILzGxs+hGh9AIEr9xkd4LKwydTIKD4qzwBLs9Dx1AISoNn0tCs9K+qzg3k rD09ssPT6bSSfWcXugZPWw8cbssrgwLCvfqiQ6JGI+VI5VappB9552kvZMg3izZKbPkK TJgMFYN/qsFREuq1mPbeEUeNIbRMmLG+SNVa/C4+CB7dCjSlRF4PKyrYFxbPEqq+cCIi cH7g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm2 header.b=mtiyobYX; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=k5SckXFx; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v31si12454743edm.402.2019.09.11.02.41.04; Wed, 11 Sep 2019 02:41:36 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm2 header.b=mtiyobYX; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=k5SckXFx; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727476AbfIKJjb (ORCPT + 99 others); Wed, 11 Sep 2019 05:39:31 -0400 Received: from wout3-smtp.messagingengine.com ([64.147.123.19]:43853 "EHLO wout3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726657AbfIKJjb (ORCPT ); Wed, 11 Sep 2019 05:39:31 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 80E0B614; Wed, 11 Sep 2019 05:39:29 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 11 Sep 2019 05:39:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= date:from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm2; bh=mvzKeifSnTSILTn0qraq+rN4Cry GTOsIEYJQEQ6OnYg=; b=mtiyobYXM4j9EFSIKWpelWpgZOm+amyEdDeS0EUXrQI Dr0Gjx7szmVtnduulUVe3O6G9B/UhKhaeTGAU24JR8WcsrwcdUfodTMHaj+pj++b JxqwoCNsMr4xy3pxI6ewV6CcL2UbOFSEBRgxQv90tZqs6Bqln4/D7Zl8jXpyjki3 iZIaoGsBbkcWJbdo/wlOnO+k2OrkprPrRM6TZqlqvKNeUrC0X4GIV7fwr0LYlkrB eybjaBdLKN4LXUJQY6nhVPvF0cP0VOACC1LgVeII6B0YqSvVFgwsfqvh0Vy1/Hr+ qiTFfwY4QjLy3hkLKplwva/42/wK9CCTaZ2r/cLROTA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=mvzKei fSnTSILTn0qraq+rN4CryGTOsIEYJQEQ6OnYg=; b=k5SckXFxB6PoUTEitBhdke SomEjIEWYKhC9eJGnnHcLsihzDA6TTEPgFNZxqdrbW0LQ7g5IJVRoLT0nHfSMunH YHB0yZzjViLeZXMx7HVyNyK3QD0LvjInmNX0KJ0Y1MjRL5HBkNu0sd28Kwq3aJC6 ReUfbuvPshNdoCpzAeh7zV/O61lWLHUtx1Jw+7NKlda7PE/brwHXFG4jUooiHJAm wGaBut55D2tCZ/ky0okQNzJZXVSDO/ENtn4Edn+rrxQLp8miV4T8a0xqvLOtsuc3 0Q9IRtnvSHs106V5mK+ujB3P2U0jwcs4EiKqX/T8Bv1yC3VpqLr0f4AEHLqoDj4g == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedufedrtddvgddujecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesthdtredttddtvdenucfhrhhomheptehnughrvghs ucfhrhgvuhhnugcuoegrnhgurhgvshesrghnrghrrgiivghlrdguvgeqnecukfhppeduge ekrdeiledrkeehrdefkeenucfrrghrrghmpehmrghilhhfrhhomheprghnughrvghssegr nhgrrhgriigvlhdruggvnecuvehluhhsthgvrhfuihiivgeptd X-ME-Proxy: Received: from intern.anarazel.de (unknown [148.69.85.38]) by mail.messagingengine.com (Postfix) with ESMTPA id 2563CD60062; Wed, 11 Sep 2019 05:39:28 -0400 (EDT) Date: Wed, 11 Sep 2019 02:39:26 -0700 From: Andres Freund To: Dave Chinner , David Sterba Cc: Goldwyn Rodrigues , linux-fsdevel@vger.kernel.org, jack@suse.com, hch@infradead.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org Subject: Re: Odd locking pattern introduced as part of "nowait aio support" Message-ID: <20190911093926.pfkkx25mffzeuo32@alap3.anarazel.de> References: <20190910223327.mnegfoggopwqqy33@alap3.anarazel.de> <20190911040420.GB27547@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190911040420.GB27547@dread.disaster.area> Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Hi, On 2019-09-11 14:04:20 +1000, Dave Chinner wrote: > On Tue, Sep 10, 2019 at 03:33:27PM -0700, Andres Freund wrote: > > Hi, > > > > Especially with buffered io it's fairly easy to hit contention on the > > inode lock, during writes. With something like io_uring, it's even > > easier, because it currently (but see [1]) farms out buffered writes to > > workers, which then can easily contend on the inode lock, even if only > > one process submits writes. But I've seen it in plenty other cases too. > > > > Looking at the code I noticed that several parts of the "nowait aio > > support" (cf 728fbc0e10b7f3) series introduced code like: > > > > static ssize_t > > ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > > { > > ... > > if (!inode_trylock(inode)) { > > if (iocb->ki_flags & IOCB_NOWAIT) > > return -EAGAIN; > > inode_lock(inode); > > } > > The ext4 code is just buggy here - we don't support RWF_NOWAIT on > buffered writes. But both buffered and non-buffered writes go through ext4_file_write_iter(). And there's a preceding check, at least these days, preventing IOCB_NOWAIT to apply to buffered writes: if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; I do really wish buffered NOWAIT was supported... The overhead of having to do async buffered writes through the workqueue in io_uring, even if an already existing page is targeted, is quite noticable. Even if it failed with EAGAIN as soon as the buffered write's target isn't in the page cache, it'd be worthwhile. > > isn't trylocking and then locking in a blocking fashion an inefficient > > pattern? I.e. I think this should be > > > > if (iocb->ki_flags & IOCB_NOWAIT) { > > if (!inode_trylock(inode)) > > return -EAGAIN; > > } > > else > > inode_lock(inode); > > Yes, you are right. > > History: commit 91f9943e1c7b ("fs: support RWF_NOWAIT > for buffered reads") which introduced the first locking pattern > you describe in XFS. Seems, as part of the nowait work, the pattern was introduced in ext4, xfs and btrfs. And fixed in xfs. > That was followed soon after by: > > commit 942491c9e6d631c012f3c4ea8e7777b0b02edeab > Author: Christoph Hellwig > Date: Mon Oct 23 18:31:50 2017 -0700 > > xfs: fix AIM7 regression > > Apparently our current rwsem code doesn't like doing the trylock, then > lock for real scheme. So change our read/write methods to just do the > trylock for the RWF_NOWAIT case. This fixes a ~25% regression in > AIM7. > > Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads") > Reported-by: kernel test robot > Signed-off-by: Christoph Hellwig > Reviewed-by: Darrick J. Wong > Signed-off-by: Darrick J. Wong > > Which changed all the trylock/eagain/lock patterns to the second > form you quote. None of the other filesystems had AIM7 regressions > reported against them, so nobody changed them.... :( > > Obviously this isn't going to improve scalability to a very significant > > degree. But not unnecessarily doing two atomic ops on a contended lock > > can't hurt scalability either. Also, the current code just seems > > confusing. > > > > Am I missing something? > > Just that the sort of performance regression testing that uncovers > this sort of thing isn't widely done, and most filesystems are > concurrency limited in some way before they hit inode lock > scalability issues. Hence filesystem concurrency foccussed > benchmarks that could uncover it (like aim7) won't because the inode > locks don't end up stressed enough to make a difference to > benchmark performance. Ok. Goldwyn, do you want to write a patch, or do you want me to write one up? Greetings, Andres Freund