Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
From: Ian Kent
To: Fox Chen, Greg KH, Tejun Heo
Cc: akpm@linux-foundation.org, dhowells@redhat.com,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 miklos@szeredi.hu, ricklind@linux.vnet.ibm.com, sfr@canb.auug.org.au,
 viro@zeniv.linux.org.uk
Date: Thu, 17 Dec 2020 19:48:49 +0800
References: <159237905950.89469.6559073274338175600.stgit@mickey.themaw.net>
 <20201210164423.9084-1-foxhlchen@gmail.com>
 <822f02508d495ee7398450774eb13e5116ec82ac.camel@themaw.net>
 <13e21e4c9a5841243c8d130cf9324f6cfc4dc2e1.camel@themaw.net>
 <3e97846b52a46759c414bff855e49b07f0d908fc.camel@themaw.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2020-12-17 at 19:09 +0800, Ian Kent wrote:
> On Thu, 2020-12-17 at 18:09 +0800, Ian Kent wrote:
> > On Thu, 2020-12-17 at 16:54 +0800, Fox Chen wrote:
> > > On Thu, Dec 17, 2020 at 12:46 PM Ian Kent wrote:
> > > > On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > > > > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > > > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent wrote:
> > > > > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@themaw.net> wrote:
> > > > > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > > > For the patches, there is a mutex_lock in
> > > > > > > > > > > > kn->attr_mutex, as Tejun mentioned here
> > > > > > > > > > > > (https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@mtj.duckdns.org/),
> > > > > > > > > > > > maybe a global rwsem for kn->iattr will be better??
> > > > > > > > > > >
> > > > > > > > > > > I wasn't sure about that, IIRC a spin lock could be
> > > > > > > > > > > used around the initial check and checked again at
> > > > > > > > > > > the end which would probably have been much faster
> > > > > > > > > > > but much less conservative and a bit more ugly so I
> > > > > > > > > > > just went the conservative path since there was so
> > > > > > > > > > > much change already.
> > > > > > > > > >
> > > > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH
> > > > > > > > > > didn't remember it.
> > > > > > > > > >
> > > > > > > > > > Based on what Tejun said it sounds like that needs
> > > > > > > > > > work.
> > > > > > > > >
> > > > > > > > > Those attribute handling patches were meant to allow
> > > > > > > > > taking the rw sem read lock instead of the write lock
> > > > > > > > > for kernfs_refresh_inode() updates, with the added
> > > > > > > > > locking to protect the inode attributes update since
> > > > > > > > > it's called from the VFS both with and without the
> > > > > > > > > inode lock.
> > > > > > > >
> > > > > > > > Oh, understood. I was asking also because lock on
> > > > > > > > kn->attr_mutex drags concurrent performance.
> > > > > > > >
> > > > > > > > > Looking around it looks like kernfs_iattrs() is called
> > > > > > > > > from multiple places without a node database lock at
> > > > > > > > > all.
> > > > > > > > >
> > > > > > > > > I'm thinking that, to keep my proposed change
> > > > > > > > > straightforward and on topic, I should just leave
> > > > > > > > > kernfs_refresh_inode() taking the node db write lock
> > > > > > > > > for now and consider the attributes handling as a
> > > > > > > > > separate change. Once that's done we could reconsider
> > > > > > > > > what's needed to use the node db read lock in
> > > > > > > > > kernfs_refresh_inode().
> > > > > > > >
> > > > > > > > You meant taking write lock of kernfs_rwsem for
> > > > > > > > kernfs_refresh_inode()??
> > > > > > > > It may be a lot slower in my benchmark, let me test it.
> > > > > > >
> > > > > > > Yes, but make sure the write lock of kernfs_rwsem is
> > > > > > > being taken not the read lock.
> > > > > > >
> > > > > > > That's a mistake I had initially?
> > > > > > >
> > > > > > > Still, that attributes handling is, I think, sufficient
> > > > > > > to warrant a separate change since it looks like it might
> > > > > > > need work, the kernfs node db probably should be kept
> > > > > > > stable for those attribute updates but equally the
> > > > > > > existence of an instantiated dentry might mitigate it.
> > > > > > >
> > > > > > > Some people might just know whether it's ok or not but I
> > > > > > > would like to check the callers to work out what's going
> > > > > > > on.
> > > > > > >
> > > > > > > In any case it's academic if GKH isn't willing to
> > > > > > > consider the series for review and possible merge.
> > > > > > >
> > > > > > Hi Ian
> > > > > >
> > > > > > I removed kn->attr_mutex and changed read lock to write lock
> > > > > > for kernfs_refresh_inode
> > > > > >
> > > > > > down_write(&kernfs_rwsem);
> > > > > > kernfs_refresh_inode(kn, inode);
> > > > > > up_write(&kernfs_rwsem);
> > > > > >
> > > > > >
> > > > > > Unfortunately, changes in this way make things worse, my
> > > > > > benchmark runs 100% slower than upstream sysfs. :(
> > > > > > open+read+close a sysfs file concurrently took 1000us.
> > > > > > (Currently, sysfs with a big mutex kernfs_mutex only takes
> > > > > > ~500us for one open+read+close operation concurrently)
> > > > >
> > > > > Right, so it does need attention nowish.
> > > > >
> > > > > I'll have a look at it in a while, I really need to get a new
> > > > > autofs release out, and there are quite a few changes, and
> > > > > testing is seeing a number of errors, some old, some newly
> > > > > introduced. It's proving difficult.
> > > >
> > > > I've taken a breather for the autofs testing and had a look at
> > > > this.
> > >
> > > Thanks. :)
> > >
> > > > I think my original analysis of this was wrong.
> > > >
> > > > Could you try this patch please.
> > > > I'm not sure how much difference it will make but, in principle,
> > > > it's much the same as the previous approach except it doesn't
> > > > increase the kernfs node struct size or mess with the other
> > > > attribute handling code.
> > > >
> > > > Note, this is not even compile tested.
> > >
> > > I failed to apply this patch.
> > > So based on the original six patches, I manually removed
> > > kn->attr_mutex, and added inode_lock/inode_unlock to those two
> > > functions, they were like:
> > >
> > > int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> > >                        u32 request_mask, unsigned int query_flags)
> > > {
> > >         struct inode *inode = d_inode(path->dentry);
> > >         struct kernfs_node *kn = inode->i_private;
> > >
> > >         inode_lock(inode);
> > >         down_read(&kernfs_rwsem);
> > >         kernfs_refresh_inode(kn, inode);
> > >         up_read(&kernfs_rwsem);
> > >         inode_unlock(inode);
> > >
> > >         generic_fillattr(inode, stat);
> > >         return 0;
> > > }
> > >
> > > int kernfs_iop_permission(struct inode *inode, int mask)
> > > {
> > >         struct kernfs_node *kn;
> > >
> > >         if (mask & MAY_NOT_BLOCK)
> > >                 return -ECHILD;
> > >
> > >         kn = inode->i_private;
> > >
> > >         inode_lock(inode);
> > >         down_read(&kernfs_rwsem);
> > >         kernfs_refresh_inode(kn, inode);
> > >         up_read(&kernfs_rwsem);
> > >         inode_unlock(inode);
> > >
> > >         return generic_permission(inode, mask);
> > > }
> > >
> > > But I couldn't boot the kernel and there was no error on the
> > > screen. I guess it was deadlocked on /sys creation?? :D
> >
> > Right, I guess the locking documentation is out of date. I'm
> > guessing the inode lock is taken somewhere over the .permission()
> > call. If that usage is consistent it's easily fixed, if the usage
> > is inconsistent it's hard to deal with and amounts to a bug.
>
> Yes, it is called, both shared on open, and exclusive on open
> create, and without the inode lock at all at the start of path
> resolution.
>
> That can't really be called a VFS bug since .permission() is
> meant to check permissions not update the inode.
>
> This is probably what led to the attr patches I had.
>
> If a suitable place to put a local per-object lock can't be
> found for this, other than in the kernfs_node, then it's a
> real problem from a contention POV.
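
To make the inconsistent calling convention above concrete, here is a
minimal userspace sketch, not kernel code. It assumes an error-checking
pthread mutex is a fair stand-in for an exclusively held inode lock, and
every name in it (demo_inode, demo_permission() and so on) is
hypothetical. It models only the "no lock held" and "held exclusive"
callers, not the shared case. With an error-checking mutex the second
acquisition by the same thread fails with EDEADLK instead of hanging,
which is where a real kernel would simply stop making progress.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct demo_inode {
	pthread_mutex_t i_lock;	/* models the inode lock held exclusively */
};

/* Set up an error-checking mutex so a relock by the owner is reported. */
int demo_inode_init(struct demo_inode *inode)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return -ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&inode->i_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return -ret;
}

/* Models a .permission() method that takes the inode lock itself. */
int demo_permission(struct demo_inode *inode)
{
	int ret = pthread_mutex_lock(&inode->i_lock);

	if (ret)
		return -ret;	/* -EDEADLK when the caller already holds it */
	/* ... the inode refresh would happen here ... */
	ret = pthread_mutex_unlock(&inode->i_lock);
	assert(ret == 0);
	return 0;
}

/* Models the start of path resolution: no inode lock is held. */
int demo_path_walk(struct demo_inode *inode)
{
	return demo_permission(inode);
}

/* Models open with create: the caller holds the lock exclusively. */
int demo_open_create(struct demo_inode *inode)
{
	int ret;

	ret = pthread_mutex_lock(&inode->i_lock);
	if (ret)
		return -ret;
	ret = demo_permission(inode);
	pthread_mutex_unlock(&inode->i_lock);
	return ret;
}
```

In this model demo_path_walk() succeeds while demo_open_create() gets
-EDEADLK back from the very same demo_permission(), so no single locking
choice inside the method is right for both callers.
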
>
> What could be done is to make the kernfs node attr_mutex
> a pointer and dynamically allocate it but even that is too
> costly a size addition to the kernfs node structure as
> Tejun has said.

I guess the question to ask is, is there really a need to
call kernfs_refresh_inode() from functions that are usually
reading/checking functions?

Would it be sufficient to refresh the inode in the write/set
operations, in those places (if there are any) where something
like setattr_copy() is not already called?

Perhaps GKH or Tejun could comment on this?

Ian

>
> Those patches I referred to clearly aren't finished because
> the eighth one is empty, which followed a patch I have titled
> "kernfs: make attr_mutex a local kernfs node lock".
>
> I obviously gave up on it when the series was rejected.
> But I'll give it some more thought.
>
> Ian
>
> > I'll have another look at it.
> >
> > Also, it sounds like I'm working from a more recent series.
> >
> > I had 8 patches, dropped the last three and added the one I posted.
> > If I can work out what's going on I'll post the series for you to
> > check.
> >
> > Ian
> >
> > > > kernfs: use kernfs read lock in .getattr() and .permission()
> > > >
> > > > From: Ian Kent
> > > >
> > > > From Documentation/filesystems.rst and (slightly outdated)
> > > > comments in fs/attr.c the inode i_rwsem is used for attribute
> > > > handling.
> > > >
> > > > This lock satisfies the requirements needed to reduce lock
> > > > contention, namely a per-object lock needs to be used rather
> > > > than a file system global lock with the kernfs node db held
> > > > stable for read operations.
> > > >
> > > > In particular it should reduce lock contention seen when
> > > > calling the kernfs .permission() method.
> > > >
> > > > The inode methods .getattr() and .permission() do not hold the
> > > > inode i_rwsem lock when called as they are usually read
> > > > operations.
> > > > Also the .permission() method checks for rcu-walk mode and
> > > > returns -ECHILD to the VFS if it is set. So the i_rwsem lock
> > > > can be used in kernfs_iop_getattr() and kernfs_iop_permission()
> > > > to protect the inode update done by kernfs_refresh_inode().
> > > > Using this lock allows the kernfs node db write lock in these
> > > > functions to be changed to a read lock.
> > > >
> > > > Signed-off-by: Ian Kent
> > > > ---
> > > >  fs/kernfs/inode.c |   12 ++++++++----
> > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > > > index ddaf18198935..568037e9efe9 100644
> > > > --- a/fs/kernfs/inode.c
> > > > +++ b/fs/kernfs/inode.c
> > > > @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> > > >         struct inode *inode = d_inode(path->dentry);
> > > >         struct kernfs_node *kn = inode->i_private;
> > > >
> > > > -       down_write(&kernfs_rwsem);
> > > > +       inode_lock(inode);
> > > > +       down_read(&kernfs_rwsem);
> > > >         kernfs_refresh_inode(kn, inode);
> > > > -       up_write(&kernfs_rwsem);
> > > > +       up_read(&kernfs_rwsem);
> > > > +       inode_unlock(inode);
> > > >
> > > >         generic_fillattr(inode, stat);
> > > >         return 0;
> > > > @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode, int mask)
> > > >
> > > >         kn = inode->i_private;
> > > >
> > > > -       down_write(&kernfs_rwsem);
> > > > +       inode_lock(inode);
> > > > +       down_read(&kernfs_rwsem);
> > > >         kernfs_refresh_inode(kn, inode);
> > > > -       up_write(&kernfs_rwsem);
> > > > +       up_read(&kernfs_rwsem);
> > > > +       inode_unlock(inode);
> > > >
> > > >         return generic_permission(inode, mask);
> > > >  }
> > > >
> > >
> > > thanks,
> > > fox
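
For readers who want to experiment with the lock ordering the quoted
patch proposes, the following is a rough userspace model only, under the
assumption that a pthread rwlock can stand in for kernfs_rwsem and a
pthread mutex for the per-inode i_rwsem. All names (model_node,
model_inode, model_getattr() and so on) are hypothetical and are not
kernel APIs. It shows the intended shape: a per-object lock serialises
the inode update, so the global lock only needs to be held for read.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct model_node {			/* models struct kernfs_node */
	unsigned int mode;
};

struct model_inode {			/* models struct inode */
	pthread_mutex_t i_lock;		/* models the per-inode i_rwsem */
	unsigned int i_mode;
	struct model_node *kn;
};

/* Models the file system global kernfs_rwsem. */
static pthread_rwlock_t node_db_lock = PTHREAD_RWLOCK_INITIALIZER;

void model_inode_init(struct model_inode *inode, struct model_node *kn)
{
	pthread_mutex_init(&inode->i_lock, NULL);
	inode->i_mode = 0;
	inode->kn = kn;
}

/* Models kernfs_refresh_inode(): copy node attributes into the inode. */
static void model_refresh_inode(struct model_inode *inode)
{
	assert(inode->kn != NULL);
	inode->i_mode = inode->kn->mode;
}

/* Writer side: changing a node takes the node db lock for write. */
void model_node_set_mode(struct model_node *kn, unsigned int mode)
{
	pthread_rwlock_wrlock(&node_db_lock);
	kn->mode = mode;
	pthread_rwlock_unlock(&node_db_lock);
}

/*
 * Reader side, in the order used by the patch: per-inode lock first,
 * then the node db lock for read, then the refresh. Unlike the patch,
 * the refreshed value is read back before the inode lock is dropped.
 */
unsigned int model_getattr(struct model_inode *inode)
{
	unsigned int mode;

	pthread_mutex_lock(&inode->i_lock);
	pthread_rwlock_rdlock(&node_db_lock);
	model_refresh_inode(inode);
	pthread_rwlock_unlock(&node_db_lock);
	mode = inode->i_mode;
	pthread_mutex_unlock(&inode->i_lock);
	return mode;
}
```

Calls to model_getattr() on different inodes can run in parallel because
they share the node db lock as readers, which is the contention win the
patch is after. The model deliberately leaves out the problem discussed
earlier in the thread, namely callers that already hold the inode lock.
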