Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
From: Ian Kent
To: Fox Chen
Cc: akpm@linux-foundation.org, dhowells@redhat.com, Greg KH, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, miklos@szeredi.hu, ricklind@linux.vnet.ibm.com, sfr@canb.auug.org.au, Tejun Heo, viro@zeniv.linux.org.uk
Date: Thu, 17 Dec 2020 18:09:43 +0800

On Thu, 2020-12-17 at 16:54 +0800, Fox Chen wrote:
> On Thu, Dec 17, 2020 at 12:46 PM Ian Kent wrote:
> > On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent wrote:
> > > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent wrote:
> > > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > For the patches, there is a mutex_lock in
> > > > > > > > > > kn->attr_mutex, as Tejun mentioned here
> > > > > > > > > > (https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@mtj.duckdns.org/),
> > > > > > > > > > maybe a global rwsem for kn->iattr will be better??
> > > > > > > > >
> > > > > > > > > I wasn't sure about that, IIRC a spin lock could be
> > > > > > > > > used around the initial check and checked again at the
> > > > > > > > > end which would probably have been much faster but much
> > > > > > > > > less conservative and a bit more ugly so I just went
> > > > > > > > > the conservative path since there was so much change
> > > > > > > > > already.
> > > > > > > >
> > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH
> > > > > > > > didn't remember it.
> > > > > > > >
> > > > > > > > Based on what Tejun said it sounds like that needs work.
> > > > > > >
> > > > > > > Those attribute handling patches were meant to allow taking
> > > > > > > the rw sem read lock instead of the write lock for
> > > > > > > kernfs_refresh_inode() updates, with the added locking to
> > > > > > > protect the inode attributes update since it's called from
> > > > > > > the VFS both with and without the inode lock.
> > > > > >
> > > > > > Oh, understood. I was asking also because lock on
> > > > > > kn->attr_mutex drags concurrent performance.
> > > > > >
> > > > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > > > multiple places without a node database lock at all.
> > > > > > >
> > > > > > > I'm thinking that, to keep my proposed change straight
> > > > > > > forward and on topic, I should just leave
> > > > > > > kernfs_refresh_inode() taking the node db write lock for
> > > > > > > now and consider the attributes handling as a separate
> > > > > > > change. Once that's done we could reconsider what's needed
> > > > > > > to use the node db read lock in kernfs_refresh_inode().
> > > > > >
> > > > > > You meant taking write lock of kernfs_rwsem for
> > > > > > kernfs_refresh_inode()??
> > > > > > It may be a lot slower in my benchmark, let me test it.
> > > > >
> > > > > Yes, but make sure the write lock of kernfs_rwsem is being
> > > > > taken not the read lock.
> > > > >
> > > > > That's a mistake I had initially?
> > > > >
> > > > > Still, that attributes handling is, I think, sufficient to
> > > > > warrant a separate change since it looks like it might need
> > > > > work, the kernfs node db probably should be kept stable for
> > > > > those attribute updates but equally the existence of an
> > > > > instantiated dentry might mitigate the it.
> > > > >
> > > > > Some people might just know whether it's ok or not but I would
> > > > > like to check the callers to work out what's going on.
> > > > >
> > > > > In any case it's academic if GCH isn't willing to consider the
> > > > > series for review and possible merge.
> > > > >
> > > > Hi Ian
> > > >
> > > > I removed kn->attr_mutex and changed read lock to write lock for
> > > > kernfs_refresh_inode
> > > >
> > > > down_write(&kernfs_rwsem);
> > > > kernfs_refresh_inode(kn, inode);
> > > > up_write(&kernfs_rwsem);
> > > >
> > > >
> > > > Unfortunate, changes in this way make things worse, my benchmark
> > > > runs 100% slower than upstream sysfs. :(
> > > > open+read+close a sysfs file concurrently took 1000us.
> > > > (Currently, sysfs with a big mutex kernfs_mutex only takes ~500us
> > > > for one open+read+close operation concurrently)
> > >
> > > Right, so it does need attention nowish.
> > >
> > > I'll have a look at it in a while, I really need to get a new
> > > autofs release out, and there are quite a few changes, and testing
> > > is seeing a number of errors, some old, some newly introduced.
> > > It's proving difficult.
> >
> > I've taken a breather for the autofs testing and had a look at this.
>
> Thanks. :)
>
> > I think my original analysis of this was wrong.
> >
> > Could you try this patch please.
> > I'm not sure how much difference it will make but, in principle,
> > it's much the same as the previous approach except it doesn't
> > increase the kernfs node struct size or mess with the other
> > attribute handling code.
> >
> > Note, this is not even compile tested.
>
> I failed to apply this patch. So based on the original six patches, I
> manually removed kn->attr_mutex, and added
> inode_lock/inode_unlock to those two functions, they were like:
>
> int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
>                        u32 request_mask, unsigned int query_flags)
> {
>         struct inode *inode = d_inode(path->dentry);
>         struct kernfs_node *kn = inode->i_private;
>
>         inode_lock(inode);
>         down_read(&kernfs_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         up_read(&kernfs_rwsem);
>         inode_unlock(inode);
>
>         generic_fillattr(inode, stat);
>         return 0;
> }
>
> int kernfs_iop_permission(struct inode *inode, int mask)
> {
>         struct kernfs_node *kn;
>
>         if (mask & MAY_NOT_BLOCK)
>                 return -ECHILD;
>
>         kn = inode->i_private;
>
>         inode_lock(inode);
>         down_read(&kernfs_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         up_read(&kernfs_rwsem);
>         inode_unlock(inode);
>
>         return generic_permission(inode, mask);
> }
>
> But I couldn't boot the kernel and there was no error on the screen.
> I guess it was deadlocked on /sys creation?? :D

Right, I guess the locking documentation is out of date. I'm guessing
the inode lock is taken somewhere over the .permission() call.

If that usage is consistent it's easily fixed; if the usage is
inconsistent it's hard to deal with and amounts to a bug.

I'll have another look at it.

Also, it sounds like I'm working from a more recent series. I had
8 patches, dropped the last three and added the one I posted.

If I can work out what's going on I'll post the series for you to
check.

Ian

> > kernfs: use kernfs read lock in .getattr() and .permission()
> >
> > From: Ian Kent
> >
> > From Documentation/filesystems.rst and (slightly outdated) comments
> > in fs/attr.c the inode i_rwsem is used for attribute handling.
> >
> > This lock satisfies the requirements needed to reduce lock
> > contention, namely a per-object lock needs to be used rather than a
> > file system global lock with the kernfs node db held stable for read
> > operations.
> >
> > In particular it should reduce lock contention seen when calling the
> > kernfs .permission() method.
> >
> > The inode methods .getattr() and .permission() do not hold the inode
> > i_rwsem lock when called as they are usually read operations. Also
> > the .permission() method checks for rcu-walk mode and returns -ECHILD
> > to the VFS if it is set. So the i_rwsem lock can be used in
> > kernfs_iop_getattr() and kernfs_iop_permission() to protect the inode
> > update done by kernfs_refresh_inode(). Using this lock allows the
> > kernfs node db write lock in these functions to be changed to a read
> > lock.
> >
> > Signed-off-by: Ian Kent
> > ---
> >  fs/kernfs/inode.c |   12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index ddaf18198935..568037e9efe9 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> >  	struct inode *inode = d_inode(path->dentry);
> >  	struct kernfs_node *kn = inode->i_private;
> >
> > -	down_write(&kernfs_rwsem);
> > +	inode_lock(inode);
> > +	down_read(&kernfs_rwsem);
> >  	kernfs_refresh_inode(kn, inode);
> > -	up_write(&kernfs_rwsem);
> > +	up_read(&kernfs_rwsem);
> > +	inode_unlock(inode);
> >
> >  	generic_fillattr(inode, stat);
> >  	return 0;
> > @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode, int mask)
> >
> >  	kn = inode->i_private;
> >
> > -	down_write(&kernfs_rwsem);
> > +	inode_lock(inode);
> > +	down_read(&kernfs_rwsem);
> >  	kernfs_refresh_inode(kn, inode);
> > -	up_write(&kernfs_rwsem);
> > +	up_read(&kernfs_rwsem);
> > +	inode_unlock(inode);
> >
> >  	return generic_permission(inode, mask);
> >  }
>
> thanks,
> fox