From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/3] vfs: convert inode cache to hlist-bl
Date: Tue, 6 Apr 2021 22:33:40 +1000
Message-Id: <20210406123343.1739669-1-david@fromorbit.com>

Hi folks,
Recently I've been doing some scalability characterisation of various
filesystems, and one of the limiting factors that has prevented me
from exploring filesystem characteristics is the inode hash table.
Namely, the global inode_hash_lock that protects it.

This has long been a problem, but I personally haven't cared about it
because, well, XFS doesn't use it and so it's not a limiting factor
for most of my work. However, in trying to characterise the
scalability boundaries of bcachefs, I kept hitting VFS limitations
first. bcachefs hits the inode hash table pretty hard, and it becomes
a contention point a lot sooner than it does for ext4. Btrfs also
uses the inode hash, but its namespace doesn't have the capability to
stress the inode hash lock because it hits internal contention first.

Long story short, I did what should have been done a decade or more
ago - I converted the inode hash table to use hlist-bl to split up
the global lock. This is modelled on the dentry cache, with one minor
tweak. That is, the inode hash value cannot be calculated from the
inode, so we have to keep a record of either the hash value or a
pointer to the hlist-bl list head that the inode is hashed into, so
that we can lock the correct list on removal. (A rough sketch of this
is appended below for the curious.)

Other than that, this is mostly just a mechanical conversion from one
list and lock type to another. None of the algorithms have changed
and none of the RCU behaviours have changed. But it removes the
inode_hash_lock from the picture, so performance for bcachefs goes
way up and CPU usage for ext4 halves at 16 and 32 threads. At higher
thread counts, we start to hit filesystem and other VFS locks as the
limiting factors. Profiles and performance numbers are in patch 3 for
those who are curious.

I've been running this in benchmarks and perf testing across
bcachefs, btrfs and ext4 for a couple of weeks, and it passes fstests
on ext4 and btrfs without regressions. So now it needs more eyes and
testing and hopefully merging....

Cheers,

Dave.
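---

For those who want the shape of the idea without reading the patches,
here's a minimal sketch of the locking scheme described above. The
struct and helper names (inode_sketch, i_hash_head, sketch_*) are
made up for this mail rather than lifted from the patches - the real
code is in fs/inode.c - but the hlist-bl calls are the stock
<linux/list_bl.h> and <linux/rculist_bl.h> API:

	#include <linux/list_bl.h>
	#include <linux/rculist_bl.h>

	/* One bit-locked head per hash chain replaces inode_hash_lock. */
	#define SKETCH_HASH_BITS	16
	static struct hlist_bl_head sketch_hashtable[1 << SKETCH_HASH_BITS];

	struct inode_sketch {
		struct hlist_bl_node	i_hash;		/* chain linkage */
		struct hlist_bl_head	*i_hash_head;	/* bucket we hashed into */
	};

	/* Insert: lock just the target bucket and record which bucket
	 * this inode went into. */
	static void sketch_hash_insert(struct inode_sketch *inode,
				       struct hlist_bl_head *b)
	{
		hlist_bl_lock(b);	/* bit 0 of b->first is the lock */
		inode->i_hash_head = b;
		hlist_bl_add_head_rcu(&inode->i_hash, b);
		hlist_bl_unlock(b);
	}

	/*
	 * Remove: the hash value can't be recalculated from the inode,
	 * so the recorded bucket pointer tells us which lock to take.
	 */
	static void sketch_hash_remove(struct inode_sketch *inode)
	{
		struct hlist_bl_head *b = inode->i_hash_head;

		hlist_bl_lock(b);
		hlist_bl_del_init_rcu(&inode->i_hash);
		inode->i_hash_head = NULL;
		hlist_bl_unlock(b);
	}

Contention then scales with the number of hash chains rather than
being serialised on a single global spinlock, which is the same trick
the dcache already plays with its bit-locked hash buckets.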