Received: by 2002:a05:7412:2a8a:b0:fc:a2b0:25d7 with SMTP id u10csp43535rdh; Tue, 6 Feb 2024 18:40:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IFuA6dX+5E8DUUIGfAV8KUtgI+hLAYP2qEZMninYNkfiRk3oX86wNHoM5F0BCYnlXdy9D0n X-Received: by 2002:a92:b10:0:b0:363:c2f8:fd12 with SMTP id b16-20020a920b10000000b00363c2f8fd12mr3863756ilf.30.1707273636065; Tue, 06 Feb 2024 18:40:36 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707273636; cv=pass; d=google.com; s=arc-20160816; b=RU6LkwteqT4dG6TK/eLu2sKg0X6RssdB8qC8ief1h+CeAr5mqddCeFPSyp0fMVeqv5 6h88pEF0NjdKj80DLOizq5EK002NKCBGhlqbFVXiU0hLxUuQ2ZEe64YnzwLZmb3eiNVr O8jCBM0RdNGgPkLste7BOH33P49uOXn82Yguovb+75WaRvUvUvZwMKHSlzgZXV8fz2s+ xFAFfVaFltplHJBZeKiOdZM7uHY53CjW3PCfpM2AnAAC0w4bgT4JFCfGTl1tbtrjmiv2 ym8I7yXt5oUATYkixx8ehU4sUNhkDFPlW+vqe6JxCkenU42cN3x2KH8oumrisdYa/j1w SAjg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=8/qSJOR+zkPPH2i9c8ryKssFXsF/wlJRBdyNQ7Hk1aE=; fh=t+NnJQ1CGN+NulN+Bmwb7hU8/ieYU3Jgdo/vPUTmCHk=; b=ISSQDfTDGPyf2DWlzckGFBASLOUiKZe8srtmnrUD6ZC4E5yYK5Vw0GCyVoELwYvXRq 2ibUbnNbIRmJM6MF6LLxMF2FohiK9I8YDvrlNI0Y52SAu0+b4JIUIKu+xgfZn4tii/tv bRUdDAc3tBgDadyWJ2M1yAG7UE9/Sttmq7zhJDuMPWMeo5PDEQZGn+CxlQMTW3exz8c+ TW1+Y8HQ5ZDhJ5LwaM4vDlXcodzEmFi+/4f1ZZo28Q/l0ePcEuXZkkzJUA5GalQlnhGL gcRw8vx/2jbvtgEmCJk2FGPNeBwdaObfMadiB2AW/C+tnCnPFxFGfkXHN/zqYE+npIYR GvYA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=DqohHpns; arc=pass (i=1 dkim=pass dkdomain=infradead.org); spf=pass (google.com: domain of linux-kernel+bounces-55731-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-55731-linux.lists.archive=gmail.com@vger.kernel.org" X-Forwarded-Encrypted: i=2; AJvYcCX3n0pZMfBCfdTV9YzGyDP21vLxj2xIX8eGs44ttMFGnBSnkjtA5n/cnWHIEU1/kdg1GejvKTtQS8FNqqCJgPrLgpVt/s12uUweQzELOg== Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id r11-20020a170903410b00b001d55aa9ffecsi445432pld.253.2024.02.06.18.40.35 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 06 Feb 2024 18:40:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-55731-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=DqohHpns; arc=pass (i=1 dkim=pass dkdomain=infradead.org); spf=pass (google.com: domain of linux-kernel+bounces-55731-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-55731-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 47786286E6B for ; Tue, 6 Feb 2024 23:34:15 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D45371CF9A; Tue, 6 Feb 2024 23:34:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="DqohHpns" Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B74B1CD2D; Tue, 6 Feb 2024 23:33:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707262439; cv=none; b=Wp+kPyXC9/ocS+26FExWNj6hsFCo2JZtMTevWwMxR73sQEcI6MtN6wGkLScNia0/DH17Oi7VP3vQpzURiZx3Pu2G9EyPlq7MAxT6EqJC8hfU4PsyAY5LFjJRa7gEs/ASg4AbWNoDARHtTTHoqplBECMZRj4qusGYzbjTs9uXm74= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707262439; c=relaxed/simple; bh=2ujnuBthhHC/WoSoD6QQnKnumAsVCVROyse7z/cd2Sk=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=aF8ijqeXeJWedsx4I4J9xdTzauUxkCSodJkBjin6O0WLbAFEROlnfRqwgQIJ64I1Yq5WgD49UWN1fiTgvA+++S5LQw3X0NIvARc3LVQqG5nafN0mE6osI14GjRBXoataVSMxXLDbbwAc0i+bGvvnVUVHjCNvF41Jv3TSKP5/hZo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=DqohHpns; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=8/qSJOR+zkPPH2i9c8ryKssFXsF/wlJRBdyNQ7Hk1aE=; b=DqohHpnsHhxigSOb2sDDtK0Fhc fyh+uGH2Fco/NlpE7qCu1h0/QL08p7YxPsuyrr82HitYHg6Wl50QXE2uT6iPlhiH8TG+3oGbvJTlN Q2tM0egXmVzf8Cq2CKFxaPIwHE6GHtD6rc9vs5lJ34/ALMtAKvYB0XomAiNX97VaykYZH+PilsDL4 g/g3eMmOauaS3wMA5ZNZ9wQqGJhIFxATgaihdzr1DPe9v6ITphy1NzysdvfqhkZDvsE3sYE7Ohn9k ZvQ2EzS7RqqmpLIyHYMtgJpHtE3wtyaMSjmMmuVZAa+NQ/6dEylc3uMbVwOcdBnsfaNinNliG2KP0 9QDf7r/w==; Received: from willy by casper.infradead.org with local (Exim 4.97.1 #2 (Red Hat Linux)) id 1rXUwo-0000000DaA9-1Twe; Tue, 06 Feb 2024 23:33:50 +0000 Date: Tue, 6 Feb 2024 23:33:50 +0000 From: Matthew Wilcox To: Dave Chinner Cc: JonasZhou-oc , viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, CobeChen@zhaoxin.com, LouisQi@zhaoxin.com, JonasZhou@zhaoxin.com Subject: Re: [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap. Message-ID: References: <20240202093407.12536-1-JonasZhou-oc@zhaoxin.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed, Feb 07, 2024 at 08:35:59AM +1100, Dave Chinner wrote: > > The solution to this problem is to change the interval tree data structure > > from an Red-Black tree to a B-tree, or something similar where we use > > an array of pointers instead of a single pointer. > > .... B-trees are not immune to pointer chasing problems, either. > Most use binary searches within the nodes, and so we have the > problem of unpredictable cacheline misses within the nodes as well > as being unable to do depth based prefetch similar to rbtrees. > > Perhaps we should be looking at something like this: > > https://lore.kernel.org/linux-fsdevel/20240126220655.395093-2-kent.overstreet@linux.dev/ I need more data (and maybe Kent has it!) Without any data, I believe that Eytzinger layout is an idea that was a really good one in the 1990s/2000s and has now passed its expiration date because of the changed nature of hardware. In the mid-90s, we could do O(10) instructions in the time it took to service a LLC miss. Today, it's more like O(2500). That means it is far more important to be predictable than it is to execute the minimum number of instructions. If your B-tree nodes are 256kB in size (I believe that's what bacachefs is using?), then Eytzinger layout may make some sense, but if you're using smaller nodes (which I further believe is appropriate for an in-memory B-tree), then a straight 'load each index and compare it' may outperform a search that jumps around inside a node. I'm pontificating and will happily yield to someone who has data. I've tried to mark my assumptions clearly above. Something else I'm not aware of (and filesystem B-trees will not have any experience of) is what research exists on efficiently adding new entries to a balanced tree so as to minimise rebalances. Filesystems are like the Maple Tree in that for every logical offset inside a file, there is precisely one answer to "what physical block does this map to". For the i_mmap tree, we want instead to answer the question "Which VMAs have an intersection with this range of the file", and for the benchmark in question, we will have a large number of entries that compare equal to each other (they have the same start, and same end, but different values). So they could be inserted at many different points in the tree. We'd like to choose the point which causes the least amount of rebalancing.