Date: Mon, 5 Feb 2024 23:28:29 +0000
From: Matthew Wilcox
To: Dave Chinner
Cc: JonasZhou-oc, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, CobeChen@zhaoxin.com, LouisQi@zhaoxin.com, JonasZhou@zhaoxin.com
Subject: Re: [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap.
References: <20240202093407.12536-1-JonasZhou-oc@zhaoxin.com>

On Mon, Feb 05, 2024 at 02:22:18PM +1100, Dave Chinner wrote:
> Intuition tells me that what the OP is seeing is the opposite case
> to above: there is significant contention on the lock. In that case,
> optimal "contention performance" comes from separating the lock and
> the objects it protects into different cachelines.
>
> The reason for this is that if the lock and objects it protects are
> on the same cacheline, lock contention affects both the lock and the
> objects being manipulated inside the critical section. i.e. attempts
> to grab the lock pull the cacheline away from the CPU that holds the
> lock, and then accesses to the object that are protected by the lock
> then have to pull the cacheline back.
>
> i.e. the cost of the extra memory fetch from an uncontended
> cacheline is less than the cost of having to repeatedly fetch the
> memory inside a critical section on a contended cacheline.
>
> I consider optimisation attempts like this the canary in the mine:
> it won't be long before these or similar workloads report
> catastrophic lock contention on the lock in question.
> Moving items in the structure is equivalent to re-arranging the deck
> chairs whilst the ship sinks - we might keep our heads above water a
> little longer, but the ship is still sinking and we're still going
> to have to fix the leak sooner rather than later...

So the fundamental problem is our data structure. It's actually two
problems wrapped up in one bad idea.

i_mmap is a struct rb_root_cached:

	struct rb_root_cached {
		struct rb_root rb_root;
		struct rb_node *rb_leftmost;
	};

	struct rb_root {
		struct rb_node *rb_node;
	};

so it's two pointers into the tree: one to the leftmost node and one to
the topmost node. That means we're modifying one or both of these
pointers frequently, and every such write dirties the cacheline they
share with i_mmap_rwsem.

I imagine it's the rb_root that is modified most often, because we're
always ... appending to the right? And so we're rotating frequently.
The worst case would be appending to the left, which modifies both
pointers.

We could address this by making rotations less frequent in the common
case, which I suspect is mapping the entire file, and perhaps we should
do that as a stopgap.

The larger problem is that rbtrees suck. They suck the same way that
linked lists suck: the CPU can't prefetch very far ahead (or in this
case, down). It can do slightly better than with a list, since it can
fetch both the left and right children at once, but it can't fetch the
grandchildren until the children fetches have finished, so the dependent
reads stall us.

The solution to this problem is to change the interval tree from a
Red-Black tree to a B-tree, or something similar where each node holds
an array of pointers instead of a single pointer per child. Not the
Maple Tree itself; that is designed for non-overlapping ranges. But one
could take inspiration from the Maple Tree and design a very similar
data structure that can store and iterate over overlapping ranges.
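To make the array-of-pointers idea concrete, here is a minimal userspace sketch in C. It is not the Maple Tree, not a kernel API, and not a design proposed in this thread; `struct wide_leaf`, `struct wide_inode`, `FANOUT` and both query functions are hypothetical names chosen for illustration. Each node keeps up to `FANOUT` intervals in contiguous arrays, and an internal node records the lowest start and highest last found under each child, so a query can pick every child worth visiting from one node's memory and request all of those loads before it needs any of them. It assumes GCC or Clang for `__builtin_prefetch`.

```c
#include <assert.h>

/* Hypothetical fan-out; a real design would size nodes to cachelines. */
#define FANOUT 8

/* Hypothetical leaf: up to FANOUT closed intervals [start, last] kept in
 * parallel arrays sorted by start, so one node visit reads adjacent
 * memory instead of following one pointer per interval. */
struct wide_leaf {
	unsigned int nr;
	unsigned long start[FANOUT];
	unsigned long last[FANOUT];
};

/* Hypothetical internal node: for each child, the lowest start and the
 * highest last anywhere below it, so non-overlapping children are skipped. */
struct wide_inode {
	unsigned int nr;
	unsigned long min_start[FANOUT];
	unsigned long max_last[FANOUT];
	const struct wide_leaf *child[FANOUT];
};

/* Count intervals in one leaf that overlap the query [qstart, qlast]. */
static unsigned int leaf_count_overlaps(const struct wide_leaf *leaf,
					unsigned long qstart, unsigned long qlast)
{
	unsigned int i, hits = 0;

	assert(leaf->nr <= FANOUT);
	for (i = 0; i < leaf->nr; i++) {
		/* Sorted by start: nothing further right can overlap. */
		if (leaf->start[i] > qlast)
			break;
		if (leaf->last[i] >= qstart)
			hits++;
	}
	return hits;
}

/* Count overlaps under a two-level tree.  All child pointers sit in one
 * array, so every candidate child can be requested up front; an rbtree
 * walk only learns its next address after the current load completes. */
static unsigned int inode_count_overlaps(const struct wide_inode *node,
					 unsigned long qstart, unsigned long qlast)
{
	unsigned int i, hits = 0;

	assert(node->nr <= FANOUT);
	/* First pass: prefetch every child whose range might overlap. */
	for (i = 0; i < node->nr; i++)
		if (node->min_start[i] <= qlast && node->max_last[i] >= qstart)
			__builtin_prefetch(node->child[i]);
	/* Second pass: scan those children while their loads are in flight. */
	for (i = 0; i < node->nr; i++)
		if (node->min_start[i] <= qlast && node->max_last[i] >= qstart)
			hits += leaf_count_overlaps(node->child[i], qstart, qlast);
	return hits;
}
```

For example, with one internal node over the leaves {[0,5], [10,15], [20,25]} and {[100,110], [120,130]}, a query for [12,105] visits both leaves and counts three overlaps. A real structure would also need insertion, node splitting and rebalancing, which this sketch leaves out.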
I can't decipher the macros this late at night, so I don't fully
understand what the interval tree is doing, but I should be able to make
a better-informed observation by the end of the week.
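As far as we understand it, the generic interval tree (`INTERVAL_TREE_DEFINE()` in `include/linux/interval_tree_generic.h`) is an augmented rbtree keyed on interval start, in which each node also caches the largest interval end found in its subtree; that cached value is what lets a query skip whole subtrees. The sketch below shows the same augmentation on a plain, unbalanced binary search tree. `struct itnode`, `it_insert()` and `it_count_overlaps()` are hypothetical names, and the rbtree rotations (which must also recompute the cached value) are omitted. Each recursive step dereferences a child pointer that was itself just loaded, which is the pattern of dependent reads described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: a closed interval [start, last] plus the augmented
 * value, i.e. the largest 'last' anywhere in this node's subtree. */
struct itnode {
	unsigned long start, last;
	unsigned long subtree_last;
	struct itnode *left, *right;
};

/* Insert keyed on start (ties go right) and return the possibly new root.
 * No rebalancing: an rbtree would also rotate here and would have to
 * recompute subtree_last for every rotated node. */
static struct itnode *it_insert(struct itnode *root, struct itnode *n)
{
	struct itnode **link = &root;

	assert(n != NULL);
	n->left = n->right = NULL;
	n->subtree_last = n->last;
	while (*link) {
		/* Every ancestor's subtree now also contains n. */
		if ((*link)->subtree_last < n->last)
			(*link)->subtree_last = n->last;
		link = n->start < (*link)->start ? &(*link)->left : &(*link)->right;
	}
	*link = n;
	return root;
}

/* Count intervals overlapping [qstart, qlast]. */
static unsigned int it_count_overlaps(const struct itnode *node,
				      unsigned long qstart, unsigned long qlast)
{
	unsigned int hits;

	/* Nothing below here ends at or after qstart: skip the subtree. */
	if (!node || node->subtree_last < qstart)
		return 0;
	/* Dependent read: the child's address comes from the node just loaded. */
	hits = it_count_overlaps(node->left, qstart, qlast);
	/* Everything in the right subtree starts at or after node->start. */
	if (node->start <= qlast) {
		if (node->last >= qstart)
			hits++;
		hits += it_count_overlaps(node->right, qstart, qlast);
	}
	return hits;
}
```

For instance, inserting [10,20], [5,30], [15,16] and [40,50] in that order and querying [25,35] finds only [5,30]; the [40,50] node is reached but rejected because it starts after 35.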