Date: Fri, 17 Nov 2023 16:04:53 +0000
From: Mel Gorman
To: Peter Zijlstra
Cc: "Huang, Ying", Baolin Wang, David Hildenbrand,
 akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
 willy@infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, John Hubbard
Subject: Re: [RFC PATCH] mm: support large folio numa balancing
Message-ID: <20231117160453.dkbpwub7aq3jxksf@techsingularity.net>
In-Reply-To: <20231117101343.GH3818@noisy.programming.kicks-ass.net>

On Fri, Nov 17, 2023 at 11:13:43AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 17, 2023 at 10:07:45AM +0000, Mel Gorman wrote:
>
> > This leads into a generic problem with large anything with NUMA
> > balancing -- false sharing. As it stands, THP can be false shared by
> > threads if thread-local data is split within a THP range. In this case,
> > the ideal would be the THP is migrated to the hottest node but such
> > support doesn't exist. The same applies for folios. If not handled
> > properly, a large folio of any type can ping-pong between nodes so just
> > migrating because we can is not necessarily a good idea. The patch
> > should cover a realistic case why this matters, why splitting the folio
> > is not better and supporting data.
>
> Would it make sense to have THP merging conditional on all (most?) pages
> having the same node?

Potentially yes, maybe with something similar to max_ptes_none, but it
has corner cases of its own. THP can be allocated up-front, so we don't
get the per-base-page hints unless the page is first split. I
experimented with this once upon a time, but the cost of post-splitting
was not offset by the smarter NUMA placement. While we could always
allocate small pages and promote later (originally known as the
promotion threshold), that was known to have significant penalties of
its own, so we still eagerly allocate THP. Part of that is that KVM was
the main load to benefit from THP and it always preferred eager
promotion. Even if we always started with base pages, sparse addressing
within the THP range may mean the threshold for collapsing can never be
reached.

Both THP and folios have the same false sharing problem, but at least
we knew about the false sharing problem for THP and NUMA balancing. It
was found initially that THP false sharing is mostly an edge-case
issue, mitigated by the fact that large anonymous buffers tended to be
either 2M-aligned or false shared only at the boundaries. Later glibc
and ABI changes made it even more likely that private buffers were
THP-aligned. The same is not true of folios, and it is a new problem,
so I'm uncomfortable with a patch that essentially says "migrate folios
because it's possible" without considering any of the corner cases or
measuring them.
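To make the collapse condition concrete, the sort of check I'd imagine
is along these lines. This is only a sketch: max_ptes_remote is an
invented tunable, by analogy with max_ptes_none, and
collapse_nodes_suitable() is a hypothetical helper -- neither exists
today.

/*
 * Sketch only, not a real interface: gate collapse on the PTEs in the
 * PMD range mapping pages that are (almost) all on the same node.
 * max_ptes_remote is invented for illustration, by analogy with
 * max_ptes_none; it does not exist in the kernel.
 */
static unsigned int max_ptes_remote = HPAGE_PMD_NR / 8;

static bool collapse_nodes_suitable(struct vm_area_struct *vma,
				    unsigned long haddr, pte_t *pte)
{
	int first_nid = NUMA_NO_NODE;
	unsigned int remote = 0;
	int i;

	for (i = 0; i < HPAGE_PMD_NR; i++, pte++) {
		struct page *page;
		int nid;

		if (!pte_present(*pte))
			continue;

		page = vm_normal_page(vma, haddr + i * PAGE_SIZE, *pte);
		if (!page)
			continue;

		/* Count pages on a different node than the first seen */
		nid = page_to_nid(page);
		if (first_nid == NUMA_NO_NODE) {
			first_nid = nid;
			continue;
		}
		if (nid != first_nid)
			remote++;
	}

	/* Refuse to collapse if too many pages live on another node */
	return remote <= max_ptes_remote;
}

Even with a check like that, it does not help the up-front allocation
case above where there are no base-page hint faults to scan in the
first place.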
-- 
Mel Gorman
SUSE Labs