Date: Fri, 17 Nov 2023 10:07:45 +0000
From: Mel Gorman
To: "Huang, Ying"
Cc: Baolin Wang, David Hildenbrand, akpm@linux-foundation.org,
	wangkefeng.wang@huawei.com, willy@infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, John Hubbard, Peter Zijlstra
Subject: Re: [RFC PATCH] mm: support large folio numa balancing
Message-ID: <20231117100745.fnpijbk4xgmals3k@techsingularity.net>
References: <606d2d7a-d937-4ffe-a6f2-dfe3ae5a0c91@redhat.com>
 <871qctf89m.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87sf57en8n.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87sf57en8n.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Nov 15, 2023 at 10:58:32AM +0800, Huang, Ying wrote:
> Baolin Wang writes:
>
> > On 11/14/2023 9:12 AM, Huang, Ying wrote:
> >> David Hildenbrand writes:
> >>
> >>> On 13.11.23 11:45, Baolin Wang wrote:
> >>>> Currently, file pages already support large folios, and support for
> >>>> anonymous pages is also under discussion[1]. Moreover, the numa balancing
> >>>> code was converted to use folios by a previous thread[2], and the
> >>>> migrate_pages function also already supports large folio migration.
> >>>> So I do not see any reason to continue restricting NUMA balancing for
> >>>> large folios.
> >>>
> >>> I recall John wanted to look into that. CCing him.
> >>>
> >>> I'll note that the "head page mapcount" heuristic to detect sharers will
> >>> now strike on the PTE path and make us believe that a large folio is
> >>> exclusive, although it isn't.
> >>
> >> Even a 4k folio may be shared by multiple processes/threads. So, NUMA
> >> balancing uses a multi-stage node selection algorithm (mostly
> >> implemented in should_numa_migrate_memory()) to identify shared folios.
> >> I think that the algorithm needs to be adjusted for shared, PTE-mapped
> >> large folios.
> >
> > Not sure I get you here. In should_numa_migrate_memory(), it will use
> > the last CPU id, last PID and group numa faults to determine if this page
> > can be migrated to the target node. So for a large folio, a precise
> > folio sharers check can make the numa faults of a group more accurate,
> > which is enough for should_numa_migrate_memory() to make a decision?
>
> A large folio that is mapped by multiple processes may be accessed from one
> remote NUMA node, so we still want to migrate it. A large folio that is
> mapped by one process but accessed by multiple threads on multiple NUMA
> nodes may not be migrated.
>

This leads into a generic problem with anything large under NUMA balancing
-- false sharing. As it stands, a THP can be falsely shared by threads if
thread-local data is split within the THP range. In that case, the ideal
would be to migrate the THP to the hottest node, but such support doesn't
exist. The same applies to folios. If not handled properly, a large folio
of any type can ping-pong between nodes, so migrating just because we can
is not necessarily a good idea.
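As a rough illustration of the multi-stage node selection idea discussed
above (this is a minimal user-space sketch, not the actual
should_numa_migrate_memory() implementation; every structure and helper
name here is made up): a folio is only migrated to the faulting node if
the previous hinting fault also came from that node, which is what keeps a
genuinely shared folio from ping-ponging.

	/*
	 * Hypothetical sketch of two-stage NUMA migration filtering.
	 * Not kernel code; names are illustrative only.
	 */
	#include <stdbool.h>

	struct folio_numa_state {
		int last_fault_nid;	/* node of previous hinting fault, -1 if none */
		int last_fault_pid;	/* task that took the previous fault */
	};

	static bool should_migrate(struct folio_numa_state *st,
				   int this_nid, int this_pid)
	{
		/*
		 * Stage check: two consecutive faults from the same node
		 * suggest the folio is "owned" by that node rather than
		 * shared across nodes.
		 */
		bool pass = (st->last_fault_nid == this_nid);

		/* Record this fault for the next decision. */
		st->last_fault_nid = this_nid;
		st->last_fault_pid = this_pid;

		return pass;
	}

The ambiguity for a PTE-mapped large folio is that faults on different
base pages within the folio, taken by different threads on different
nodes, all feed the same per-folio state, so a falsely shared folio can
still alternately pass and fail such a check.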
The patch changelog should cover a realistic case showing why this matters,
explain why splitting the folio is not a better option, and include
supporting data.

--
Mel Gorman
SUSE Labs