Message-ID: <55ca937e-92ba-4d01-a8f1-3a2a66054451@linux.alibaba.com>
Date: Mon, 20 Nov 2023 16:01:43 +0800
Subject: Re: [RFC PATCH] mm: support large folio numa balancing
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Mel Gorman, "Huang, Ying"
Cc: David Hildenbrand, akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
 willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 John Hubbard, Peter Zijlstra
References: <606d2d7a-d937-4ffe-a6f2-dfe3ae5a0c91@redhat.com>
 <871qctf89m.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87sf57en8n.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <20231117100745.fnpijbk4xgmals3k@techsingularity.net>
In-Reply-To: <20231117100745.fnpijbk4xgmals3k@techsingularity.net>

On 11/17/2023 6:07 PM, Mel Gorman wrote:
> On Wed, Nov 15, 2023 at 10:58:32AM +0800, Huang, Ying wrote:
>> Baolin Wang writes:
>>
>>> On 11/14/2023 9:12 AM, Huang, Ying wrote:
>>>> David Hildenbrand writes:
>>>>
>>>>> On 13.11.23 11:45, Baolin Wang wrote:
>>>>>> Currently, file pages already support large folios, and support
>>>>>> for anonymous pages is also under discussion[1].
>>>>>> Moreover, the numa balancing code was converted to use folios by
>>>>>> a previous series[2], and the migrate_pages() function already
>>>>>> supports large folio migration.
>>>>>>
>>>>>> So I do not see any reason to keep restricting NUMA balancing for
>>>>>> large folios.
>>>>>
>>>>> I recall John wanted to look into that. CCing him.
>>>>>
>>>>> I'll note that the "head page mapcount" heuristic to detect
>>>>> sharers will now strike on the PTE path and make us believe that a
>>>>> large folio is exclusive, although it isn't.
>>>>
>>>> Even a 4k folio may be shared by multiple processes/threads. So,
>>>> numa balancing uses a multi-stage node selection algorithm (mostly
>>>> implemented in should_numa_migrate_memory()) to identify shared
>>>> folios. I think that algorithm needs to be adjusted for PTE-mapped
>>>> large folios that are shared.
>>>
>>> Not sure I get you here. should_numa_migrate_memory() uses the last
>>> CPU id, the last PID and the group's numa faults to determine
>>> whether the page can be migrated to the target node. So for a large
>>> folio, a precise check of the folio's sharers can make the group's
>>> numa faults more accurate, which should be enough for
>>> should_numa_migrate_memory() to make a decision?
>>
>> A large folio that is mapped by multiple processes may still be
>> accessed from only one remote NUMA node, so we still want to migrate
>> it. A large folio that is mapped by one process but accessed by
>> multiple threads on multiple NUMA nodes may be better not migrated.
>>
>
> This leads into a generic problem with large anything with NUMA
> balancing -- false sharing. As it stands, a THP can be falsely shared
> by threads if thread-local data is split within the THP range. In
> this case, the ideal would be for the THP to be migrated to the
> hottest node, but such support doesn't exist. The same applies to
> folios. If not handled

So can the check below in should_numa_migrate_memory() not avoid the
false sharing of large folios that you mentioned? Please correct me if
I missed anything.

	/*
	 * Destination node is much more heavily used than the source
	 * node? Allow migration.
	 */
	if (group_faults_cpu(ng, dst_nid) > group_faults_cpu(ng, src_nid) *
					ACTIVE_NODE_FRACTION)
		return true;

	/*
	 * Distribute memory according to CPU & memory use on each node,
	 * with 3/4 hysteresis to avoid unnecessary memory migrations:
	 *
	 * faults_cpu(dst)   3   faults_cpu(src)
	 * --------------- * - > ---------------
	 * faults_mem(dst)   4   faults_mem(src)
	 */
	return group_faults_cpu(ng, dst_nid) * group_faults(p, src_nid) * 3 >
	       group_faults_cpu(ng, src_nid) * group_faults(p, dst_nid) * 4;

> properly, a large folio of any type can ping-pong between nodes, so
> just migrating because we can is not necessarily a good idea. The
> patch should cover a realistic case for why this matters, why
> splitting the folio is not better, and include supporting data.

Sure. For a private mapping, we should always migrate the large folio.
The tricky part is the shared mapping, as you and Ying said, which
covers different scenarios, and I'm thinking about how to validate it.
Do you have any suggestions? Thanks.
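For reference, a minimal userspace sketch of the kind of false-sharing
workload discussed above (illustrative only, not part of the patch; it
assumes libnuma is available and that MADV_HUGEPAGE actually yields a
PMD-sized folio): one thread per NUMA node, each pinned to its node and
repeatedly touching its own 4KB slice of a single THP-backed region, so
NUMA hint faults for the same large folio arrive from several nodes.

	#include <numa.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#define REGION	(2UL << 20)	/* one PMD-sized THP */
	#define SLICE	4096UL		/* thread-local 4KB slice */

	static char *region;

	static void *toucher(void *arg)
	{
		long node = (long)arg;
		char *slice = region + node * SLICE;

		/* Pin this thread to its node, then keep its slice hot. */
		numa_run_on_node((int)node);
		for (;;)
			memset(slice, (int)node + 1, SLICE);
		return NULL;
	}

	int main(void)
	{
		long nodes;

		if (numa_available() < 0) {
			fprintf(stderr, "no NUMA support\n");
			return 1;
		}
		nodes = numa_max_node() + 1;

		region = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (region == MAP_FAILED)
			return 1;
		/* Ask for a THP and fault the whole range in at once. */
		madvise(region, REGION, MADV_HUGEPAGE);
		memset(region, 0, REGION);

		pthread_t tids[nodes];
		for (long n = 0; n < nodes; n++)
			pthread_create(&tids[n], NULL, toucher, (void *)n);
		pthread_join(tids[0], NULL);	/* runs until interrupted */
		return 0;
	}

Built with something like "gcc -O2 sketch.c -lnuma -lpthread", one
could then watch numa_hint_faults and numa_pages_migrated in
/proc/vmstat to see whether the folio ping-pongs between nodes or
settles.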