From: Rongwei Wang
Date: Thu, 12 Oct 2023 21:30:50 +0800
Subject: Re: [PATCH RFC 0/5] support NUMA emulation for arm64
To: Pierre Gondois, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, willy@infradead.org, catalin.marinas@arm.com, dave.hansen@linux.intel.com, tj@kernel.org, mingo@redhat.com
Message-ID: <57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com>
References: <20231012024842.99703-1-rongwei.wang@linux.alibaba.com>

On 2023/10/12 20:37, Pierre Gondois wrote:
> Hello Rongwei,
>
> On 10/12/23 04:48, Rongwei Wang wrote:
>> A brief introduction
>> ====================
>>
>> NUMA emulation can fake additional nodes on top of a single-node
>> system, e.g.
>>
>> one-node system:
>>
>> [root@localhost ~]# numactl -H
>> available: 1 nodes (0)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 31788 MB
>> node 0 free: 31446 MB
>> node distances:
>> node   0
>>    0:  10
>>
>> with numa=fake=2 (fake 2 nodes on each original node):
>>
>> [root@localhost ~]# numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 15806 MB
>> node 0 free: 15451 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7
>> node 1 size: 16029 MB
>> node 1 free: 15989 MB
>> node distances:
>> node   0   1
>>    0:  10  10
>>    1:  10  10
>>
>> As shown above, a new node has been faked. For CPUs, the behavior of
>> the x86 NUMA emulation is kept: every fake node lists all the CPUs of
>> the original node. It might be better to give each node 4 cores (not
>> sure; I will do that next if so).
>>
>> Why do this
>> ===========
>>
>> There seem to be the following reasons:
>>    (1) On an x86 host, NUMA emulation can fake a multi-node environment
>>        to test or verify performance-related behavior, but on arm64
>>        the only way to do this is to modify the ACPI tables, which is
>>        more or less troublesome.
>>    (2) It reduces contention on some locks. Here is an example we found:
>>        will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
>>        hotspot on lruvec->lock when tested on a single-node system.
>>        Moreover, performance improves greatly when tested on a system
>>        with two (or more) nodes. The data is shown below (higher is
>>        better):
>>
>> ---------------------------------------------------------------------------
>>    threads/process |    1    |    12    |    24    |    48    |    96
>> ---------------------------------------------------------------------------
>>    one node        | 14 1122 | 110 5372 | 111 2615 | 79 7084  | 72 4516
>> ---------------------------------------------------------------------------
>>    numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 142 3968
>> ---------------------------------------------------------------------------
>>                    | For concurrency 12, there is no lruvec->lock hotspot.
>>    hotspot         | For 24, the one-node system has a 24% hotspot on
>>                    | lruvec->lock, but the two-node system does not.
>> ---------------------------------------------------------------------------
>>
>> As for risks (e.g. NUMA balancing...), they need to be discussed here.
>>
>> Lastly, this is just a draft; I can improve it if the approach is
>> acceptable.
>
> I'm not engaging on the utility/relevance of the patch-set, but I tried
> them on an arm64 system with the 'numa=fake=2' parameter and could not

Sorry, my fault. I should have mentioned this in the brief introduction:
the kernel command line needs "acpi=on numa=fake=2".
The default path of arm64 NUMA initialization is
numa_init() -> dummy_numa_init() when ACPI is turned off (this patch set
does not handle that path yet; I will do it next). Also, if you test this
patch set in qemu-kvm, you should add the parameters below to the script:

    -object memory-backend-ram,id=mem0,size=32G \
    -numa node,memdev=mem0,cpus=0-7,nodeid=0 \

(These parameters just make sure the SRAT table contains a NUMA
configuration, avoiding the numa_init() -> dummy_numa_init() path.)

> see 2 nodes being created under:
>   /sys/devices/system/node/
> Indeed it seems that even though numa_emulation() is moved to a generic
> mm/numa.c file, the function is only called from:
>   arch/x86/mm/numa.c:numa_init()
> (or maybe I'm misinterpreting the intent of the patches).

In this patch set, drivers/base/arch_numa.c:numa_init() also calls
numa_emulation() (I guess it works if you add acpi=on :-)).

>
> Also I had the following errors when building (still for arm64):
> mm/numa.c:862:8: error: implicit declaration of function
> 'early_cpu_to_node' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>         nid = early_cpu_to_node(cpu);

It looks like CONFIG_DEBUG_PER_CPU_MAPS is enabled in your configuration.
You can disable CONFIG_DEBUG_PER_CPU_MAPS and test again; I have not
tested with it enabled. Thanks, this is very helpful; I will fix it in
the next version.

If you have any questions, please let me know.

Regards,
-wrw

> ^
> mm/numa.c:862:8: note: did you mean 'early_map_cpu_to_node'?
> ./include/asm-generic/numa.h:37:13: note: 'early_map_cpu_to_node'
> declared here
> void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>             ^
> mm/numa.c:874:3: error: implicit declaration of function
> 'debug_cpumask_set_cpu' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>                 debug_cpumask_set_cpu(cpu, nid, enable);
>                 ^
> mm/numa.c:874:3: note: did you mean '__cpumask_set_cpu'?
> ./include/linux/cpumask.h:474:29: note: '__cpumask_set_cpu' declared here
> static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct
> cpumask *dstp)
>                             ^
> 2 errors generated.
>
> Regards,
> Pierre
>
>>
>> Thanks!
>>
>> Rongwei Wang (5):
>>    mm/numa: move numa emulation APIs into generic files
>>    mm: percpu: fix variable type of cpu
>>    arch_numa: remove __init in early_cpu_to_node()
>>    mm/numa: support CONFIG_NUMA_EMU for arm64
>>    mm/numa: migrate leftover numa emulation into mm/numa.c
>>
>>   arch/x86/Kconfig                          |   8 -
>>   arch/x86/include/asm/numa.h               |   3 -
>>   arch/x86/mm/Makefile                      |   1 -
>>   arch/x86/mm/numa.c                        | 216 +------------- 
>>   arch/x86/mm/numa_internal.h               |  14 +-
>>   drivers/base/arch_numa.c                  |   7 +-
>>   include/asm-generic/numa.h                |  33 +++
>>   include/linux/percpu.h                    |   2 +-
>>   mm/Kconfig                                |   8 +
>>   mm/Makefile                               |   1 +
>>   arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
>>   11 files changed, 373 insertions(+), 253 deletions(-)
>>   rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)
>>