Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp4903277rwl; Tue, 28 Mar 2023 13:02:05 -0700 (PDT) X-Google-Smtp-Source: AKy350YmKU9jdqnoa8MRxxtGiX7X6QxvVQCgjZeWSRzee4THpbkiyUIm9RUdsrXUaxSzchFk4U7h X-Received: by 2002:a17:903:787:b0:1a1:b5ce:5d03 with SMTP id kn7-20020a170903078700b001a1b5ce5d03mr13076709plb.10.1680033725632; Tue, 28 Mar 2023 13:02:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680033725; cv=none; d=google.com; s=arc-20160816; b=dzFLNw/a14pkV9/A770WZ5Zrnj4mJuzSemtlYvoGsECJD9J6Ju5UifiOKzMkPuu7NN 8EKz+JE1Z0t/FkYZEY2Aks5HBdDOphQ/IrQVSOXomdxVCgt1j5a4DqCOXbPQRvL8ZPA8 syX5uIOOAode0Mi3Rt4h3LorLCKN66FYGDzgL5Br2S8dMmkc9jBwXW0gV6DW0VRVV/01 E2otugpuW6QLywi+Ir+ISoIwbz+MpJxYpNo+5n7MekUk766VjV36SgNUJpbV3fWAiLUp DjGqQppkrgxvlGRKGyA5Jn96l5jm35buCznnr9/Nys3W5UXODkxcpfuZb8/7es99CGmG MBWw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=X91DtJ/JfDBYHgJ9R+MsUlxcd9N/rlmrW5KIlOR8liw=; b=B7iKt+Vf2LfJy78rRmJnigJ52HzUzS57qk8ZKwAYvVMX1/txyHqF71OKPzf7IDN1Cg 10aHDGEEsMr4TcKyRtwVJKFurOjDuGvZ09aniRBZYPRBM2AWztIpT5rGj7TjFaxVTHBp hg4ZdcvWvoD4vLst3MW3WWaOjNf1pd4kvlx4/rrCMzUBDQovsBH77tgDb9uEJGzStYsp aNB8z7twM2Frxj+cdfECvaeuQ5MX+0JguLX038/ayxs3YtMk0zMkPWNByPDHFeEQB1YA Hf6tICqXbWec4fXm2sfHFKxOYgzWNuFh/0UyuK/x1OZ/ix18kPTR65ur+WycyATBthXo WM+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=NWYwEsRz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e6-20020a170902744600b0019a826d304dsi29288836plt.633.2023.03.28.13.01.11; Tue, 28 Mar 2023 13:02:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=NWYwEsRz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229889AbjC1T7C (ORCPT + 99 others); Tue, 28 Mar 2023 15:59:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229765AbjC1T6i (ORCPT ); Tue, 28 Mar 2023 15:58:38 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8D6D3C38 for ; Tue, 28 Mar 2023 12:58:04 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id q19so10339740wrc.5 for ; Tue, 28 Mar 2023 12:58:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1680033483; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=X91DtJ/JfDBYHgJ9R+MsUlxcd9N/rlmrW5KIlOR8liw=; b=NWYwEsRz9geMdPiZTptjEO/9APlaCqvEYwMbECwAldxXFGFACgIWK90ii/GU/MH0bv BZi+c0pfVDIua+d5qc0dsTUUYxTuWgQo943MJfDPirAdFrEzrxH9/wQ5VZbbD7+9M+8V VFdYMw4UFXT1TIsDaSj/BN3KRp2iVO55u2TfrvQohfiR8WW6ukbp+Qb7nink2lOLE/N2 QGPBXS7HapIA84q1rxlmR3mOhwTq0O2ZGtfFsDrPzi9MDfgUGNBDuP+zPhMWUUwJ7sTa pT+OQ4Sgm2fD8eEXcfi1nICr9juzJneQFELj28gTxjEZ8pY9F+uiPs6ZMU/JjykbSJ6J DMxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680033483; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=X91DtJ/JfDBYHgJ9R+MsUlxcd9N/rlmrW5KIlOR8liw=; b=1zsCW13+42tbB2rs0Mdz8Zntws+IOosjTvzCaPDJjolVAGKvrLzvKX3qEQsUh7sZdr r2VmfHHMpx5AH5GpplZ062DrvVP7EydnKU41H1esLnoKwRDRCea+qredFQp3SmHDMkjg s3L2pOscWCMlTM84uA88GMuAF7Dcr2FdgDGbHbPE664IS1yzyISDd3AM+p6FTZA7HG/m 3gFNrFPAnGKmH6kfjPBfd2tuHEA5Y1Bf9xGCst43HAajnufxBsZyaV6fV6b11zSaAI27 L3dS7Bycl68qzbdo4JsQd7qnppskiY9sQOkddtv+p3whTlvVHfmS4sukEOMsuCF6zFcO Srvw== X-Gm-Message-State: AAQBX9cE1iItNyJojn49b1I/AmQ3Z6tdl+pTtonMXqIEeOYTl0SLegrB upSU/7PhvX7xpGQjt1zfNDvkRQ== X-Received: by 2002:a5d:54d0:0:b0:2d5:553a:93ac with SMTP id x16-20020a5d54d0000000b002d5553a93acmr14146098wrv.7.1680033483331; Tue, 28 Mar 2023 12:58:03 -0700 (PDT) Received: from usaari01.cust.communityfibre.co.uk ([2a02:6b6a:b566:0:8445:3123:91d7:959d]) by smtp.gmail.com with ESMTPSA id z6-20020a056000110600b002c557f82e27sm28248386wrw.99.2023.03.28.12.58.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Mar 2023 12:58:02 -0700 (PDT) From: Usama Arif To: dwmw2@infradead.org, tglx@linutronix.de, kim.phillips@amd.com, brgerst@gmail.com Cc: piotrgorski@cachyos.org, oleksandr@natalenko.name, arjan@linux.intel.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com, paulmck@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com, thomas.lendacky@amd.com, seanjc@google.com, pmenzel@molgen.mpg.de, fam.zheng@bytedance.com, punit.agrawal@bytedance.com, simon.evans@bytedance.com, liangma@liangbit.com, gpiccoli@igalia.com, David Woodhouse , Usama Arif , Mark Rutland Subject: [PATCH v17 3/8] cpu/hotplug: Add CPUHP_BP_PARALLEL_STARTUP state before CPUHP_BRINGUP_CPU Date: Tue, 28 Mar 2023 20:57:53 +0100 Message-Id: <20230328195758.1049469-4-usama.arif@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230328195758.1049469-1-usama.arif@bytedance.com> References: <20230328195758.1049469-1-usama.arif@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: David Woodhouse There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and then waiting for it to make its way through hardware powerup and through firmware before finally reaching the kernel entry point and moving on through its startup. Allow a platform to register a pre-bringup CPUHP state to which each CPU can be stepped in parallel, thus absorbing some of that latency. There is a subtlety here: even with an empty CPUHP_BP_PARALLEL_STARTUP step, this means that *all* CPUs are brought through the prepare states all the way to CPUHP_BP_PARALLEL_STARTUP before any of them are taken to CPUHP_BRINGUP_CPU and then are allowed to run for themselves to CPUHP_ONLINE. So any combination of prepare/start calls which depend on A-B ordering for each CPU in turn would explore horribly. As an example, the X2APIC code prior to commit cefad862f238 ("x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel") would allocate a new cluster mask "just in case" and store it in a global variable in the prep stage, then the AP would potentially consume that preallocated structure and set the global pointer to NULL to be reallocated in CPUHP_X2APIC_PREPARE for the next CPU. Which doesn't work at all if the prepare step is run for all the CPUs first. Any platform enabling the CPUHP_BP_PARALLEL_STARTUP step must be reviewed and tested to ensure that such issues do not exist, and the existing behaviour of each AP through to CPUHP_BP_PREPARE_DYN and then immediately to CPUHP_BRINGUP_CPU and CPUHP_ONLINE only one at a time does not change unless such a state is registered. Note that this does *not* yet bring each AP to the CPUHP_BRINGUP_CPU state at the same time, only to the new CPUHP_BP_PARALLEL_STARTUP state. The final loop in bringup_nonboot_cpus() remains the same, bringing each AP in turn from the CPUHP_BP_PARALLEL_STARTUP (or all the way from CPUHP_OFFLINE) to CPUHP_BRINGUP_CPU and then waiting for that AP to do its own processing and reach CPUHP_ONLINE before releasing the next. Parallelising that part by bringing them all to CPUHP_BRINGUP_CPU and then waiting for them all to run to CPUHP_ONLINE at the same time is a more complicated exercise for the future. Signed-off-by: David Woodhouse Signed-off-by: Usama Arif Tested-by: Paul E. McKenney Tested-by: Kim Phillips Tested-by: Oleksandr Natalenko Tested-by: Guilherme G. Piccoli Reviewed-by: Mark Rutland Tested-by: Mark Rutland [arm64] --- include/linux/cpuhotplug.h | 22 ++++++++++++++++++++++ kernel/cpu.c | 38 +++++++++++++++++++++++++++++++++++--- 2 files changed, 57 insertions(+), 3 deletions(-) diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index c6fab004104a..84efd33ed3a3 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -133,6 +133,28 @@ enum cpuhp_state { CPUHP_MIPS_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20, + /* + * This is an optional state if the architecture supports parallel + * startup. It's used to start bringing the CPU online (e.g. send + * the startup IPI) so that the APs can run in parallel through + * the low level startup code instead of waking them one by one in + * CPUHP_BRINGUP_CPU. This avoids waiting for the AP to react and + * shortens the serialized phase of the bringup. + * + * If the architecture registers this state, all APs will be taken + * to it (and thus through all prior states) before any is taken + * to the subsequent CPUHP_BRINGUP_CPU state. + */ + CPUHP_BP_PARALLEL_STARTUP, + + /* + * This step brings the AP online and takes it to the point where it + * manages its own state from here on. For the time being, the rest + * of the AP bringup is fully serialized despite running on the AP. + * If the architecture doesn't use the CPUHP_BP_PARALLEL_STARTUP + * state, this step also does all the work of bringing the CPU + * online. + */ CPUHP_BRINGUP_CPU, /* diff --git a/kernel/cpu.c b/kernel/cpu.c index 43e0a77f21e8..3382273ea3f4 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1504,13 +1504,45 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu) void bringup_nonboot_cpus(unsigned int setup_max_cpus) { - unsigned int cpu; + unsigned int cpu, n = num_online_cpus(); + /* + * On architectures which have setup the CPUHP_BP_PARALLEL_STARTUP + * state, this invokes all BP prepare states and the parallel + * startup state sends the startup IPI to each of the to be onlined + * APs. This avoids waiting for each AP to respond to the startup + * IPI in CPUHP_BRINGUP_CPU. The APs proceed through the low level + * bringup code and then wait for the control CPU to release them + * one by one for the final onlining procedure in the loop below. + * + * For architectures which do not support parallel bringup all + * states are fully serialized in the loop below. + */ + if (!cpuhp_step_empty(true, cpuhp_get_step(CPUHP_BP_PARALLEL_STARTUP))) { + for_each_present_cpu(cpu) { + if (n++ >= setup_max_cpus) + break; + cpu_up(cpu, CPUHP_BP_PARALLEL_STARTUP); + } + } + + /* Do the per CPU serialized bringup to ONLINE state */ for_each_present_cpu(cpu) { if (num_online_cpus() >= setup_max_cpus) break; - if (!cpu_online(cpu)) - cpu_up(cpu, CPUHP_ONLINE); + + if (!cpu_online(cpu)) { + struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); + int ret = cpu_up(cpu, CPUHP_ONLINE); + + /* + * Due to the above preparation loop a failed online attempt + * might have only rolled back to CPUHP_BP_PARALLEL_STARTUP. Do the + * remaining cleanups. NOOP for the non parallel case. + */ + if (ret && can_rollback_cpu(st)) + WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE)); + } } } -- 2.25.1