From: Evan Green
To: Palmer Dabbelt
Cc: Jisheng Zhang, Sebastian Andrzej Siewior, David Laight, Evan Green,
    Albert Ou, Andrew Jones, Anup Patel, Clément Léger, Conor Dooley,
    Greentime Hu, Heiko Stuebner, Ley Foon Tan, Marc Zyngier,
    Palmer Dabbelt, Paul Walmsley, Sunil V L,
    linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org
Subject: [PATCH v3] RISC-V: Probe misaligned access speed in parallel
Date: Mon, 6 Nov 2023 14:58:55 -0800
Message-Id: <20231106225855.3121724-1-evan@rivosinc.com>

Probing for misaligned access speed takes about 0.06 seconds. On a system
with 64 cores, doing this in smp_callin() means it's done serially,
extending boot time by 3.8 seconds. That's a lot of boot time.

Instead of measuring each CPU serially, let's do the measurements on all
CPUs in parallel. If we disable preemption on all CPUs, the jiffies stop
ticking, so we can do this in stages of 1) everybody except core 0, then
2) core 0. The allocations are all done outside of on_each_cpu() to avoid
calling alloc_pages() with interrupts disabled.

For hotplugged CPUs that come in after the boot time measurement, register
CPU hotplug callbacks, and do the measurement there. Interrupts are enabled
in those callbacks, so they're fine to do alloc_pages() in.

Reported-by: Jisheng Zhang
Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Signed-off-by: Evan Green
---
Changes in v3:
 - Avoid alloc_pages() with interrupts disabled (Sebastien)
 - Use cpuhp callbacks instead of hooking into smp_callin() (Sebastien)
 - Move cached answer check in check_unaligned_access() out to the hotplug
   callback, both to save the work of a useless allocation, and since
   check_unaligned_access_emulated() resets the answer to unknown.

Changes in v2:
 - Removed new global, used system_state == SYSTEM_RUNNING instead (Jisheng)
 - Added tags

 arch/riscv/include/asm/cpufeature.h |  1 -
 arch/riscv/kernel/cpufeature.c      | 96 +++++++++++++++++++++++------
 arch/riscv/kernel/smpboot.c         |  1 -
 3 files changed, 77 insertions(+), 21 deletions(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 7f1e46a9d445..69f2cae96f0b 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -30,7 +30,6 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
 /* Per-cpu ISA extensions. */
 extern struct riscv_isainfo hart_isa[NR_CPUS];
 
-void check_unaligned_access(int cpu);
 void riscv_user_isa_enable(void);
 
 #ifdef CONFIG_RISCV_MISALIGNED
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 6a01ded615cd..fe59e18dbd5b 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -8,6 +8,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -29,6 +30,7 @@
 
 #define MISALIGNED_ACCESS_JIFFIES_LG2 1
 #define MISALIGNED_BUFFER_SIZE 0x4000
+#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
 #define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
 
 unsigned long elf_hwcap __read_mostly;
@@ -557,30 +559,21 @@ unsigned long riscv_get_elf_hwcap(void)
 	return hwcap;
 }
 
-void check_unaligned_access(int cpu)
+static int check_unaligned_access(void *param)
 {
+	int cpu = smp_processor_id();
 	u64 start_cycles, end_cycles;
 	u64 word_cycles;
 	u64 byte_cycles;
 	int ratio;
 	unsigned long start_jiffies, now;
-	struct page *page;
+	struct page *page = param;
 	void *dst;
 	void *src;
 	long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
 
 	if (check_unaligned_access_emulated(cpu))
-		return;
-
-	/* We are already set since the last check */
-	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
-		return;
-
-	page = alloc_pages(GFP_NOWAIT, get_order(MISALIGNED_BUFFER_SIZE));
-	if (!page) {
-		pr_warn("Can't alloc pages to measure memcpy performance");
-		return;
-	}
+		return 0;
 
 	/* Make an unaligned destination buffer. */
 	dst = (void *)((unsigned long)page_address(page) | 0x1);
@@ -634,7 +627,7 @@ void check_unaligned_access(int cpu)
 		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
 			cpu);
-		goto out;
+		return 0;
 	}
 
 	if (word_cycles < byte_cycles)
@@ -648,19 +641,84 @@ void check_unaligned_access(int cpu)
 		(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
 
 	per_cpu(misaligned_access_speed, cpu) = speed;
+	return 0;
+}
 
-out:
-	__free_pages(page, get_order(MISALIGNED_BUFFER_SIZE));
+static void check_unaligned_access_nonboot_cpu(void *param)
+{
+	unsigned int cpu = smp_processor_id();
+	struct page **pages = param;
+
+	if (smp_processor_id() != 0)
+		check_unaligned_access(pages[cpu]);
+}
+
+static int riscv_online_cpu(unsigned int cpu)
+{
+	static struct page *buf;
+
+	/* We are already set since the last check */
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
+		return 0;
+
+	buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
+	if (!buf) {
+		pr_warn("Allocation failure, not measuring misaligned performance\n");
+		return -ENOMEM;
+	}
+
+	check_unaligned_access(buf);
+	__free_pages(buf, MISALIGNED_BUFFER_ORDER);
+	return 0;
 }
 
-static int __init check_unaligned_access_boot_cpu(void)
+/* Measure unaligned access on all CPUs present at boot in parallel. */
+static int check_unaligned_access_all_cpus(void)
 {
-	check_unaligned_access(0);
+	unsigned int cpu;
+	unsigned int cpu_count = num_possible_cpus();
+	struct page **bufs = kzalloc(cpu_count * sizeof(struct page *),
+				     GFP_KERNEL);
+
+	if (!bufs) {
+		pr_warn("Allocation failure, not measuring misaligned performance\n");
+		return 0;
+	}
+
+	/*
+	 * Allocate separate buffers for each CPU so there's no fighting over
+	 * cache lines.
+	 */
+	for_each_cpu(cpu, cpu_online_mask) {
+		bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
+		if (!bufs[cpu]) {
+			pr_warn("Allocation failure, not measuring misaligned performance\n");
+			goto out;
+		}
+	}
+
+	/* Check everybody except 0, who stays behind to tend jiffies. */
+	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
+
+	/* Check core 0. */
+	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+
+	/* Setup hotplug callback for any new CPUs that come online. */
+	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
+				  riscv_online_cpu, NULL);
+
+out:
 	unaligned_emulation_finish();
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		if (bufs[cpu])
+			__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
+	}
+
+	kfree(bufs);
 	return 0;
 }
 
-arch_initcall(check_unaligned_access_boot_cpu);
+arch_initcall(check_unaligned_access_all_cpus);
 
 void riscv_user_isa_enable(void)
 {
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index d69c628c24f4..d162bf339beb 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -247,7 +247,6 @@ asmlinkage __visible void smp_callin(void)
 	riscv_ipi_enable();
 	numa_add_cpu(curr_cpuid);
-	check_unaligned_access(curr_cpuid);
 	set_cpu_online(curr_cpuid, 1);
 
 	if (has_vector()) {
-- 
2.34.1