Received: by 2002:a05:7412:98c1:b0:fa:551:50a7 with SMTP id kc1csp1884495rdb; Mon, 8 Jan 2024 13:38:00 -0800 (PST) X-Google-Smtp-Source: AGHT+IGSPPWdEnFnmykS+9w+lxgCd1uuDMqNh4NPntJ7478aYIeGfEj6xl7GB7mA1xBUpu4YEYdE X-Received: by 2002:a05:6a20:429e:b0:199:ee23:c1dd with SMTP id o30-20020a056a20429e00b00199ee23c1ddmr156160pzj.10.1704749879784; Mon, 08 Jan 2024 13:37:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704749879; cv=none; d=google.com; s=arc-20160816; b=j/S/uPNby9tEnhtb8dFJ1Y9UiZsp5daFm40MPsjSQEV8Fohj3qA/D+OROwGWsiuGsU ZaEzeeZ90uYb3Tea5baChnGLVOlXCKg4a9uimHHoWypAcRKbeU/ExoALZ1zNfrmMf86t HzDrIzxgPtl7xGt3F/7Aq9iOmwqI8nyP02DT8YFzHi7f1opeJYzGStRAIWYHcvfrn6Oc KWYWdb+VyNvHMIpIrLiijrIwOi7Ind6fGclQWWFte1lqJ4kBNBB6QITXtbIZ17mWCldq aSKVZXRzLUNBYBZ7H8JqWqFPibTwcT9vagGNMUjGVK9rEqEkMl8XoUHp0tB58ZqIB6TT SicA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:message-id:subject:cc:to:from:date:dkim-signature; bh=yJLvvvJMK3asun1jUdYXWo2TZ3pozvY0KGbDySw/SbA=; fh=z/0Fmw8FaAIjO+Kscil2AmTwv7T0rdCT3w+WpSa/jhs=; b=vueCZOOyEJa+tPQH1uTIt1Yt5zxX/OZeJoVIQrzV6o4fg5LTGVKyQ75LmDj5SXKcww Kx9dXuLfdHEo/NoSJdt6DkWhJAnMpF+0GO6+Dymvcq+AJzPxPHbuQBw4Ar6+ujXBKB/d +RXVkxT1Zko89lHLu7hE308390M8Jgw3dmPY24cKcyHlJTirtRHBedvZvFksaUCJLHBv 1loVEJR3kterDUU0DlMe4N+wRMp4aQATNp2ZDgwhYUVLpSgnUtlYux4GYFJOb7UbChii a2iVGZcfgBkzz4lE6fSdoXg2lk9qrY5U20LDWRFX5JKa6hqYbXR0wzwjlbtWKSaEOQbz SrUg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=1c7S71uS; spf=pass (google.com: domain of linux-kernel+bounces-20132-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-20132-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id q65-20020a632a44000000b005be0ca9ca31si394493pgq.294.2024.01.08.13.37.59 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jan 2024 13:37:59 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-20132-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=1c7S71uS; spf=pass (google.com: domain of linux-kernel+bounces-20132-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-20132-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 6B7AD2848F2 for ; Mon, 8 Jan 2024 21:37:59 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 419945645B; Mon, 8 Jan 2024 21:37:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b="1c7S71uS" X-Original-To: linux-kernel@vger.kernel.org Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7400A56455 for ; Mon, 8 Jan 2024 21:37:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-1d427518d52so15717685ad.0 for ; Mon, 08 Jan 2024 13:37:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1704749862; x=1705354662; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=yJLvvvJMK3asun1jUdYXWo2TZ3pozvY0KGbDySw/SbA=; b=1c7S71uShVuFivFPSytvYFLDgm7CTpmAN23G/rj17+ZNC3A4qPlgXR4OGCvZHwNCtt bK33QTgb32UeRcwpnrQqqp/QMfu4QiTdb5AwnjVXZXvGYy4yRK/bXAIC2nu9gPY0RnDa MEGm3eOEU6f5VV++r58yTsIXNX9kT3b8x+CQjO2tPfVG0kG81zU7EYlrHp5yToYWA97g 3EphB5KDj4ivSsK4/jjNLZfl2/p0o88n4BSVmgnSFt8IKYqoZLtdF5RIJG3CfvxGVCxb aX3zAQV8CsmH9H3zRWMOJPDum4+P4/5RYLADud5EZ9sOWaJ+iHJGCXQpyZVz5P8tIDzx fmAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704749862; x=1705354662; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yJLvvvJMK3asun1jUdYXWo2TZ3pozvY0KGbDySw/SbA=; b=MyORu19rf+MbauONHBVLCY1sL/54KlVyKlI/9yNHSk+0yKcUpK/Eftb701NoRUtXCV 2rCb5JAf8qUyp496kIAvou9uq54PXtjwcdO8oRqNGED6NbFeYq1MLiFRhBe2JzQQs1p7 opoCxf51Wia+gLT7SC+KVlI3HnfVCNeWGzjVKgBWg95qlZd9BUb8JrpDUsCUOuG2PGmQ R3y5OGkcddQb12Zj5aBSmhPLHeUYSpHOZ2Vt8X3fHrDAxxenEubHhwIOPmmN/5WP2ENU zhW3OVjieASr9i5trwHH12CDkb6Xsma5vLg1174e/NhTf7mTPBLM3IxfQFQSzdTm6WH2 RUbA== X-Gm-Message-State: AOJu0YwGg8bJ3jm+oWxjBkZKVVT7FqiPmGFBCKfi00i0NFM5NiIBoUec trvaYE9SAx1X98tcZ5LbEuBP+mVR3R+XKA== X-Received: by 2002:a17:902:ecc9:b0:1d4:d14b:9ab8 with SMTP id a9-20020a170902ecc900b001d4d14b9ab8mr503559plh.31.1704749861754; Mon, 08 Jan 2024 13:37:41 -0800 (PST) Received: from ghost ([12.44.203.122]) by smtp.gmail.com with ESMTPSA id t5-20020a170902bc4500b001d3e3704d2fsm350829plz.31.2024.01.08.13.37.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jan 2024 13:37:41 -0800 (PST) Date: Mon, 8 Jan 2024 13:37:38 -0800 From: Charlie Jenkins To: Evan Green Cc: Palmer Dabbelt , Conor Dooley , Samuel Holland , David Laight , Xiao Wang , Guo Ren , linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, Paul Walmsley , Albert Ou , Arnd Bergmann Subject: Re: [PATCH v14 2/5] riscv: Add static key for misaligned accesses Message-ID: References: <20231227-optimize_checksum-v14-0-ddfd48016566@rivosinc.com> <20231227-optimize_checksum-v14-2-ddfd48016566@rivosinc.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Mon, Jan 08, 2024 at 11:22:34AM -0800, Evan Green wrote: > On Wed, Dec 27, 2023 at 9:38 AM Charlie Jenkins wrote: > > > > Support static branches depending on the value of misaligned accesses. > > This will be used by a later patch in the series. All online cpus must > > be considered "fast" for this static branch to be flipped. > > > > Signed-off-by: Charlie Jenkins > > This is fancier than I would have gone for, I probably would have > punted on heterogeneous hotplug out of laziness for now. However, what > you've done looks smart, in that we'll basically flip the branch if at > any moment all the online CPUs are fast. I've got some nits below, but > won't withhold my review for them (making them optional I suppose :)). > > Reviewed-by: Evan Green > Thanks! > > --- > > arch/riscv/include/asm/cpufeature.h | 2 + > > arch/riscv/kernel/cpufeature.c | 89 +++++++++++++++++++++++++++++++++++-- > > 2 files changed, 87 insertions(+), 4 deletions(-) > > > > diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h > > index a418c3112cd6..7b129e5e2f07 100644 > > --- a/arch/riscv/include/asm/cpufeature.h > > +++ b/arch/riscv/include/asm/cpufeature.h > > @@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi > > return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); > > } > > > > +DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key); > > + > > #endif > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c > > index b3785ffc1570..dfd716b93565 100644 > > --- a/arch/riscv/kernel/cpufeature.c > > +++ b/arch/riscv/kernel/cpufeature.c > > @@ -8,8 +8,10 @@ > > > > #include > > #include > > +#include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS]; > > /* Performance information */ > > DEFINE_PER_CPU(long, misaligned_access_speed); > > > > +static cpumask_t fast_misaligned_access; > > + > > /** > > * riscv_isa_extension_base() - Get base extension word > > * > > @@ -643,6 +647,16 @@ static int check_unaligned_access(void *param) > > (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow"); > > > > per_cpu(misaligned_access_speed, cpu) = speed; > > + > > + /* > > + * Set the value of fast_misaligned_access of a CPU. These operations > > + * are atomic to avoid race conditions. > > + */ > > + if (speed == RISCV_HWPROBE_MISALIGNED_FAST) > > + cpumask_set_cpu(cpu, &fast_misaligned_access); > > + else > > + cpumask_clear_cpu(cpu, &fast_misaligned_access); > > + > > return 0; > > } > > > > @@ -655,13 +669,70 @@ static void check_unaligned_access_nonboot_cpu(void *param) > > check_unaligned_access(pages[cpu]); > > } > > > > +DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key); > > + > > +static int exclude_set_unaligned_access_static_branches(int cpu) > > +{ > > + /* > > + * Same as set_unaligned_access_static_branches, except excludes the > > + * given CPU from the result. When a CPU is hotplugged into an offline > > + * state, this function is called before the CPU is set to offline in > > + * the cpumask, and thus the CPU needs to be explicitly excluded. > > + */ > > + > > + cpumask_t online_fast_misaligned_access; > > + > > + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask); > > + cpumask_clear_cpu(cpu, &online_fast_misaligned_access); > > + > > + if (cpumask_weight(&online_fast_misaligned_access) == (num_online_cpus() - 1)) > > + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key); > > + else > > + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key); > > + > > + return 0; > > +} > > A minor nit: the function above and below are looking a little > copy/pasty, and lead to multiple spots where the static branch gets > changed. You could make a third function that actually does the > setting with parameters, then these two could call it in different > ways. The return types also don't need to be int, since you always > return 0. Something like: > > static void modify_unaligned_access_branches(cpumask_t *mask, int weight) > { > if (cpumask_weight(mask) == weight) { > static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key); > } else { > static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key); > } > } > > static void set_unaligned_access_branches(void) > { > cpumask_t fast_and_online; > > cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask); > modify_unaligned_access_branches(&fast_and_online, num_online_cpus()); > } > > static void set_unaligned_access_branches_except_cpu(unsigned int cpu) > { > cpumask_t fast_except_me; > > cpumask_and(&online_fast_misaligned_access, > &fast_misaligned_access, cpu_online_mask); > cpumask_clear_cpu(cpu, &fast_except_me); > modify_unaligned_access_branches(&fast_except_me, > num_online_cpus() - 1); > } > Great suggestions, I will apply these changes and send out a new version. - Charlie > > + > > +static int set_unaligned_access_static_branches(void) > > +{ > > + /* > > + * This will be called after check_unaligned_access_all_cpus so the > > + * result of unaligned access speed for all CPUs will be available. > > + * > > + * To avoid the number of online cpus changing between reading > > + * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be > > + * held before calling this function. > > + */ > > + cpumask_t online_fast_misaligned_access; > > + > > + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask); > > + > > + if (cpumask_weight(&online_fast_misaligned_access) == num_online_cpus()) > > + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key); > > + else > > + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key); > > + > > + return 0; > > +} > > + > > +static int lock_and_set_unaligned_access_static_branch(void) > > +{ > > + cpus_read_lock(); > > + set_unaligned_access_static_branches(); > > + cpus_read_unlock(); > > + > > + return 0; > > +} > > + > > +arch_initcall_sync(lock_and_set_unaligned_access_static_branch); > > + > > static int riscv_online_cpu(unsigned int cpu) > > { > > static struct page *buf; > > > > /* We are already set since the last check */ > > if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN) > > - return 0; > > + goto exit; > > > > buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); > > if (!buf) { > > @@ -671,7 +742,14 @@ static int riscv_online_cpu(unsigned int cpu) > > > > check_unaligned_access(buf); > > __free_pages(buf, MISALIGNED_BUFFER_ORDER); > > - return 0; > > + > > +exit: > > + return set_unaligned_access_static_branches(); > > +} > > + > > +static int riscv_offline_cpu(unsigned int cpu) > > +{ > > + return exclude_set_unaligned_access_static_branches(cpu); > > } > > > > /* Measure unaligned access on all CPUs present at boot in parallel. */ > > @@ -705,9 +783,12 @@ static int check_unaligned_access_all_cpus(void) > > /* Check core 0. */ > > smp_call_on_cpu(0, check_unaligned_access, bufs[0], true); > > > > - /* Setup hotplug callback for any new CPUs that come online. */ > > + /* > > + * Setup hotplug callbacks for any new CPUs that come online or go > > + * offline. > > + */ > > cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online", > > - riscv_online_cpu, NULL); > > + riscv_online_cpu, riscv_offline_cpu); > > > > out: > > unaligned_emulation_finish(); > > > > -- > > 2.43.0 > >