Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch series adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
This series makes heavy use of the Zbb extension, enabled through
alternatives patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
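For example, something like:
  ./tools/testing/kunit/kunit.py run --arch riscv 'checksum'
should build and run the suite (the 'checksum' glob and the exact flags,
e.g. --kconfig_add CONFIG_CHECKSUM_KUNIT=y, are illustrative and may
need adjusting for your setup).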
I have attempted to make these functions as optimal as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the setup and cleanup code, and loading 32 bits at a
time is actually faster.
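To make the loop shape concrete, here is a rough C equivalent of the
32-bit accumulation (the function name is mine; the in-kernel version
below accumulates in an unsigned long, letting rv64 skip the carry
fix-up):

unsigned int ipv4_hdr_sum(const unsigned int *iph, unsigned int ihl)
{
	unsigned int csum = 0, pos = 0;

	/* caller guarantees ihl >= 5 */
	do {
		csum += iph[pos];
		/* fold any carry out of the 32-bit add back in */
		csum += csum < iph[pos];
	} while (++pos < ihl);

	return csum;
}

The 32-bit partial sum is then reduced to 16 bits with csum_fold().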
Relies on https://lore.kernel.org/lkml/[email protected]/
---
The algorithm proposed to replace the default csum_fold can be shown to
compute the same result by exhaustively checking all 2^32 possible
inputs with the following program:
#include <stdio.h>

static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* The default asm-generic implementation. */
unsigned short csum_fold(unsigned int csum)
{
	unsigned int sum = csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

/* The arc-derived replacement. */
unsigned short csum_fold_arc(unsigned int csum)
{
	return ((~csum - ror32(csum, 16)) >> 16);
}

int main(void)
{
	unsigned int start = 0x0;

	do {
		if (csum_fold(start) != csum_fold_arc(start)) {
			printf("Not the same %u\n", start);
			return -1;
		}
		start += 1;
	} while (start != 0x0);

	printf("The same\n");
	return 0;
}
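Compiled with something like "gcc -O2 csum_fold_check.c" (the file name
is illustrative), the exhaustive sweep over all 2^32 inputs prints "The
same".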
Cc: Paul Walmsley <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Arnd Bergmann <[email protected]>
To: Charlie Jenkins <[email protected]>
To: Palmer Dabbelt <[email protected]>
To: Conor Dooley <[email protected]>
To: Samuel Holland <[email protected]>
To: David Laight <[email protected]>
To: Xiao Wang <[email protected]>
To: Evan Green <[email protected]>
To: Guo Ren <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
Signed-off-by: Charlie Jenkins <[email protected]>
---
Changes in v14:
- Update misaligned static branch when CPUs are hotplugged (Guo)
- Leave off Evan's reviewed-by on patch 2 since it was completely
re-written
- Link to v13: https://lore.kernel.org/r/[email protected]
Changes in v13:
- Move cast from patch 4 to patch 3
- Link to v12: https://lore.kernel.org/r/[email protected]
Changes in v12:
- Rebase onto 6.7-rc5
- Add performance stats in the cover letter
- Link to v11: https://lore.kernel.org/r/[email protected]
Changes in v11:
- Extensive modifications to comply with sparse
- Organize include statements (Xiao)
- Add csum_ipv6_magic to commit message (Xiao)
- Remove extraneous len statement (Xiao)
- Add kasan_check_read call (Xiao)
- Improve comment field checksum.h (Xiao)
- Consolidate "buff" and "len" into one parameter "end" (Xiao)
- Link to v10: https://lore.kernel.org/r/[email protected]
Changes in v10:
- Move tests that were riscv-specific to be arch agnostic (Arnd)
- Link to v9: https://lore.kernel.org/r/[email protected]
Changes in v9:
- Use ror64 (Xiao)
- Move do_csum and csum_ipv6_magic headers to patch 4 (Xiao)
- Remove word "IP" from checksum headers (Xiao)
- Swap to using ifndef CONFIG_32BIT instead of ifdef CONFIG_64BIT (Xiao)
- Run no alignment code when buff is aligned (Xiao)
- Consolidate the overlap between the two do_csum implementations into do_csum_common
- Link to v8: https://lore.kernel.org/r/[email protected]
Changes in v8:
- Speedups of 12% without Zbb and 21% with Zbb when cpu supports fast
misaligned accesses for do_csum
- Various formatting updates
- Patch now relies on https://lore.kernel.org/lkml/[email protected]/
- Link to v7: https://lore.kernel.org/r/[email protected]
Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Unfortunately had to reintroduce ifdefs because gcc is not smart
enough to avoid warning about code that will never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN because IS_ENABLED
does not work with it
- Only optimize for zbb when alternatives is enabled in do_csum
- Link to v6: https://lore.kernel.org/r/[email protected]
Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/[email protected]
Changes in v5:
- Drop vector patches
- Check ZBB enabled before doing any ZBB code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler non-tree-based version of csum_ipv6_magic since
David pointed out that the tree-based version is not better.
- Link to v4: https://lore.kernel.org/r/[email protected]
Changes in v4:
- Suggestion by David Laight to use an improved checksum used in
arch/arc.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependency which should improve execution speed on
rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/[email protected]
Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/[email protected]
Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Charlie Jenkins (5):
asm-generic: Improve csum_fold
riscv: Add static key for misaligned accesses
riscv: Add checksum header
riscv: Add checksum library
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
arch/riscv/include/asm/checksum.h | 93 ++++++++++
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/kernel/cpufeature.c | 89 +++++++++-
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++
include/asm-generic/checksum.h | 6 +-
lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++-
7 files changed, 793 insertions(+), 8 deletions(-)
---
base-commit: a39b6ac3781d46ba18193c9dbb2110f31e9bffe9
change-id: 20230804-optimize_checksum-db145288ac21
--
- Charlie
This csum_fold implementation, introduced into arch/arc by Vineet Gupta,
is better than the default implementation on at least arc, x86, and
riscv. Using GCC trunk and compiling a non-inlined version, this
implementation has 41.6667% and 25% fewer instructions on riscv64 and
x86-64 respectively with -O3 optimization. Most implementations override
this default in asm, but this should be more performant than all of
those other implementations except for arm, which has barrel shifting,
and sparc32, which has a carry flag.
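A worked example of the identity (mine, for illustration): for
sum = 0x12345678, the default code folds 0x5678 + 0x1234 = 0x68ac and
returns ~0x68ac = 0x9753; the new code computes ~sum = 0xedcba987,
ror32(sum, 16) = 0x56781234, and 0xedcba987 - 0x56781234 = 0x97539753,
whose top 16 bits are the same 0x9753. Wrap-around cases agree as well,
e.g. sum = 0xffff0001 yields 0xfffe both ways.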
Signed-off-by: Charlie Jenkins <[email protected]>
Reviewed-by: David Laight <[email protected]>
---
include/asm-generic/checksum.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
index 43e18db89c14..ad928cce268b 100644
--- a/include/asm-generic/checksum.h
+++ b/include/asm-generic/checksum.h
@@ -2,6 +2,8 @@
#ifndef __ASM_GENERIC_CHECKSUM_H
#define __ASM_GENERIC_CHECKSUM_H
+#include <linux/bitops.h>
+
/*
* computes the checksum of a memory block at buff, length len,
* and adds in "sum" (32-bit)
@@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
static inline __sum16 csum_fold(__wsum csum)
{
u32 sum = (__force u32)csum;
- sum = (sum & 0xffff) + (sum >> 16);
- sum = (sum & 0xffff) + (sum >> 16);
- return (__force __sum16)~sum;
+ return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
}
#endif
--
2.43.0
Support static branches depending on the speed of misaligned accesses.
This will be used by a later patch in the series. All online cpus must
be considered "fast" for this static branch to be flipped.
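For context, the consumer added later in this series (patch 4) gates
its fast path roughly as follows:

	/* all online CPUs measured "fast" -> skip the alignment code */
	if (static_branch_likely(&fast_misaligned_access_speed_key))
		return do_csum_no_alignment(buff, len);

so the key only has to be updated on slow paths (boot-time probing and
the hotplug callbacks).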
Signed-off-by: Charlie Jenkins <[email protected]>
---
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/kernel/cpufeature.c | 89 +++++++++++++++++++++++++++++++++++--
2 files changed, 87 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index a418c3112cd6..7b129e5e2f07 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
}
+DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
#endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index b3785ffc1570..dfd716b93565 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -8,8 +8,10 @@
#include <linux/acpi.h>
#include <linux/bitmap.h>
+#include <linux/cpu.h>
#include <linux/cpuhotplug.h>
#include <linux/ctype.h>
+#include <linux/jump_label.h>
#include <linux/log2.h>
#include <linux/memory.h>
#include <linux/module.h>
@@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS];
/* Performance information */
DEFINE_PER_CPU(long, misaligned_access_speed);
+static cpumask_t fast_misaligned_access;
+
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -643,6 +647,16 @@ static int check_unaligned_access(void *param)
(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
per_cpu(misaligned_access_speed, cpu) = speed;
+
+ /*
+ * Set the value of fast_misaligned_access of a CPU. These operations
+ * are atomic to avoid race conditions.
+ */
+ if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
+ cpumask_set_cpu(cpu, &fast_misaligned_access);
+ else
+ cpumask_clear_cpu(cpu, &fast_misaligned_access);
+
return 0;
}
@@ -655,13 +669,70 @@ static void check_unaligned_access_nonboot_cpu(void *param)
check_unaligned_access(pages[cpu]);
}
+DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
+static int exclude_set_unaligned_access_static_branches(int cpu)
+{
+ /*
+ * Same as set_unaligned_access_static_branches, except excludes the
+ * given CPU from the result. When a CPU is hotplugged into an offline
+ * state, this function is called before the CPU is set to offline in
+ * the cpumask, and thus the CPU needs to be explicitly excluded.
+ */
+
+ cpumask_t online_fast_misaligned_access;
+
+ cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
+ cpumask_clear_cpu(cpu, &online_fast_misaligned_access);
+
+ if (cpumask_weight(&online_fast_misaligned_access) == (num_online_cpus() - 1))
+ static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
+ else
+ static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
+
+ return 0;
+}
+
+static int set_unaligned_access_static_branches(void)
+{
+ /*
+ * This will be called after check_unaligned_access_all_cpus so the
+ * result of unaligned access speed for all CPUs will be available.
+ *
+ * To avoid the number of online cpus changing between reading
+ * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
+ * held before calling this function.
+ */
+ cpumask_t online_fast_misaligned_access;
+
+ cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
+
+ if (cpumask_weight(&online_fast_misaligned_access) == num_online_cpus())
+ static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
+ else
+ static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
+
+ return 0;
+}
+
+static int lock_and_set_unaligned_access_static_branch(void)
+{
+ cpus_read_lock();
+ set_unaligned_access_static_branches();
+ cpus_read_unlock();
+
+ return 0;
+}
+
+arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
+
static int riscv_online_cpu(unsigned int cpu)
{
static struct page *buf;
/* We are already set since the last check */
if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
- return 0;
+ goto exit;
buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
if (!buf) {
@@ -671,7 +742,14 @@ static int riscv_online_cpu(unsigned int cpu)
check_unaligned_access(buf);
__free_pages(buf, MISALIGNED_BUFFER_ORDER);
- return 0;
+
+exit:
+ return set_unaligned_access_static_branches();
+}
+
+static int riscv_offline_cpu(unsigned int cpu)
+{
+ return exclude_set_unaligned_access_static_branches(cpu);
}
/* Measure unaligned access on all CPUs present at boot in parallel. */
@@ -705,9 +783,12 @@ static int check_unaligned_access_all_cpus(void)
/* Check core 0. */
smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
- /* Setup hotplug callback for any new CPUs that come online. */
+ /*
+ * Setup hotplug callbacks for any new CPUs that come online or go
+ * offline.
+ */
cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
- riscv_online_cpu, NULL);
+ riscv_online_cpu, riscv_offline_cpu);
out:
unaligned_emulation_finish();
--
2.43.0
Provide checksum algorithms that have been designed to leverage riscv
instructions such as rotate. The 64-bit versions can take advantage of
the larger register width to avoid some overflow checking.
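The 64-bit trick referred to here is the rotate-based fold that appears
in the patch below:

	csum += ror64(csum, 32);	/* low and high halves are summed;   */
	csum >>= 32;			/* the carry ends up in the high half */

which reduces a 64-bit partial sum to 32 bits with the end-around carry
already applied, ready for csum_fold().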
Signed-off-by: Charlie Jenkins <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Reviewed-by: Xiao Wang <[email protected]>
---
arch/riscv/include/asm/checksum.h | 82 +++++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
new file mode 100644
index 000000000000..5a810126aac7
--- /dev/null
+++ b/arch/riscv/include/asm/checksum.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Checksum routines
+ *
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#ifndef __ASM_RISCV_CHECKSUM_H
+#define __ASM_RISCV_CHECKSUM_H
+
+#include <linux/in6.h>
+#include <linux/uaccess.h>
+
+#define ip_fast_csum ip_fast_csum
+
+/* Define riscv versions of functions before importing asm-generic/checksum.h */
+#include <asm-generic/checksum.h>
+
+/**
+ * Quickly compute an IP checksum with the assumption that IPv4 headers will
+ * always be in multiples of 32-bits, and have an ihl of at least 5.
+ *
+ * @ihl: the number of 32-bit segments; must be greater than or equal to 5.
+ * @iph: assumed to be word aligned given that NET_IP_ALIGN is set to 2 on
+ * riscv, defining IP headers to be aligned.
+ */
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+ unsigned long csum = 0;
+ int pos = 0;
+
+ do {
+ csum += ((const unsigned int *)iph)[pos];
+ if (IS_ENABLED(CONFIG_32BIT))
+ csum += csum < ((const unsigned int *)iph)[pos];
+ } while (++pos < ihl);
+
+ /*
+ * ZBB only saves three instructions on 32-bit and five on 64-bit so not
+ * worth checking if supported without Alternatives.
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+ if (IS_ENABLED(CONFIG_32BIT)) {
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ not %[fold_temp], %[csum] \n\
+ rori %[csum], %[csum], 16 \n\
+ sub %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+ } else {
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ not %[fold_temp], %[csum] \n\
+ roriw %[csum], %[csum], 16 \n\
+ subw %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+ }
+ return (__force __sum16)(csum >> 16);
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ return csum_fold((__force __wsum)csum);
+}
+
+#endif /* __ASM_RISCV_CHECKSUM_H */
--
2.43.0
Provide a 32-bit and a 64-bit version of do_csum. When compiled for
32-bit, it will load from the buffer in groups of 32 bits; when compiled
for 64-bit, it will load in groups of 64 bits.
Additionally, provide a riscv-optimized implementation of
csum_ipv6_magic.
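The misaligned case is built on aligned over-reads plus masking. A
minimal little-endian sketch of the head handling (the helper name is
mine; the real code below also handles the tail and big-endian):

static unsigned long load_head_word(const unsigned char *buff)
{
	unsigned int offset = (unsigned long)buff & (sizeof(long) - 1);
	const unsigned long *ptr = (const unsigned long *)(buff - offset);
	unsigned long data = *ptr;
	unsigned int shift = offset * 8;

	/* zero the low-order bytes that come before buff */
	return (data >> shift) << shift;
}

Rounding the pointer down keeps the access inside the same page, which
is what makes the over-read safe.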
Signed-off-by: Charlie Jenkins <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Reviewed-by: Xiao Wang <[email protected]>
---
arch/riscv/include/asm/checksum.h | 11 ++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++++
3 files changed, 338 insertions(+)
diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index 5a810126aac7..a5b60b54b101 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -12,6 +12,17 @@
#define ip_fast_csum ip_fast_csum
+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum sum);
+#endif
+
/* Define riscv versions of functions before importing asm-generic/checksum.h */
#include <asm-generic/checksum.h>
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..2aa1a4ad361f 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,6 +6,7 @@ lib-y += memmove.o
lib-y += strcmp.o
lib-y += strlen.o
lib-y += strncmp.o
+lib-y += csum.o
lib-$(CONFIG_MMU) += uaccess.o
lib-$(CONFIG_64BIT) += tishift.o
lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
new file mode 100644
index 000000000000..06ce8e7250d9
--- /dev/null
+++ b/arch/riscv/lib/csum.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Checksum library
+ *
+ * Influenced by arch/arm64/lib/csum.c
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/jump_label.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <asm/cpufeature.h>
+
+#include <net/checksum.h>
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum csum)
+{
+ unsigned int ulen, uproto;
+ unsigned long sum = (__force unsigned long)csum;
+
+ sum += (__force unsigned long)saddr->s6_addr32[0];
+ sum += (__force unsigned long)saddr->s6_addr32[1];
+ sum += (__force unsigned long)saddr->s6_addr32[2];
+ sum += (__force unsigned long)saddr->s6_addr32[3];
+
+ sum += (__force unsigned long)daddr->s6_addr32[0];
+ sum += (__force unsigned long)daddr->s6_addr32[1];
+ sum += (__force unsigned long)daddr->s6_addr32[2];
+ sum += (__force unsigned long)daddr->s6_addr32[3];
+
+ ulen = (__force unsigned int)htonl((unsigned int)len);
+ sum += ulen;
+
+ uproto = (__force unsigned int)htonl(proto);
+ sum += uproto;
+
+ /*
+ * Zbb support saves 4 instructions, so it is not worth checking for
+ * Zbb without alternatives
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[sum], 32 \n\
+ add %[sum], %[fold_temp], %[sum] \n\
+ srli %[sum], %[sum], 32 \n\
+ not %[fold_temp], %[sum] \n\
+ roriw %[sum], %[sum], 16 \n\
+ subw %[sum], %[fold_temp], %[sum] \n\
+ .option pop"
+ : [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
+ return (__force __sum16)(sum >> 16);
+ }
+no_zbb:
+ sum += ror64(sum, 32);
+ sum >>= 32;
+ return csum_fold((__force __wsum)sum);
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
+#endif /* !CONFIG_32BIT */
+
+#ifdef CONFIG_32BIT
+#define OFFSET_MASK 3
+#elif CONFIG_64BIT
+#define OFFSET_MASK 7
+#endif
+
+static inline __no_sanitize_address unsigned long
+do_csum_common(const unsigned long *ptr, const unsigned long *end,
+ unsigned long data)
+{
+ unsigned int shift;
+ unsigned long csum = 0, carry = 0;
+
+ /*
+ * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
+ * faster than doing 32-bit reads on architectures that support larger
+ * reads.
+ */
+ while (ptr < end) {
+ csum += data;
+ carry += csum < data;
+ data = *(ptr++);
+ }
+
+ /*
+ * Mask off the over-read bytes on the tail if any bytes are left
+ * over.
+ */
+ shift = ((long)ptr - (long)end) * 8;
+#ifdef __LITTLE_ENDIAN
+ data = (data << shift) >> shift;
+#else
+ data = (data >> shift) << shift;
+#endif
+ csum += data;
+ carry += csum < data;
+ csum += carry;
+ csum += csum < carry;
+
+ return csum;
+}
+
+/*
+ * The algorithm accounts for buff being misaligned. If buff is not
+ * aligned, it will over-read bytes but not use the bytes that it
+ * shouldn't. The same thing will occur on the tail-end of the read.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_with_alignment(const unsigned char *buff, int len)
+{
+ unsigned int offset, shift;
+ unsigned long csum, data;
+ const unsigned long *ptr, *end;
+
+ /*
+ * Align address to closest word (double word on rv64) that comes before
+ * buff. This should always be in the same page and cache line.
+ * Directly call KASAN with the alignment we will be using.
+ */
+ offset = (unsigned long)buff & OFFSET_MASK;
+ kasan_check_read(buff, len);
+ ptr = (const unsigned long *)(buff - offset);
+
+ /*
+ * Clear the most significant bytes that were over-read if buff was not
+ * aligned.
+ */
+ shift = offset * 8;
+ data = *(ptr++);
+#ifdef __LITTLE_ENDIAN
+ data = (data >> shift) << shift;
+#else
+ data = (data << shift) >> shift;
+#endif
+ end = (const unsigned long *)(buff + len);
+ csum = do_csum_common(ptr, end, data);
+
+ /*
+ * Zbb support saves 6 instructions, so it is not worth checking for
+ * Zbb without alternatives
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+#ifdef CONFIG_32BIT
+ asm_volatile_goto(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 16 \n\
+ andi %[offset], %[offset], 1 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ beq %[offset], zero, %l[end] \n\
+ rev8 %[csum], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ : [offset] "r" (offset)
+ :
+ : end);
+
+ return (unsigned short)csum;
+#else /* !CONFIG_32BIT */
+ asm_volatile_goto(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ roriw %[fold_temp], %[csum], 16 \n\
+ addw %[csum], %[fold_temp], %[csum] \n\
+ andi %[offset], %[offset], 1 \n\
+ beq %[offset], zero, %l[end] \n\
+ rev8 %[csum], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ : [offset] "r" (offset)
+ :
+ : end);
+
+ return (csum << 16) >> 48;
+#endif /* !CONFIG_32BIT */
+end:
+ return csum >> 16;
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ csum = (u32)csum + ror32((u32)csum, 16);
+ if (offset & 1)
+ return (u16)swab32(csum);
+ return csum >> 16;
+}
+
+/*
+ * Does not perform alignment, should only be used if machine has fast
+ * misaligned accesses, or when buff is known to be aligned.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_no_alignment(const unsigned char *buff, int len)
+{
+ unsigned long csum, data;
+ const unsigned long *ptr, *end;
+
+ ptr = (const unsigned long *)(buff);
+ data = *(ptr++);
+
+ kasan_check_read(buff, len);
+
+ end = (const unsigned long *)(buff + len);
+ csum = do_csum_common(ptr, end, data);
+
+ /*
+ * Zbb support saves 6 instructions, so it is not worth checking for
+ * Zbb without alternatives
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+#ifdef CONFIG_32BIT
+ asm (".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 16 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ :
+ : );
+
+#else /* !CONFIG_32BIT */
+ asm (".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ roriw %[fold_temp], %[csum], 16 \n\
+ addw %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ :
+ : );
+#endif /* !CONFIG_32BIT */
+ return csum >> 16;
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ csum = (u32)csum + ror32((u32)csum, 16);
+ return csum >> 16;
+}
+
+/*
+ * Perform a checksum on an arbitrary memory address.
+ * Will do a light-weight address alignment if buff is misaligned, unless
+ * cpu supports fast misaligned accesses.
+ */
+unsigned int do_csum(const unsigned char *buff, int len)
+{
+ if (unlikely(len <= 0))
+ return 0;
+
+ /*
+ * Significant performance gains can be seen by not doing alignment
+ * on machines with fast misaligned accesses.
+ *
+ * There is some duplicate code between the "with_alignment" and
+ * "no_alignment" implmentations, but the overlap is too awkward to be
+ * able to fit in one function without introducing multiple static
+ * branches. The largest chunk of overlap was delegated into the
+ * do_csum_common function.
+ */
+ if (static_branch_likely(&fast_misaligned_access_speed_key))
+ return do_csum_no_alignment(buff, len);
+
+ if (((unsigned long)buff & OFFSET_MASK) == 0)
+ return do_csum_no_alignment(buff, len);
+
+ return do_csum_with_alignment(buff, len);
+}
--
2.43.0
Supplement existing checksum tests with tests for csum_ipv6_magic and
ip_fast_csum.
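The expected values are precomputed for the contents of random_buf:
ip_fast_csum is exercised for every header length from 5 to 14 words at
181 different starting offsets, and csum_ipv6_magic at 200 different
offsets into the buffer.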
Signed-off-by: Charlie Jenkins <[email protected]>
---
lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 283 insertions(+), 1 deletion(-)
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 0eed92b77ba3..af3e5ca4e170 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -1,15 +1,21 @@
// SPDX-License-Identifier: GPL-2.0+
/*
- * Test cases csum_partial and csum_fold
+ * Test cases csum_partial, csum_fold, ip_fast_csum, csum_ipv6_magic
*/
#include <kunit/test.h>
#include <asm/checksum.h>
+#include <net/ip6_checksum.h>
#define MAX_LEN 512
#define MAX_ALIGN 64
#define TEST_BUFLEN (MAX_LEN + MAX_ALIGN)
+#define IPv4_MIN_WORDS 5
+#define IPv4_MAX_WORDS 15
+#define NUM_IPv6_TESTS 200
+#define NUM_IP_FAST_CSUM_TESTS 181
+
/* Values for a little endian CPU. Byte swap each half on big endian CPU. */
static const u32 random_init_sum = 0x2847aab;
static const u8 random_buf[] = {
@@ -209,6 +215,237 @@ static const u32 init_sums_no_overflow[] = {
0xffff0000, 0xfffffffb,
};
+static const __sum16 expected_csum_ipv6_magic[] = {
+ 0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f, 0x1034, 0x8422, 0x6fc0,
+ 0xd2f6, 0xbeb5, 0x9d3, 0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
+ 0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
+ 0xdf81, 0x8fd5, 0x3b5d, 0x8324, 0xf471, 0x83be, 0x1daf, 0x8c46, 0xe682,
+ 0xd1fb, 0x6b2e, 0xe687, 0x2a33, 0x4833, 0x2d67, 0x660f, 0x2e79, 0xd65e,
+ 0x6b62, 0x6672, 0x5dbd, 0x8680, 0xbaa5, 0x2229, 0x2125, 0x2d01, 0x1cc0,
+ 0x6d36, 0x33c0, 0xee36, 0xd832, 0x9820, 0x8a31, 0x53c5, 0x2e2, 0xdb0e,
+ 0x49ed, 0x17a7, 0x77a0, 0xd72e, 0x3d72, 0x7dc8, 0x5b17, 0xf55d, 0xa4d9,
+ 0x1446, 0x5d56, 0x6b2e, 0x69a5, 0xadb6, 0xff2a, 0x92e, 0xe044, 0x3402,
+ 0xbb60, 0xec7f, 0xe7e6, 0x1986, 0x32f4, 0x8f8, 0x5e00, 0x47c6, 0x3059,
+ 0x3969, 0xe957, 0x4388, 0x2854, 0x3334, 0xea71, 0xa6de, 0x33f9, 0x83fc,
+ 0x37b4, 0x5531, 0x3404, 0x1010, 0xed30, 0x610a, 0xc95, 0x9aed, 0x6ff,
+ 0x5136, 0x2741, 0x660e, 0x8b80, 0xf71, 0xa263, 0x88af, 0x7a73, 0x3c37,
+ 0x1908, 0x6db5, 0x2e92, 0x1cd2, 0x70c8, 0xee16, 0xe80, 0xcd55, 0x6e6,
+ 0x6434, 0x127, 0x655d, 0x2ea0, 0xb4f4, 0xdc20, 0x5671, 0xe462, 0xe52b,
+ 0xdb44, 0x3589, 0xc48f, 0xe60b, 0xd2d2, 0x66ad, 0x498, 0x436, 0xb917,
+ 0xf0ca, 0x1a6e, 0x1cb7, 0xbf61, 0x2870, 0xc7e8, 0x5b30, 0xe4a5, 0x168,
+ 0xadfc, 0xd035, 0xe690, 0xe283, 0xfb27, 0xe4ad, 0xb1a5, 0xf2d5, 0xc4b6,
+ 0x8a30, 0xd7d5, 0x7df9, 0x91d5, 0x63ed, 0x2d21, 0x312b, 0xab19, 0xa632,
+ 0x8d2e, 0xef06, 0x57b9, 0xc373, 0xbd1f, 0xa41f, 0x8444, 0x9975, 0x90cb,
+ 0xc49c, 0xe965, 0x4eff, 0x5a, 0xef6d, 0xe81a, 0xe260, 0x853a, 0xff7a,
+ 0x99aa, 0xb06b, 0xee19, 0xcc2c, 0xf34c, 0x7c49, 0xdac3, 0xa71e, 0xc988,
+ 0x3845, 0x1014
+};
+
+static const __sum16 expected_fast_csum[] = {
+ 0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e, 0xe902, 0xa5e9, 0x87a5, 0x7187,
+ 0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
+ 0xedbe, 0xabee, 0x6aac, 0xe6b, 0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
+ 0x3a70, 0x9f3a, 0xe89e, 0x75e8, 0x7976, 0xfa79, 0x2cfa, 0x3c2c, 0x463c,
+ 0x7146, 0x7a71, 0x547a, 0xfd53, 0x99fc, 0xb699, 0x92b6, 0xdb91, 0xe8da,
+ 0x5fe9, 0x1e60, 0xae1d, 0x39ae, 0xf439, 0xa1f4, 0xdda1, 0xede, 0x790f,
+ 0x579, 0x1206, 0x9012, 0x2490, 0xd224, 0x5cd2, 0xa65d, 0xca7, 0x220d,
+ 0xf922, 0xbf9, 0x920b, 0x1b92, 0x361c, 0x2e36, 0x4d2e, 0x24d, 0x2,
+ 0xcfff, 0x90cf, 0xa591, 0x93a5, 0x7993, 0x9579, 0xc894, 0x50c8, 0x5f50,
+ 0xd55e, 0xcad5, 0xf3c9, 0x8f4, 0x4409, 0x5043, 0x5b50, 0x55b, 0x2205,
+ 0x1e22, 0x801e, 0x3780, 0xe137, 0x7ee0, 0xf67d, 0x3cf6, 0xa53c, 0x2ea5,
+ 0x472e, 0x5147, 0xcf51, 0x1bcf, 0x951c, 0x1e95, 0xc71e, 0xe4c7, 0xc3e4,
+ 0x3dc3, 0xee3d, 0xa4ed, 0xf9a4, 0xcbf8, 0x75cb, 0xb375, 0x50b4, 0x3551,
+ 0xf835, 0x19f8, 0x8c1a, 0x538c, 0xad52, 0xa3ac, 0xb0a3, 0x5cb0, 0x6c5c,
+ 0x5b6c, 0xc05a, 0x92c0, 0x4792, 0xbe47, 0x53be, 0x1554, 0x5715, 0x4b57,
+ 0xe54a, 0x20e5, 0x21, 0xd500, 0xa1d4, 0xa8a1, 0x57a9, 0xca57, 0x5ca,
+ 0x1c06, 0x4f1c, 0xe24e, 0xd9e2, 0xf0d9, 0x4af1, 0x474b, 0x8146, 0xe81,
+ 0xfd0e, 0x84fd, 0x7c85, 0xba7c, 0x17ba, 0x4a17, 0x964a, 0xf595, 0xff5,
+ 0x5310, 0x3253, 0x6432, 0x4263, 0x2242, 0xe121, 0x32e1, 0xf632, 0xc5f5,
+ 0x21c6, 0x7d22, 0x8e7c, 0x418e, 0x5641, 0x3156, 0x7c31, 0x737c, 0x373,
+ 0x2503, 0xc22a, 0x3c2, 0x4a04, 0x8549, 0x5285, 0xa352, 0xe8a3, 0x6fe8,
+ 0x1a6f, 0x211a, 0xe021, 0x38e0, 0x7638, 0xf575, 0x9df5, 0x169e, 0xf116,
+ 0x23f1, 0xcd23, 0xece, 0x660f, 0x4866, 0x6a48, 0x716a, 0xee71, 0xa2ee,
+ 0xb8a2, 0x61b9, 0xa361, 0xf7a2, 0x26f7, 0x1127, 0x6611, 0xe065, 0x36e0,
+ 0x1837, 0x3018, 0x1c30, 0x721b, 0x3e71, 0xe43d, 0x99e4, 0x9e9a, 0xb79d,
+ 0xa9b7, 0xcaa, 0xeb0c, 0x4eb, 0x1305, 0x8813, 0xb687, 0xa9b6, 0xfba9,
+ 0xd7fb, 0xccd8, 0x2ecd, 0x652f, 0xae65, 0x3fae, 0x3a40, 0x563a, 0x7556,
+ 0x2776, 0x1228, 0xef12, 0xf9ee, 0xcef9, 0x56cf, 0xa956, 0x24a9, 0xba24,
+ 0x5fba, 0x665f, 0xf465, 0x8ff4, 0x6d8f, 0x346d, 0x5f34, 0x385f, 0xd137,
+ 0xb8d0, 0xacb8, 0x55ac, 0x7455, 0xe874, 0x89e8, 0xd189, 0xa0d1, 0xb2a0,
+ 0xb8b2, 0x36b8, 0x5636, 0xd355, 0x8d3, 0x1908, 0x2118, 0xc21, 0x990c,
+ 0x8b99, 0x158c, 0x7815, 0x9e78, 0x6f9e, 0x4470, 0x1d44, 0x341d, 0x2634,
+ 0x3f26, 0x793e, 0xc79, 0xcc0b, 0x26cc, 0xd126, 0x1fd1, 0xb41f, 0xb6b4,
+ 0x22b7, 0xa122, 0xa1, 0x7f01, 0x837e, 0x3b83, 0xaf3b, 0x6fae, 0x916f,
+ 0xb490, 0xffb3, 0xceff, 0x50cf, 0x7550, 0x7275, 0x1272, 0x2613, 0xaa26,
+ 0xd5aa, 0x7d5, 0x9607, 0x96, 0xb100, 0xf8b0, 0x4bf8, 0xdd4c, 0xeddd,
+ 0x98ed, 0x2599, 0x9325, 0xeb92, 0x8feb, 0xcc8f, 0x2acd, 0x392b, 0x3b39,
+ 0xcb3b, 0x6acb, 0xd46a, 0xb8d4, 0x6ab8, 0x106a, 0x2f10, 0x892f, 0x789,
+ 0xc806, 0x45c8, 0x7445, 0x3c74, 0x3a3c, 0xcf39, 0xd7ce, 0x58d8, 0x6e58,
+ 0x336e, 0x1034, 0xee10, 0xe9ed, 0xc2e9, 0x3fc2, 0xd53e, 0xd2d4, 0xead2,
+ 0x8fea, 0x2190, 0x1162, 0xbe11, 0x8cbe, 0x6d8c, 0xfb6c, 0x6dfb, 0xd36e,
+ 0x3ad3, 0xf3a, 0x870e, 0xc287, 0x53c3, 0xc54, 0x5b0c, 0x7d5a, 0x797d,
+ 0xec79, 0x5dec, 0x4d5e, 0x184e, 0xd618, 0x60d6, 0xb360, 0x98b3, 0xf298,
+ 0xb1f2, 0x69b1, 0xf969, 0xef9, 0xab0e, 0x21ab, 0xe321, 0x24e3, 0x8224,
+ 0x5481, 0x5954, 0x7a59, 0xff7a, 0x7dff, 0x1a7d, 0xa51a, 0x46a5, 0x6b47,
+ 0xe6b, 0x830e, 0xa083, 0xff9f, 0xd0ff, 0xffd0, 0xe6ff, 0x7de7, 0xc67d,
+ 0xd0c6, 0x61d1, 0x3a62, 0xc3b, 0x150c, 0x1715, 0x4517, 0x5345, 0x3954,
+ 0xdd39, 0xdadd, 0x32db, 0x6a33, 0xd169, 0x86d1, 0xb687, 0x3fb6, 0x883f,
+ 0xa487, 0x39a4, 0x2139, 0xbe20, 0xffbe, 0xedfe, 0x8ded, 0x368e, 0xc335,
+ 0x51c3, 0x9851, 0xf297, 0xd6f2, 0xb9d6, 0x95ba, 0x2096, 0xea1f, 0x76e9,
+ 0x4e76, 0xe04d, 0xd0df, 0x80d0, 0xa280, 0xfca2, 0x75fc, 0xef75, 0x32ef,
+ 0x6833, 0xdf68, 0xc4df, 0x76c4, 0xb77, 0xb10a, 0xbfb1, 0x58bf, 0x5258,
+ 0x4d52, 0x6c4d, 0x7e6c, 0xb67e, 0xccb5, 0x8ccc, 0xbe8c, 0xc8bd, 0x9ac8,
+ 0xa99b, 0x52a9, 0x2f53, 0xc30, 0x3e0c, 0xb83d, 0x83b7, 0x5383, 0x7e53,
+ 0x4f7e, 0xe24e, 0xb3e1, 0x8db3, 0x618e, 0xc861, 0xfcc8, 0x34fc, 0x9b35,
+ 0xaa9b, 0xb1aa, 0x5eb1, 0x395e, 0x8639, 0xd486, 0x8bd4, 0x558b, 0x2156,
+ 0xf721, 0x4ef6, 0x14f, 0x7301, 0xdd72, 0x49de, 0x894a, 0x9889, 0x8898,
+ 0x7788, 0x7b77, 0x637b, 0xb963, 0xabb9, 0x7cab, 0xc87b, 0x21c8, 0xcb21,
+ 0xdfca, 0xbfdf, 0xf2bf, 0x6af2, 0x626b, 0xb261, 0x3cb2, 0xc63c, 0xc9c6,
+ 0xc9c9, 0xb4c9, 0xf9b4, 0x91f9, 0x4091, 0x3a40, 0xcc39, 0xd1cb, 0x7ed1,
+ 0x537f, 0x6753, 0xa167, 0xba49, 0x88ba, 0x7789, 0x3877, 0xf037, 0xd3ef,
+ 0xb5d4, 0x55b6, 0xa555, 0xeca4, 0xa1ec, 0xb6a2, 0x7b7, 0x9507, 0xfd94,
+ 0x82fd, 0x5c83, 0x765c, 0x9676, 0x3f97, 0xda3f, 0x6fda, 0x646f, 0x3064,
+ 0x5e30, 0x655e, 0x6465, 0xcb64, 0xcdca, 0x4ccd, 0x3f4c, 0x243f, 0x6f24,
+ 0x656f, 0x6065, 0x3560, 0x3b36, 0xac3b, 0x4aac, 0x714a, 0x7e71, 0xda7e,
+ 0x7fda, 0xda7f, 0x6fda, 0xff6f, 0xc6ff, 0xedc6, 0xd4ed, 0x70d5, 0xeb70,
+ 0xa3eb, 0x80a3, 0xca80, 0x3fcb, 0x2540, 0xf825, 0x7ef8, 0xf87e, 0x73f8,
+ 0xb474, 0xb4b4, 0x92b5, 0x9293, 0x93, 0x3500, 0x7134, 0x9071, 0xfa8f,
+ 0x51fa, 0x1452, 0xba13, 0x7ab9, 0x957a, 0x8a95, 0x6e8a, 0x6d6e, 0x7c6d,
+ 0x447c, 0x9744, 0x4597, 0x8945, 0xef88, 0x8fee, 0x3190, 0x4831, 0x8447,
+ 0xa183, 0x1da1, 0xd41d, 0x2dd4, 0x4f2e, 0xc94e, 0xcbc9, 0xc9cb, 0x9ec9,
+ 0x319e, 0xd531, 0x20d5, 0x4021, 0xb23f, 0x29b2, 0xd828, 0xecd8, 0x5ded,
+ 0xfc5d, 0x4dfc, 0xd24d, 0x6bd2, 0x5f6b, 0xb35e, 0x7fb3, 0xee7e, 0x56ee,
+ 0xa657, 0x68a6, 0x8768, 0x7787, 0xb077, 0x4cb1, 0x764c, 0xb175, 0x7b1,
+ 0x3d07, 0x603d, 0x3560, 0x3e35, 0xb03d, 0xd6b0, 0xc8d6, 0xd8c8, 0x8bd8,
+ 0x3e8c, 0x303f, 0xd530, 0xf1d4, 0x42f1, 0xca42, 0xddca, 0x41dd, 0x3141,
+ 0x132, 0xe901, 0x8e9, 0xbe09, 0xe0bd, 0x2ce0, 0x862d, 0x3986, 0x9139,
+ 0x6d91, 0x6a6d, 0x8d6a, 0x1b8d, 0xac1b, 0xedab, 0x54ed, 0xc054, 0xcebf,
+ 0xc1ce, 0x5c2, 0x3805, 0x6038, 0x5960, 0xd359, 0xdd3, 0xbe0d, 0xafbd,
+ 0x6daf, 0x206d, 0x2c20, 0x862c, 0x8e86, 0xec8d, 0xa2ec, 0xa3a2, 0x51a3,
+ 0x8051, 0xfd7f, 0x91fd, 0xa292, 0xaf14, 0xeeae, 0x59ef, 0x535a, 0x8653,
+ 0x3986, 0x9539, 0xb895, 0xa0b8, 0x26a0, 0x2227, 0xc022, 0x77c0, 0xad77,
+ 0x46ad, 0xaa46, 0x60aa, 0x8560, 0x4785, 0xd747, 0x45d7, 0x2346, 0x5f23,
+ 0x25f, 0x1d02, 0x71d, 0x8206, 0xc82, 0x180c, 0x3018, 0x4b30, 0x4b,
+ 0x3001, 0x1230, 0x2d12, 0x8c2d, 0x148d, 0x4015, 0x5f3f, 0x3d5f, 0x6b3d,
+ 0x396b, 0x473a, 0xf746, 0x44f7, 0x8945, 0x3489, 0xcb34, 0x84ca, 0xd984,
+ 0xf0d9, 0xbcf0, 0x63bd, 0x3264, 0xf332, 0x45f3, 0x7346, 0x5673, 0xb056,
+ 0xd3b0, 0x4ad4, 0x184b, 0x7d18, 0x6c7d, 0xbb6c, 0xfeba, 0xe0fe, 0x10e1,
+ 0x5410, 0x2954, 0x9f28, 0x3a9f, 0x5a3a, 0xdb59, 0xbdc, 0xb40b, 0x1ab4,
+ 0x131b, 0x5d12, 0x6d5c, 0xe16c, 0xb0e0, 0x89b0, 0xba88, 0xbb, 0x3c01,
+ 0xe13b, 0x6fe1, 0x446f, 0xa344, 0x81a3, 0xfe81, 0xc7fd, 0x38c8, 0xb38,
+ 0x1a0b, 0x6d19, 0xf36c, 0x47f3, 0x6d48, 0xb76d, 0xd3b7, 0xd8d2, 0x52d9,
+ 0x4b53, 0xa54a, 0x34a5, 0xc534, 0x9bc4, 0xed9b, 0xbeed, 0x3ebe, 0x233e,
+ 0x9f22, 0x4a9f, 0x774b, 0x4577, 0xa545, 0x64a5, 0xb65, 0x870b, 0x487,
+ 0x9204, 0x5f91, 0xd55f, 0x35d5, 0x1a35, 0x71a, 0x7a07, 0x4e7a, 0xfc4e,
+ 0x1efc, 0x481f, 0x7448, 0xde74, 0xa7dd, 0x1ea7, 0xaa1e, 0xcfaa, 0xfbcf,
+ 0xedfb, 0x6eee, 0x386f, 0x4538, 0x6e45, 0xd96d, 0x11d9, 0x7912, 0x4b79,
+ 0x494b, 0x6049, 0xac5f, 0x65ac, 0x1366, 0x5913, 0xe458, 0x7ae4, 0x387a,
+ 0x3c38, 0xb03c, 0x76b0, 0x9376, 0xe193, 0x42e1, 0x7742, 0x6476, 0x3564,
+ 0x3c35, 0x6a3c, 0xcc69, 0x94cc, 0x5d95, 0xe5e, 0xee0d, 0x4ced, 0xce4c,
+ 0x52ce, 0xaa52, 0xdaaa, 0xe4da, 0x1de5, 0x4530, 0x5445, 0x3954, 0xb639,
+ 0x81b6, 0x7381, 0x1574, 0xc215, 0x10c2, 0x3f10, 0x6b3f, 0xe76b, 0x7be7,
+ 0xbc7b, 0xf7bb, 0x41f7, 0xcc41, 0x38cc, 0x4239, 0xa942, 0x4a9, 0xc504,
+ 0x7cc4, 0x437c, 0x6743, 0xea67, 0x8dea, 0xe88d, 0xd8e8, 0xdcd8, 0x17dd,
+ 0x5718, 0x958, 0xa609, 0x41a5, 0x5842, 0x159, 0x9f01, 0x269f, 0x5a26,
+ 0x405a, 0xc340, 0xb4c3, 0xd4b4, 0xf4d3, 0xf1f4, 0x39f2, 0xe439, 0x67e4,
+ 0x4168, 0xa441, 0xdda3, 0xdedd, 0x9df, 0xab0a, 0xa5ab, 0x9a6, 0xba09,
+ 0x9ab9, 0xad9a, 0x5ae, 0xe205, 0xece2, 0xecec, 0x14ed, 0xd614, 0x6bd5,
+ 0x916c, 0x3391, 0x6f33, 0x206f, 0x8020, 0x780, 0x7207, 0x2472, 0x8a23,
+ 0xb689, 0x3ab6, 0xf739, 0x97f6, 0xb097, 0xa4b0, 0xe6a4, 0x88e6, 0x2789,
+ 0xb28, 0x350b, 0x1f35, 0x431e, 0x1043, 0xc30f, 0x79c3, 0x379, 0x5703,
+ 0x3256, 0x4732, 0x7247, 0x9d72, 0x489d, 0xd348, 0xa4d3, 0x7ca4, 0xbf7b,
+ 0x45c0, 0x7b45, 0x337b, 0x4034, 0x843f, 0xd083, 0x35d0, 0x6335, 0x4d63,
+ 0xe14c, 0xcce0, 0xfecc, 0x35ff, 0x5636, 0xf856, 0xeef8, 0x2def, 0xfc2d,
+ 0x4fc, 0x6e04, 0xb66d, 0x78b6, 0xbb78, 0x3dbb, 0x9a3d, 0x839a, 0x9283,
+ 0x593, 0xd504, 0x23d5, 0x5424, 0xd054, 0x61d0, 0xdb61, 0x17db, 0x1f18,
+ 0x381f, 0x9e37, 0x679e, 0x1d68, 0x381d, 0x8038, 0x917f, 0x491, 0xbb04,
+ 0x23bb, 0x4124, 0xd41, 0xa30c, 0x8ba3, 0x8b8b, 0xc68b, 0xd2c6, 0xebd2,
+ 0x93eb, 0xbd93, 0x99bd, 0x1a99, 0xea19, 0x58ea, 0xcf58, 0x73cf, 0x1073,
+ 0x9e10, 0x139e, 0xea13, 0xcde9, 0x3ecd, 0x883f, 0xf89, 0x180f, 0x2a18,
+ 0x212a, 0xce20, 0x73ce, 0xf373, 0x60f3, 0xad60, 0x4093, 0x8e40, 0xb98e,
+ 0xbfb9, 0xf1bf, 0x8bf1, 0x5e8c, 0xe95e, 0x14e9, 0x4e14, 0x1c4e, 0x7f1c,
+ 0xe77e, 0x6fe7, 0xf26f, 0x13f2, 0x8b13, 0xda8a, 0x5fda, 0xea5f, 0x4eea,
+ 0xa84f, 0x88a8, 0x1f88, 0x2820, 0x9728, 0x5a97, 0x3f5b, 0xb23f, 0x70b2,
+ 0x2c70, 0x232d, 0xf623, 0x4f6, 0x905, 0x7509, 0xd675, 0x28d7, 0x9428,
+ 0x3794, 0xf036, 0x2bf0, 0xba2c, 0xedb9, 0xd7ed, 0x59d8, 0xed59, 0x4ed,
+ 0xe304, 0x18e3, 0x5c19, 0x3d5c, 0x753d, 0x6d75, 0x956d, 0x7f95, 0xc47f,
+ 0x83c4, 0xa84, 0x2e0a, 0x5f2e, 0xb95f, 0x77b9, 0x6d78, 0xf46d, 0x1bf4,
+ 0xed1b, 0xd6ed, 0xe0d6, 0x5e1, 0x3905, 0x5638, 0xa355, 0x99a2, 0xbe99,
+ 0xb4bd, 0x85b4, 0x2e86, 0x542e, 0x6654, 0xd765, 0x73d7, 0x3a74, 0x383a,
+ 0x2638, 0x7826, 0x7677, 0x9a76, 0x7e99, 0x2e7e, 0xea2d, 0xa6ea, 0x8a7,
+ 0x109, 0x3300, 0xad32, 0x5fad, 0x465f, 0x2f46, 0xc62f, 0xd4c5, 0xad5,
+ 0xcb0a, 0x4cb, 0xb004, 0x7baf, 0xe47b, 0x92e4, 0x8e92, 0x638e, 0x1763,
+ 0xc17, 0xf20b, 0x1ff2, 0x8920, 0x5889, 0xcb58, 0xf8cb, 0xcaf8, 0x84cb,
+ 0x9f84, 0x8a9f, 0x918a, 0x4991, 0x8249, 0xff81, 0x46ff, 0x5046, 0x5f50,
+ 0x725f, 0xf772, 0x8ef7, 0xe08f, 0xc1e0, 0x1fc2, 0x9e1f, 0x8b9d, 0x108b,
+ 0x411, 0x2b04, 0xb02a, 0x1fb0, 0x1020, 0x7a0f, 0x587a, 0x8958, 0xb188,
+ 0xb1b1, 0x49b2, 0xb949, 0x7ab9, 0x917a, 0xfc91, 0xe6fc, 0x47e7, 0xbc47,
+ 0x8fbb, 0xea8e, 0x34ea, 0x2635, 0x1726, 0x9616, 0xc196, 0xa6c1, 0xf3a6,
+ 0x11f3, 0x4811, 0x3e48, 0xeb3e, 0xf7ea, 0x1bf8, 0xdb1c, 0x8adb, 0xe18a,
+ 0x42e1, 0x9d42, 0x5d9c, 0x6e5d, 0x286e, 0x4928, 0x9a49, 0xb09c, 0xa6b0,
+ 0x2a7, 0xe702, 0xf5e6, 0x9af5, 0xf9b, 0x810f, 0x8080, 0x180, 0x1702,
+ 0x5117, 0xa650, 0x11a6, 0x1011, 0x550f, 0xd554, 0xbdd5, 0x6bbe, 0xc66b,
+ 0xfc7, 0x5510, 0x5555, 0x7655, 0x177, 0x2b02, 0x6f2a, 0xb70, 0x9f0b,
+ 0xcf9e, 0xf3cf, 0x3ff4, 0xcb40, 0x8ecb, 0x768e, 0x5277, 0x8652, 0x9186,
+ 0x9991, 0x5099, 0xd350, 0x93d3, 0x6d94, 0xe6d, 0x530e, 0x3153, 0xa531,
+ 0x64a5, 0x7964, 0x7c79, 0x467c, 0x1746, 0x3017, 0x3730, 0x538, 0x5,
+ 0x1e00, 0x5b1e, 0x955a, 0xae95, 0x3eaf, 0xff3e, 0xf8ff, 0xb2f9, 0xa1b3,
+ 0xb2a1, 0x5b2, 0xad05, 0x7cac, 0x2d7c, 0xd32c, 0x80d2, 0x7280, 0x8d72,
+ 0x1b8e, 0x831b, 0xac82, 0xfdac, 0xa7fd, 0x15a8, 0xd614, 0xe0d5, 0x7be0,
+ 0xb37b, 0x61b3, 0x9661, 0x9d95, 0xc79d, 0x83c7, 0xd883, 0xead7, 0xceb,
+ 0xf60c, 0xa9f5, 0x19a9, 0xa019, 0x8f9f, 0xd48f, 0x3ad5, 0x853a, 0x985,
+ 0x5309, 0x6f52, 0x1370, 0x6e13, 0xa96d, 0x98a9, 0x5198, 0x9f51, 0xb69f,
+ 0xa1b6, 0x2ea1, 0x672e, 0x2067, 0x6520, 0xaf65, 0x6eaf, 0x7e6f, 0xee7e,
+ 0x17ef, 0xa917, 0xcea8, 0x9ace, 0xff99, 0x5dff, 0xdf5d, 0x38df, 0xa39,
+ 0x1c0b, 0xe01b, 0x46e0, 0xcb46, 0x90cb, 0xba90, 0x4bb, 0x9104, 0x9d90,
+ 0xc89c, 0xf6c8, 0x6cf6, 0x886c, 0x1789, 0xbd17, 0x70bc, 0x7e71, 0x17e,
+ 0x1f01, 0xa01f, 0xbaa0, 0x14bb, 0xfc14, 0x7afb, 0xa07a, 0x3da0, 0xbf3d,
+ 0x48bf, 0x8c48, 0x968b, 0x9d96, 0xfd9d, 0x96fd, 0x9796, 0x6b97, 0xd16b,
+ 0xf4d1, 0x3bf4, 0x253c, 0x9125, 0x6691, 0xc166, 0x34c1, 0x5735, 0x1a57,
+ 0xdc19, 0x77db, 0x8577, 0x4a85, 0x824a, 0x9182, 0x7f91, 0xfd7f, 0xb4c3,
+ 0xb5b4, 0xb3b5, 0x7eb3, 0x617e, 0x4e61, 0xa4f, 0x530a, 0x3f52, 0xa33e,
+ 0x34a3, 0x9234, 0xf091, 0xf4f0, 0x1bf5, 0x311b, 0x9631, 0x6a96, 0x386b,
+ 0x1d39, 0xe91d, 0xe8e9, 0x69e8, 0x426a, 0xee42, 0x89ee, 0x368a, 0x2837,
+ 0x7428, 0x5974, 0x6159, 0x1d62, 0x7b1d, 0xf77a, 0x7bf7, 0x6b7c, 0x696c,
+ 0xf969, 0x4cf9, 0x714c, 0x4e71, 0x6b4e, 0x256c, 0x6e25, 0xe96d, 0x94e9,
+ 0x8f94, 0x3e8f, 0x343e, 0x4634, 0xb646, 0x97b5, 0x8997, 0xe8a, 0x900e,
+ 0x8090, 0xfd80, 0xa0fd, 0x16a1, 0xf416, 0xebf4, 0x95ec, 0x1196, 0x8911,
+ 0x3d89, 0xda3c, 0x9fd9, 0xd79f, 0x4bd7, 0x214c, 0x3021, 0x4f30, 0x994e,
+ 0x5c99, 0x6f5d, 0x326f, 0xab31, 0x6aab, 0xe969, 0x90e9, 0x1190, 0xff10,
+ 0xa2fe, 0xe0a2, 0x66e1, 0x4067, 0x9e3f, 0x2d9e, 0x712d, 0x8170, 0xd180,
+ 0xffd1, 0x25ff, 0x3826, 0x2538, 0x5f24, 0xc45e, 0x1cc4, 0xdf1c, 0x93df,
+ 0xc793, 0x80c7, 0x2380, 0xd223, 0x7ed2, 0xfc7e, 0x22fd, 0x7422, 0x1474,
+ 0xb714, 0x7db6, 0x857d, 0xa85, 0xa60a, 0x88a6, 0x4289, 0x7842, 0xc278,
+ 0xf7c2, 0xcdf7, 0x84cd, 0xae84, 0x8cae, 0xb98c, 0x1aba, 0x4d1a, 0x884c,
+ 0x4688, 0xcc46, 0xd8cb, 0x2bd9, 0xbe2b, 0xa2be, 0x72a2, 0xf772, 0xd2f6,
+ 0x75d2, 0xc075, 0xa3c0, 0x63a3, 0xae63, 0x8fae, 0x2a90, 0x5f2a, 0xef5f,
+ 0x5cef, 0xa05c, 0x89a0, 0x5e89, 0x6b5e, 0x736b, 0x773, 0x9d07, 0xe99c,
+ 0x27ea, 0x2028, 0xc20, 0x980b, 0x4797, 0x2848, 0x9828, 0xc197, 0x48c2,
+ 0x2449, 0x7024, 0x570, 0x3e05, 0xd3e, 0xf60c, 0xbbf5, 0x69bb, 0x3f6a,
+ 0x740, 0xf006, 0xe0ef, 0xbbe0, 0xadbb, 0x56ad, 0xcf56, 0xbfce, 0xa9bf,
+ 0x205b, 0x6920, 0xae69, 0x50ae, 0x2050, 0xf01f, 0x27f0, 0x9427, 0x8993,
+ 0x8689, 0x4087, 0x6e40, 0xb16e, 0xa1b1, 0xe8a1, 0x87e8, 0x6f88, 0xfe6f,
+ 0x4cfe, 0xe94d, 0xd5e9, 0x47d6, 0x3148, 0x5f31, 0xc35f, 0x13c4, 0xa413,
+ 0x5a5, 0x2405, 0xc223, 0x66c2, 0x3667, 0x5e37, 0x5f5e, 0x2f5f, 0x8c2f,
+ 0xe48c, 0xd0e4, 0x4d1, 0xd104, 0xe4d0, 0xcee4, 0xfcf, 0x480f, 0xa447,
+ 0x5ea4, 0xff5e, 0xbefe, 0x8dbe, 0x1d8e, 0x411d, 0x1841, 0x6918, 0x5469,
+ 0x1155, 0xc611, 0xaac6, 0x37ab, 0x2f37, 0xca2e, 0x87ca, 0xbd87, 0xabbd,
+ 0xb3ab, 0xcb4, 0xce0c, 0xfccd, 0xa5fd, 0x72a5, 0xf072, 0x83f0, 0xfe83,
+ 0x97fd, 0xc997, 0xb0c9, 0xadb0, 0xe6ac, 0x88e6, 0x1088, 0xbe10, 0x16be,
+ 0xa916, 0xa3a8, 0x46a3, 0x5447, 0xe953, 0x84e8, 0x2085, 0xa11f, 0xfa1,
+ 0xdd0f, 0xbedc, 0x5abe, 0x805a, 0xc97f, 0x6dc9, 0x826d, 0x4a82, 0x934a,
+ 0x5293, 0xd852, 0xd3d8, 0xadd3, 0xf4ad, 0xf3f4, 0xfcf3, 0xfefc, 0xcafe,
+ 0xb7ca, 0x3cb8, 0xa13c, 0x18a1, 0x1418, 0xea13, 0x91ea, 0xf891, 0x53f8,
+ 0xa254, 0xe9a2, 0x87ea, 0x4188, 0x1c41, 0xdc1b, 0xf5db, 0xcaf5, 0x45ca,
+ 0x6d45, 0x396d, 0xde39, 0x90dd, 0x1e91, 0x1e, 0x7b00, 0x6a7b, 0xa46a,
+ 0xc9a3, 0x9bc9, 0x389b, 0x1139, 0x5211, 0x1f52, 0xeb1f, 0xabeb, 0x48ab,
+ 0x9348, 0xb392, 0x17b3, 0x1618, 0x5b16, 0x175b, 0xdc17, 0xdedb, 0x1cdf,
+ 0xeb1c, 0xd1ea, 0x4ad2, 0xd4b, 0xc20c, 0x24c2, 0x7b25, 0x137b, 0x8b13,
+ 0x618b, 0xa061, 0xff9f, 0xfffe, 0x72ff, 0xf572, 0xe2f5, 0xcfe2, 0xd2cf,
+ 0x75d3, 0x6a76, 0xc469, 0x1ec4, 0xfc1d, 0x59fb, 0x455a, 0x7a45, 0xa479,
+ 0xb7a4
+};
+
static u8 tmp_buf[TEST_BUFLEN];
#define full_csum(buff, len, sum) csum_fold(csum_partial(buff, len, sum))
@@ -338,10 +575,55 @@ static void test_csum_no_carry_inputs(struct kunit *test)
}
}
+static void test_ip_fast_csum(struct kunit *test)
+{
+ __sum16 csum_result, expected;
+
+ for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
+ for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
+ csum_result = ip_fast_csum(random_buf + index, len);
+ expected =
+ expected_fast_csum[(len - IPv4_MIN_WORDS) *
+ NUM_IP_FAST_CSUM_TESTS +
+ index];
+ CHECK_EQ(expected, csum_result);
+ }
+ }
+}
+
+static void test_csum_ipv6_magic(struct kunit *test)
+{
+ const struct in6_addr *saddr;
+ const struct in6_addr *daddr;
+ unsigned int len;
+ unsigned char proto;
+ unsigned int csum;
+
+ const int daddr_offset = sizeof(struct in6_addr);
+ const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
+ const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+ sizeof(int);
+ const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+ sizeof(int) + sizeof(char);
+
+ for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+ saddr = (const struct in6_addr *)(random_buf + i);
+ daddr = (const struct in6_addr *)(random_buf + i +
+ daddr_offset);
+ len = *(unsigned int *)(random_buf + i + len_offset);
+ proto = *(random_buf + i + proto_offset);
+ csum = *(unsigned int *)(random_buf + i + csum_offset);
+ CHECK_EQ(expected_csum_ipv6_magic[i],
+ csum_ipv6_magic(saddr, daddr, len, proto, csum));
+ }
+}
+
static struct kunit_case __refdata checksum_test_cases[] = {
KUNIT_CASE(test_csum_fixed_random_inputs),
KUNIT_CASE(test_csum_all_carry_inputs),
KUNIT_CASE(test_csum_no_carry_inputs),
+ KUNIT_CASE(test_ip_fast_csum),
+ KUNIT_CASE(test_csum_ipv6_magic),
{}
};
--
2.43.0
On Wed, Dec 27, 2023 at 9:38 AM Charlie Jenkins <[email protected]> wrote:
>
> Support static branches depending on the speed of misaligned accesses.
> This will be used by a later patch in the series. All online cpus must
> be considered "fast" for this static branch to be flipped.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
This is fancier than I would have gone for, I probably would have
punted on heterogeneous hotplug out of laziness for now. However, what
you've done looks smart, in that we'll basically flip the branch if at
any moment all the online CPUs are fast. I've got some nits below, but
won't withhold my review for them (making them optional I suppose :)).
Reviewed-by: Evan Green <[email protected]>
> ---
> arch/riscv/include/asm/cpufeature.h | 2 +
> arch/riscv/kernel/cpufeature.c | 89 +++++++++++++++++++++++++++++++++++--
> 2 files changed, 87 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> index a418c3112cd6..7b129e5e2f07 100644
> --- a/arch/riscv/include/asm/cpufeature.h
> +++ b/arch/riscv/include/asm/cpufeature.h
> @@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
> return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
> }
>
> +DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
> +
> #endif
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index b3785ffc1570..dfd716b93565 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -8,8 +8,10 @@
>
> #include <linux/acpi.h>
> #include <linux/bitmap.h>
> +#include <linux/cpu.h>
> #include <linux/cpuhotplug.h>
> #include <linux/ctype.h>
> +#include <linux/jump_label.h>
> #include <linux/log2.h>
> #include <linux/memory.h>
> #include <linux/module.h>
> @@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS];
> /* Performance information */
> DEFINE_PER_CPU(long, misaligned_access_speed);
>
> +static cpumask_t fast_misaligned_access;
> +
> /**
> * riscv_isa_extension_base() - Get base extension word
> *
> @@ -643,6 +647,16 @@ static int check_unaligned_access(void *param)
> (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
>
> per_cpu(misaligned_access_speed, cpu) = speed;
> +
> + /*
> + * Set the value of fast_misaligned_access of a CPU. These operations
> + * are atomic to avoid race conditions.
> + */
> + if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
> + cpumask_set_cpu(cpu, &fast_misaligned_access);
> + else
> + cpumask_clear_cpu(cpu, &fast_misaligned_access);
> +
> return 0;
> }
>
> @@ -655,13 +669,70 @@ static void check_unaligned_access_nonboot_cpu(void *param)
> check_unaligned_access(pages[cpu]);
> }
>
> +DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
> +
> +static int exclude_set_unaligned_access_static_branches(int cpu)
> +{
> + /*
> + * Same as set_unaligned_access_static_branches, except excludes the
> + * given CPU from the result. When a CPU is hotplugged into an offline
> + * state, this function is called before the CPU is set to offline in
> + * the cpumask, and thus the CPU needs to be explicitly excluded.
> + */
> +
> + cpumask_t online_fast_misaligned_access;
> +
> + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
> + cpumask_clear_cpu(cpu, &online_fast_misaligned_access);
> +
> + if (cpumask_weight(&online_fast_misaligned_access) == (num_online_cpus() - 1))
> + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
> + else
> + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
> +
> + return 0;
> +}
A minor nit: the functions above and below are looking a little
copy/pasty, and lead to multiple spots where the static branch gets
changed. You could make a third function that actually does the
setting with parameters, then these two could call it in different
ways. The return types also don't need to be int, since you always
return 0. Something like:
static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
{
if (cpumask_weight(mask) == weight) {
static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
} else {
static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
}
}
static void set_unaligned_access_branches(void)
{
cpumask_t fast_and_online;
cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
}
static void set_unaligned_access_branches_except_cpu(unsigned int cpu)
{
cpumask_t fast_except_me;
cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
cpumask_clear_cpu(cpu, &fast_except_me);
modify_unaligned_access_branches(&fast_except_me,
num_online_cpus() - 1);
}
> +
> +static int set_unaligned_access_static_branches(void)
> +{
> + /*
> + * This will be called after check_unaligned_access_all_cpus so the
> + * result of unaligned access speed for all CPUs will be available.
> + *
> + * To avoid the number of online cpus changing between reading
> + * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
> + * held before calling this function.
> + */
> + cpumask_t online_fast_misaligned_access;
> +
> + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
> +
> + if (cpumask_weight(&online_fast_misaligned_access) == num_online_cpus())
> + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
> + else
> + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
> +
> + return 0;
> +}
> +
> +static int lock_and_set_unaligned_access_static_branch(void)
> +{
> + cpus_read_lock();
> + set_unaligned_access_static_branches();
> + cpus_read_unlock();
> +
> + return 0;
> +}
> +
> +arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
> +
> static int riscv_online_cpu(unsigned int cpu)
> {
> static struct page *buf;
>
> /* We are already set since the last check */
> if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
> - return 0;
> + goto exit;
>
> buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> if (!buf) {
> @@ -671,7 +742,14 @@ static int riscv_online_cpu(unsigned int cpu)
>
> check_unaligned_access(buf);
> __free_pages(buf, MISALIGNED_BUFFER_ORDER);
> - return 0;
> +
> +exit:
> + return set_unaligned_access_static_branches();
> +}
> +
> +static int riscv_offline_cpu(unsigned int cpu)
> +{
> + return exclude_set_unaligned_access_static_branches(cpu);
> }
>
> /* Measure unaligned access on all CPUs present at boot in parallel. */
> @@ -705,9 +783,12 @@ static int check_unaligned_access_all_cpus(void)
> /* Check core 0. */
> smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
>
> - /* Setup hotplug callback for any new CPUs that come online. */
> + /*
> + * Setup hotplug callbacks for any new CPUs that come online or go
> + * offline.
> + */
> cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
> - riscv_online_cpu, NULL);
> + riscv_online_cpu, riscv_offline_cpu);
>
> out:
> unaligned_emulation_finish();
>
> --
> 2.43.0
>
On Mon, Jan 08, 2024 at 11:22:34AM -0800, Evan Green wrote:
> On Wed, Dec 27, 2023 at 9:38 AM Charlie Jenkins <[email protected]> wrote:
> >
> > Support static branches depending on the speed of misaligned accesses.
> > This will be used by a later patch in the series. All online cpus must
> > be considered "fast" for this static branch to be flipped.
> >
> > Signed-off-by: Charlie Jenkins <[email protected]>
>
> This is fancier than I would have gone for, I probably would have
> punted on heterogeneous hotplug out of laziness for now. However, what
> you've done looks smart, in that we'll basically flip the branch if at
> any moment all the online CPUs are fast. I've got some nits below, but
> won't withhold my review for them (making them optional I suppose :)).
>
> Reviewed-by: Evan Green <[email protected]>
>
Thanks!
> > ---
> > arch/riscv/include/asm/cpufeature.h | 2 +
> > arch/riscv/kernel/cpufeature.c | 89 +++++++++++++++++++++++++++++++++++--
> > 2 files changed, 87 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> > index a418c3112cd6..7b129e5e2f07 100644
> > --- a/arch/riscv/include/asm/cpufeature.h
> > +++ b/arch/riscv/include/asm/cpufeature.h
> > @@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
> > return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
> > }
> >
> > +DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
> > +
> > #endif
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index b3785ffc1570..dfd716b93565 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -8,8 +8,10 @@
> >
> > #include <linux/acpi.h>
> > #include <linux/bitmap.h>
> > +#include <linux/cpu.h>
> > #include <linux/cpuhotplug.h>
> > #include <linux/ctype.h>
> > +#include <linux/jump_label.h>
> > #include <linux/log2.h>
> > #include <linux/memory.h>
> > #include <linux/module.h>
> > @@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS];
> > /* Performance information */
> > DEFINE_PER_CPU(long, misaligned_access_speed);
> >
> > +static cpumask_t fast_misaligned_access;
> > +
> > /**
> > * riscv_isa_extension_base() - Get base extension word
> > *
> > @@ -643,6 +647,16 @@ static int check_unaligned_access(void *param)
> > (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
> >
> > per_cpu(misaligned_access_speed, cpu) = speed;
> > +
> > + /*
> > + * Set the value of fast_misaligned_access of a CPU. These operations
> > + * are atomic to avoid race conditions.
> > + */
> > + if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
> > + cpumask_set_cpu(cpu, &fast_misaligned_access);
> > + else
> > + cpumask_clear_cpu(cpu, &fast_misaligned_access);
> > +
> > return 0;
> > }
> >
> > @@ -655,13 +669,70 @@ static void check_unaligned_access_nonboot_cpu(void *param)
> > check_unaligned_access(pages[cpu]);
> > }
> >
> > +DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
> > +
> > +static int exclude_set_unaligned_access_static_branches(int cpu)
> > +{
> > + /*
> > + * Same as set_unaligned_access_static_branches, except excludes the
> > + * given CPU from the result. When a CPU is hotplugged into an offline
> > + * state, this function is called before the CPU is set to offline in
> > + * the cpumask, and thus the CPU needs to be explicitly excluded.
> > + */
> > +
> > + cpumask_t online_fast_misaligned_access;
> > +
> > + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
> > + cpumask_clear_cpu(cpu, &online_fast_misaligned_access);
> > +
> > + if (cpumask_weight(&online_fast_misaligned_access) == (num_online_cpus() - 1))
> > + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
> > + else
> > + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
> > +
> > + return 0;
> > +}
>
> A minor nit: the functions above and below are looking a little
> copy/pasty, and lead to multiple spots where the static branch gets
> changed. You could make a third function that actually does the
> setting with parameters, then these two could call it in different
> ways. The return types also don't need to be int, since you always
> return 0. Something like:
>
> static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
> {
> if (cpumask_weight(mask) == weight) {
> static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
> } else {
> static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
> }
> }
>
> static void set_unaligned_access_branches(void)
> {
> cpumask_t fast_and_online;
>
> cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
> modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
> }
>
> static void set_unaligned_access_branches_except_cpu(unsigned int cpu)
> {
> cpumask_t fast_except_me;
>
> cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
> cpumask_clear_cpu(cpu, &fast_except_me);
> modify_unaligned_access_branches(&fast_except_me,
> num_online_cpus() - 1);
> }
>
Great suggestions, I will apply these changes and send out a new
version.
- Charlie
> > +
> > +static int set_unaligned_access_static_branches(void)
> > +{
> > + /*
> > + * This will be called after check_unaligned_access_all_cpus so the
> > + * result of unaligned access speed for all CPUs will be available.
> > + *
> > + * To avoid the number of online cpus changing between reading
> > + * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
> > + * held before calling this function.
> > + */
> > + cpumask_t online_fast_misaligned_access;
> > +
> > + cpumask_and(&online_fast_misaligned_access, &fast_misaligned_access, cpu_online_mask);
> > +
> > + if (cpumask_weight(&online_fast_misaligned_access) == num_online_cpus())
> > + static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
> > + else
> > + static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
> > +
> > + return 0;
> > +}
> > +
> > +static int lock_and_set_unaligned_access_static_branch(void)
> > +{
> > + cpus_read_lock();
> > + set_unaligned_access_static_branches();
> > + cpus_read_unlock();
> > +
> > + return 0;
> > +}
> > +
> > +arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
> > +
> > static int riscv_online_cpu(unsigned int cpu)
> > {
> > static struct page *buf;
> >
> > /* We are already set since the last check */
> > if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
> > - return 0;
> > + goto exit;
> >
> > buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> > if (!buf) {
> > @@ -671,7 +742,14 @@ static int riscv_online_cpu(unsigned int cpu)
> >
> > check_unaligned_access(buf);
> > __free_pages(buf, MISALIGNED_BUFFER_ORDER);
> > - return 0;
> > +
> > +exit:
> > + return set_unaligned_access_static_branches();
> > +}
> > +
> > +static int riscv_offline_cpu(unsigned int cpu)
> > +{
> > + return exclude_set_unaligned_access_static_branches(cpu);
> > }
> >
> > /* Measure unaligned access on all CPUs present at boot in parallel. */
> > @@ -705,9 +783,12 @@ static int check_unaligned_access_all_cpus(void)
> > /* Check core 0. */
> > smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
> >
> > - /* Setup hotplug callback for any new CPUs that come online. */
> > + /*
> > + * Setup hotplug callbacks for any new CPUs that come online or go
> > + * offline.
> > + */
> > cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
> > - riscv_online_cpu, NULL);
> > + riscv_online_cpu, riscv_offline_cpu);
> >
> > out:
> > unaligned_emulation_finish();
> >
> > --
> > 2.43.0
> >