2024-01-08 23:57:47

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 0/5] riscv: Add fine-tuned checksum functions

Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.

This patch takes heavy use of the Zbb extension using alternatives
patching.

To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.

I have attempted to make these functions as optimal as possible, but I
have not ran anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.

ip_fast_csum is a relatively small function so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the clean up and setup code so loading 32 bits at a
time is actually faster.

Relies on https://lore.kernel.org/lkml/[email protected]/

---

The algorithm proposed to replace the default csum_fold can be seen to
compute the same result by running all 2^32 possible inputs.

static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

unsigned short csum_fold(unsigned int csum)
{
unsigned int sum = csum;
sum = (sum & 0xffff) + (sum >> 16);
sum = (sum & 0xffff) + (sum >> 16);
return ~sum;
}

unsigned short csum_fold_arc(unsigned int csum)
{
return ((~csum - ror32(csum, 16)) >> 16);
}

int main()
{
unsigned int start = 0x0;
do {
if (csum_fold(start) != csum_fold_arc(start)) {
printf("Not the same %u\n", start);
return -1;
}
start += 1;
} while(start != 0x0);
printf("The same\n");
return 0;
}

Cc: Paul Walmsley <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Arnd Bergmann <[email protected]>
To: Charlie Jenkins <[email protected]>
To: Palmer Dabbelt <[email protected]>
To: Conor Dooley <[email protected]>
To: Samuel Holland <[email protected]>
To: David Laight <[email protected]>
To: Xiao Wang <[email protected]>
To: Evan Green <[email protected]>
To: Guo Ren <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
Signed-off-by: Charlie Jenkins <[email protected]>

---
Changes in v15:
- Create modify_unaligned_access_branches to consolidate duplicate code
(Evan)
- Link to v14: https://lore.kernel.org/r/[email protected]

Changes in v14:
- Update misaligned static branch when CPUs are hotplugged (Guo)
- Leave off Evan's reviewed-by on patch 2 since it was completely
re-written
- Link to v13: https://lore.kernel.org/r/[email protected]

Changes in v13:
- Move cast from patch 4 to patch 3
- Link to v12: https://lore.kernel.org/r/[email protected]

Changes in v12:
- Rebase onto 6.7-rc5
- Add performance stats in the cover letter
- Link to v11: https://lore.kernel.org/r/[email protected]

Changes in v11:
- Extensive modifications to comply to sparse
- Organize include statements (Xiao)
- Add csum_ipv6_magic to commit message (Xiao)
- Remove extraneous len statement (Xiao)
- Add kasan_check_read call (Xiao)
- Improve comment field checksum.h (Xiao)
- Consolidate "buff" and "len" into one parameter "end" (Xiao)
- Link to v10: https://lore.kernel.org/r/[email protected]

Changes in v10:
- Move tests that were riscv-specific to be arch agnostic (Arnd)
- Link to v9: https://lore.kernel.org/r/[email protected]

Changes in v9:
- Use ror64 (Xiao)
- Move do_csum and csum_ipv6_magic headers to patch 4 (Xiao)
- Remove word "IP" from checksum headers (Xiao)
- Swap to using ifndef CONFIG_32BIT instead of ifdef CONFIG_64BIT (Xiao)
- Run no alignment code when buff is aligned (Xiao)
- Consolidate two do_csum implementations overlap into do_csum_common
- Link to v8: https://lore.kernel.org/r/[email protected]

Changes in v8:
- Speedups of 12% without Zbb and 21% with Zbb when cpu supports fast
misaligned accesses for do_csum
- Various formatting updates
- Patch now relies on https://lore.kernel.org/lkml/[email protected]/
- Link to v7: https://lore.kernel.org/r/[email protected]

Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Unfortunately had to reintroduce ifdefs because gcc is not smart
enough to not throw warnings on code that will never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN because IS_ENABLED
does not work on that
- Only optimize for zbb when alternatives is enabled in do_csum
- Link to v6: https://lore.kernel.org/r/[email protected]

Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Drop vector patches
- Check ZBB enabled before doing any ZBB code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler non-tree based version of ipv6_csum_magic since
David pointed out that the tree based version is not better.
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Suggestion by David Laight to use an improved checksum used in
arch/arc.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependency which should improve execution speed on
rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Charlie Jenkins (5):
asm-generic: Improve csum_fold
riscv: Add static key for misaligned accesses
riscv: Add checksum header
riscv: Add checksum library
kunit: Add tests for csum_ipv6_magic and ip_fast_csum

arch/riscv/include/asm/checksum.h | 93 ++++++++++
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/kernel/cpufeature.c | 90 +++++++++-
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++
include/asm-generic/checksum.h | 6 +-
lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++-
7 files changed, 795 insertions(+), 7 deletions(-)
---
base-commit: a39b6ac3781d46ba18193c9dbb2110f31e9bffe9
change-id: 20230804-optimize_checksum-db145288ac21
--
- Charlie



2024-01-08 23:57:52

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 1/5] asm-generic: Improve csum_fold

This csum_fold implementation introduced into arch/arc by Vineet Gupta
is better than the default implementation on at least arc, x86, and
riscv. Using GCC trunk and compiling non-inlined version, this
implementation has 41.6667%, 25% fewer instructions on riscv64, x86-64
respectively with -O3 optimization. Most implmentations override this
default in asm, but this should be more performant than all of those
other implementations except for arm which has barrel shifting and
sparc32 which has a carry flag.

Signed-off-by: Charlie Jenkins <[email protected]>
Reviewed-by: David Laight <[email protected]>
---
include/asm-generic/checksum.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
index 43e18db89c14..ad928cce268b 100644
--- a/include/asm-generic/checksum.h
+++ b/include/asm-generic/checksum.h
@@ -2,6 +2,8 @@
#ifndef __ASM_GENERIC_CHECKSUM_H
#define __ASM_GENERIC_CHECKSUM_H

+#include <linux/bitops.h>
+
/*
* computes the checksum of a memory block at buff, length len,
* and adds in "sum" (32-bit)
@@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
static inline __sum16 csum_fold(__wsum csum)
{
u32 sum = (__force u32)csum;
- sum = (sum & 0xffff) + (sum >> 16);
- sum = (sum & 0xffff) + (sum >> 16);
- return (__force __sum16)~sum;
+ return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
}
#endif


--
2.43.0


2024-01-08 23:58:07

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 2/5] riscv: Add static key for misaligned accesses

Support static branches depending on the value of misaligned accesses.
This will be used by a later patch in the series. At any point in time,
this static branch will only be enabled if all online CPUs are
considered "fast".

Signed-off-by: Charlie Jenkins <[email protected]>
Reviewed-by: Evan Green <[email protected]>
---
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/kernel/cpufeature.c | 90 +++++++++++++++++++++++++++++++++++--
2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index a418c3112cd6..7b129e5e2f07 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
}

+DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
#endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index b3785ffc1570..b62baeb504d8 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -8,8 +8,10 @@

#include <linux/acpi.h>
#include <linux/bitmap.h>
+#include <linux/cpu.h>
#include <linux/cpuhotplug.h>
#include <linux/ctype.h>
+#include <linux/jump_label.h>
#include <linux/log2.h>
#include <linux/memory.h>
#include <linux/module.h>
@@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS];
/* Performance information */
DEFINE_PER_CPU(long, misaligned_access_speed);

+static cpumask_t fast_misaligned_access;
+
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -643,6 +647,16 @@ static int check_unaligned_access(void *param)
(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");

per_cpu(misaligned_access_speed, cpu) = speed;
+
+ /*
+ * Set the value of fast_misaligned_access of a CPU. These operations
+ * are atomic to avoid race conditions.
+ */
+ if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
+ cpumask_set_cpu(cpu, &fast_misaligned_access);
+ else
+ cpumask_clear_cpu(cpu, &fast_misaligned_access);
+
return 0;
}

@@ -655,13 +669,69 @@ static void check_unaligned_access_nonboot_cpu(void *param)
check_unaligned_access(pages[cpu]);
}

+DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
+static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
+{
+ if (cpumask_weight(mask) == weight)
+ static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
+ else
+ static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
+}
+
+static void set_unaligned_access_static_branches_except_cpu(int cpu)
+{
+ /*
+ * Same as set_unaligned_access_static_branches, except excludes the
+ * given CPU from the result. When a CPU is hotplugged into an offline
+ * state, this function is called before the CPU is set to offline in
+ * the cpumask, and thus the CPU needs to be explicitly excluded.
+ */
+
+ cpumask_t fast_except_me;
+
+ cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
+ cpumask_clear_cpu(cpu, &fast_except_me);
+
+ modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
+}
+
+static void set_unaligned_access_static_branches(void)
+{
+ /*
+ * This will be called after check_unaligned_access_all_cpus so the
+ * result of unaligned access speed for all CPUs will be available.
+ *
+ * To avoid the number of online cpus changing between reading
+ * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
+ * held before calling this function.
+ */
+
+ cpumask_t fast_and_online;
+
+ cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
+
+ modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
+}
+
+static int lock_and_set_unaligned_access_static_branch(void)
+{
+ cpus_read_lock();
+ set_unaligned_access_static_branches();
+ cpus_read_unlock();
+
+ return 0;
+}
+
+arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
+
static int riscv_online_cpu(unsigned int cpu)
{
static struct page *buf;

/* We are already set since the last check */
if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
- return 0;
+ goto exit;

buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
if (!buf) {
@@ -671,6 +741,17 @@ static int riscv_online_cpu(unsigned int cpu)

check_unaligned_access(buf);
__free_pages(buf, MISALIGNED_BUFFER_ORDER);
+
+exit:
+ set_unaligned_access_static_branches();
+
+ return 0;
+}
+
+static int riscv_offline_cpu(unsigned int cpu)
+{
+ set_unaligned_access_static_branches_except_cpu(cpu);
+
return 0;
}

@@ -705,9 +786,12 @@ static int check_unaligned_access_all_cpus(void)
/* Check core 0. */
smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);

- /* Setup hotplug callback for any new CPUs that come online. */
+ /*
+ * Setup hotplug callbacks for any new CPUs that come online or go
+ * offline.
+ */
cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
- riscv_online_cpu, NULL);
+ riscv_online_cpu, riscv_offline_cpu);

out:
unaligned_emulation_finish();

--
2.43.0


2024-01-08 23:58:20

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 3/5] riscv: Add checksum header

Provide checksum algorithms that have been designed to leverage riscv
instructions such as rotate. In 64-bit, can take advantage of the larger
register to avoid some overflow checking.

Signed-off-by: Charlie Jenkins <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Reviewed-by: Xiao Wang <[email protected]>
---
arch/riscv/include/asm/checksum.h | 82 +++++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
new file mode 100644
index 000000000000..5a810126aac7
--- /dev/null
+++ b/arch/riscv/include/asm/checksum.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Checksum routines
+ *
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#ifndef __ASM_RISCV_CHECKSUM_H
+#define __ASM_RISCV_CHECKSUM_H
+
+#include <linux/in6.h>
+#include <linux/uaccess.h>
+
+#define ip_fast_csum ip_fast_csum
+
+/* Define riscv versions of functions before importing asm-generic/checksum.h */
+#include <asm-generic/checksum.h>
+
+/**
+ * Quickly compute an IP checksum with the assumption that IPv4 headers will
+ * always be in multiples of 32-bits, and have an ihl of at least 5.
+ *
+ * @ihl: the number of 32 bit segments and must be greater than or equal to 5.
+ * @iph: assumed to be word aligned given that NET_IP_ALIGN is set to 2 on
+ * riscv, defining IP headers to be aligned.
+ */
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+ unsigned long csum = 0;
+ int pos = 0;
+
+ do {
+ csum += ((const unsigned int *)iph)[pos];
+ if (IS_ENABLED(CONFIG_32BIT))
+ csum += csum < ((const unsigned int *)iph)[pos];
+ } while (++pos < ihl);
+
+ /*
+ * ZBB only saves three instructions on 32-bit and five on 64-bit so not
+ * worth checking if supported without Alternatives.
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+ if (IS_ENABLED(CONFIG_32BIT)) {
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ not %[fold_temp], %[csum] \n\
+ rori %[csum], %[csum], 16 \n\
+ sub %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+ } else {
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ not %[fold_temp], %[csum] \n\
+ roriw %[csum], %[csum], 16 \n\
+ subw %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+ }
+ return (__force __sum16)(csum >> 16);
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ return csum_fold((__force __wsum)csum);
+}
+
+#endif /* __ASM_RISCV_CHECKSUM_H */

--
2.43.0


2024-01-08 23:58:52

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 4/5] riscv: Add checksum library

Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
will load from the buffer in groups of 32 bits, and when compiled for
64-bit will load in groups of 64 bits.

Additionally provide riscv optimized implementation of csum_ipv6_magic.

Signed-off-by: Charlie Jenkins <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Reviewed-by: Xiao Wang <[email protected]>
---
arch/riscv/include/asm/checksum.h | 11 ++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++++
3 files changed, 338 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index 5a810126aac7..a5b60b54b101 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -12,6 +12,17 @@

#define ip_fast_csum ip_fast_csum

+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum sum);
+#endif
+
/* Define riscv versions of functions before importing asm-generic/checksum.h */
#include <asm-generic/checksum.h>

diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..2aa1a4ad361f 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,6 +6,7 @@ lib-y += memmove.o
lib-y += strcmp.o
lib-y += strlen.o
lib-y += strncmp.o
+lib-y += csum.o
lib-$(CONFIG_MMU) += uaccess.o
lib-$(CONFIG_64BIT) += tishift.o
lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
new file mode 100644
index 000000000000..06ce8e7250d9
--- /dev/null
+++ b/arch/riscv/lib/csum.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Checksum library
+ *
+ * Influenced by arch/arm64/lib/csum.c
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/jump_label.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <asm/cpufeature.h>
+
+#include <net/checksum.h>
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum csum)
+{
+ unsigned int ulen, uproto;
+ unsigned long sum = (__force unsigned long)csum;
+
+ sum += (__force unsigned long)saddr->s6_addr32[0];
+ sum += (__force unsigned long)saddr->s6_addr32[1];
+ sum += (__force unsigned long)saddr->s6_addr32[2];
+ sum += (__force unsigned long)saddr->s6_addr32[3];
+
+ sum += (__force unsigned long)daddr->s6_addr32[0];
+ sum += (__force unsigned long)daddr->s6_addr32[1];
+ sum += (__force unsigned long)daddr->s6_addr32[2];
+ sum += (__force unsigned long)daddr->s6_addr32[3];
+
+ ulen = (__force unsigned int)htonl((unsigned int)len);
+ sum += ulen;
+
+ uproto = (__force unsigned int)htonl(proto);
+ sum += uproto;
+
+ /*
+ * Zbb support saves 4 instructions, so not worth checking without
+ * alternatives if supported
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+ asm(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[sum], 32 \n\
+ add %[sum], %[fold_temp], %[sum] \n\
+ srli %[sum], %[sum], 32 \n\
+ not %[fold_temp], %[sum] \n\
+ roriw %[sum], %[sum], 16 \n\
+ subw %[sum], %[fold_temp], %[sum] \n\
+ .option pop"
+ : [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
+ return (__force __sum16)(sum >> 16);
+ }
+no_zbb:
+ sum += ror64(sum, 32);
+ sum >>= 32;
+ return csum_fold((__force __wsum)sum);
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
+#endif /* !CONFIG_32BIT */
+
+#ifdef CONFIG_32BIT
+#define OFFSET_MASK 3
+#elif CONFIG_64BIT
+#define OFFSET_MASK 7
+#endif
+
+static inline __no_sanitize_address unsigned long
+do_csum_common(const unsigned long *ptr, const unsigned long *end,
+ unsigned long data)
+{
+ unsigned int shift;
+ unsigned long csum = 0, carry = 0;
+
+ /*
+ * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
+ * faster than doing 32-bit reads on architectures that support larger
+ * reads.
+ */
+ while (ptr < end) {
+ csum += data;
+ carry += csum < data;
+ data = *(ptr++);
+ }
+
+ /*
+ * Perform alignment (and over-read) bytes on the tail if any bytes
+ * leftover.
+ */
+ shift = ((long)ptr - (long)end) * 8;
+#ifdef __LITTLE_ENDIAN
+ data = (data << shift) >> shift;
+#else
+ data = (data >> shift) << shift;
+#endif
+ csum += data;
+ carry += csum < data;
+ csum += carry;
+ csum += csum < carry;
+
+ return csum;
+}
+
+/*
+ * Algorithm accounts for buff being misaligned.
+ * If buff is not aligned, will over-read bytes but not use the bytes that it
+ * shouldn't. The same thing will occur on the tail-end of the read.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_with_alignment(const unsigned char *buff, int len)
+{
+ unsigned int offset, shift;
+ unsigned long csum, data;
+ const unsigned long *ptr, *end;
+
+ /*
+ * Align address to closest word (double word on rv64) that comes before
+ * buff. This should always be in the same page and cache line.
+ * Directly call KASAN with the alignment we will be using.
+ */
+ offset = (unsigned long)buff & OFFSET_MASK;
+ kasan_check_read(buff, len);
+ ptr = (const unsigned long *)(buff - offset);
+
+ /*
+ * Clear the most significant bytes that were over-read if buff was not
+ * aligned.
+ */
+ shift = offset * 8;
+ data = *(ptr++);
+#ifdef __LITTLE_ENDIAN
+ data = (data >> shift) << shift;
+#else
+ data = (data << shift) >> shift;
+#endif
+ end = (const unsigned long *)(buff + len);
+ csum = do_csum_common(ptr, end, data);
+
+ /*
+ * Zbb support saves 6 instructions, so not worth checking without
+ * alternatives if supported
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+#ifdef CONFIG_32BIT
+ asm_volatile_goto(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 16 \n\
+ andi %[offset], %[offset], 1 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ beq %[offset], zero, %l[end] \n\
+ rev8 %[csum], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ : [offset] "r" (offset)
+ :
+ : end);
+
+ return (unsigned short)csum;
+#else /* !CONFIG_32BIT */
+ asm_volatile_goto(".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ roriw %[fold_temp], %[csum], 16 \n\
+ addw %[csum], %[fold_temp], %[csum] \n\
+ andi %[offset], %[offset], 1 \n\
+ beq %[offset], zero, %l[end] \n\
+ rev8 %[csum], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ : [offset] "r" (offset)
+ :
+ : end);
+
+ return (csum << 16) >> 48;
+#endif /* !CONFIG_32BIT */
+end:
+ return csum >> 16;
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ csum = (u32)csum + ror32((u32)csum, 16);
+ if (offset & 1)
+ return (u16)swab32(csum);
+ return csum >> 16;
+}
+
+/*
+ * Does not perform alignment, should only be used if machine has fast
+ * misaligned accesses, or when buff is known to be aligned.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_no_alignment(const unsigned char *buff, int len)
+{
+ unsigned long csum, data;
+ const unsigned long *ptr, *end;
+
+ ptr = (const unsigned long *)(buff);
+ data = *(ptr++);
+
+ kasan_check_read(buff, len);
+
+ end = (const unsigned long *)(buff + len);
+ csum = do_csum_common(ptr, end, data);
+
+ /*
+ * Zbb support saves 6 instructions, so not worth checking without
+ * alternatives if supported
+ */
+ if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+ IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+ unsigned long fold_temp;
+
+ /*
+ * Zbb is likely available when the kernel is compiled with Zbb
+ * support, so nop when Zbb is available and jump when Zbb is
+ * not available.
+ */
+ asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ RISCV_ISA_EXT_ZBB, 1)
+ :
+ :
+ :
+ : no_zbb);
+
+#ifdef CONFIG_32BIT
+ asm (".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 16 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ :
+ : );
+
+#else /* !CONFIG_32BIT */
+ asm (".option push \n\
+ .option arch,+zbb \n\
+ rori %[fold_temp], %[csum], 32 \n\
+ add %[csum], %[fold_temp], %[csum] \n\
+ srli %[csum], %[csum], 32 \n\
+ roriw %[fold_temp], %[csum], 16 \n\
+ addw %[csum], %[fold_temp], %[csum] \n\
+ .option pop"
+ : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+ :
+ : );
+#endif /* !CONFIG_32BIT */
+ return csum >> 16;
+ }
+no_zbb:
+#ifndef CONFIG_32BIT
+ csum += ror64(csum, 32);
+ csum >>= 32;
+#endif
+ csum = (u32)csum + ror32((u32)csum, 16);
+ return csum >> 16;
+}
+
+/*
+ * Perform a checksum on an arbitrary memory address.
+ * Will do a light-weight address alignment if buff is misaligned, unless
+ * cpu supports fast misaligned accesses.
+ */
+unsigned int do_csum(const unsigned char *buff, int len)
+{
+ if (unlikely(len <= 0))
+ return 0;
+
+ /*
+ * Significant performance gains can be seen by not doing alignment
+ * on machines with fast misaligned accesses.
+ *
+ * There is some duplicate code between the "with_alignment" and
+ * "no_alignment" implmentations, but the overlap is too awkward to be
+ * able to fit in one function without introducing multiple static
+ * branches. The largest chunk of overlap was delegated into the
+ * do_csum_common function.
+ */
+ if (static_branch_likely(&fast_misaligned_access_speed_key))
+ return do_csum_no_alignment(buff, len);
+
+ if (((unsigned long)buff & OFFSET_MASK) == 0)
+ return do_csum_no_alignment(buff, len);
+
+ return do_csum_with_alignment(buff, len);
+}

--
2.43.0


2024-01-08 23:59:02

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

Supplement existing checksum tests with tests for csum_ipv6_magic and
ip_fast_csum.

Signed-off-by: Charlie Jenkins <[email protected]>
---
lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 283 insertions(+), 1 deletion(-)

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 0eed92b77ba3..af3e5ca4e170 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -1,15 +1,21 @@
// SPDX-License-Identifier: GPL-2.0+
/*
- * Test cases csum_partial and csum_fold
+ * Test cases csum_partial, csum_fold, ip_fast_csum, csum_ipv6_magic
*/

#include <kunit/test.h>
#include <asm/checksum.h>
+#include <net/ip6_checksum.h>

#define MAX_LEN 512
#define MAX_ALIGN 64
#define TEST_BUFLEN (MAX_LEN + MAX_ALIGN)

+#define IPv4_MIN_WORDS 5
+#define IPv4_MAX_WORDS 15
+#define NUM_IPv6_TESTS 200
+#define NUM_IP_FAST_CSUM_TESTS 181
+
/* Values for a little endian CPU. Byte swap each half on big endian CPU. */
static const u32 random_init_sum = 0x2847aab;
static const u8 random_buf[] = {
@@ -209,6 +215,237 @@ static const u32 init_sums_no_overflow[] = {
0xffff0000, 0xfffffffb,
};

+static const __sum16 expected_csum_ipv6_magic[] = {
+ 0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f, 0x1034, 0x8422, 0x6fc0,
+ 0xd2f6, 0xbeb5, 0x9d3, 0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
+ 0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
+ 0xdf81, 0x8fd5, 0x3b5d, 0x8324, 0xf471, 0x83be, 0x1daf, 0x8c46, 0xe682,
+ 0xd1fb, 0x6b2e, 0xe687, 0x2a33, 0x4833, 0x2d67, 0x660f, 0x2e79, 0xd65e,
+ 0x6b62, 0x6672, 0x5dbd, 0x8680, 0xbaa5, 0x2229, 0x2125, 0x2d01, 0x1cc0,
+ 0x6d36, 0x33c0, 0xee36, 0xd832, 0x9820, 0x8a31, 0x53c5, 0x2e2, 0xdb0e,
+ 0x49ed, 0x17a7, 0x77a0, 0xd72e, 0x3d72, 0x7dc8, 0x5b17, 0xf55d, 0xa4d9,
+ 0x1446, 0x5d56, 0x6b2e, 0x69a5, 0xadb6, 0xff2a, 0x92e, 0xe044, 0x3402,
+ 0xbb60, 0xec7f, 0xe7e6, 0x1986, 0x32f4, 0x8f8, 0x5e00, 0x47c6, 0x3059,
+ 0x3969, 0xe957, 0x4388, 0x2854, 0x3334, 0xea71, 0xa6de, 0x33f9, 0x83fc,
+ 0x37b4, 0x5531, 0x3404, 0x1010, 0xed30, 0x610a, 0xc95, 0x9aed, 0x6ff,
+ 0x5136, 0x2741, 0x660e, 0x8b80, 0xf71, 0xa263, 0x88af, 0x7a73, 0x3c37,
+ 0x1908, 0x6db5, 0x2e92, 0x1cd2, 0x70c8, 0xee16, 0xe80, 0xcd55, 0x6e6,
+ 0x6434, 0x127, 0x655d, 0x2ea0, 0xb4f4, 0xdc20, 0x5671, 0xe462, 0xe52b,
+ 0xdb44, 0x3589, 0xc48f, 0xe60b, 0xd2d2, 0x66ad, 0x498, 0x436, 0xb917,
+ 0xf0ca, 0x1a6e, 0x1cb7, 0xbf61, 0x2870, 0xc7e8, 0x5b30, 0xe4a5, 0x168,
+ 0xadfc, 0xd035, 0xe690, 0xe283, 0xfb27, 0xe4ad, 0xb1a5, 0xf2d5, 0xc4b6,
+ 0x8a30, 0xd7d5, 0x7df9, 0x91d5, 0x63ed, 0x2d21, 0x312b, 0xab19, 0xa632,
+ 0x8d2e, 0xef06, 0x57b9, 0xc373, 0xbd1f, 0xa41f, 0x8444, 0x9975, 0x90cb,
+ 0xc49c, 0xe965, 0x4eff, 0x5a, 0xef6d, 0xe81a, 0xe260, 0x853a, 0xff7a,
+ 0x99aa, 0xb06b, 0xee19, 0xcc2c, 0xf34c, 0x7c49, 0xdac3, 0xa71e, 0xc988,
+ 0x3845, 0x1014
+};
+
+static const __sum16 expected_fast_csum[] = {
+ 0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e, 0xe902, 0xa5e9, 0x87a5, 0x7187,
+ 0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
+ 0xedbe, 0xabee, 0x6aac, 0xe6b, 0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
+ 0x3a70, 0x9f3a, 0xe89e, 0x75e8, 0x7976, 0xfa79, 0x2cfa, 0x3c2c, 0x463c,
+ 0x7146, 0x7a71, 0x547a, 0xfd53, 0x99fc, 0xb699, 0x92b6, 0xdb91, 0xe8da,
+ 0x5fe9, 0x1e60, 0xae1d, 0x39ae, 0xf439, 0xa1f4, 0xdda1, 0xede, 0x790f,
+ 0x579, 0x1206, 0x9012, 0x2490, 0xd224, 0x5cd2, 0xa65d, 0xca7, 0x220d,
+ 0xf922, 0xbf9, 0x920b, 0x1b92, 0x361c, 0x2e36, 0x4d2e, 0x24d, 0x2,
+ 0xcfff, 0x90cf, 0xa591, 0x93a5, 0x7993, 0x9579, 0xc894, 0x50c8, 0x5f50,
+ 0xd55e, 0xcad5, 0xf3c9, 0x8f4, 0x4409, 0x5043, 0x5b50, 0x55b, 0x2205,
+ 0x1e22, 0x801e, 0x3780, 0xe137, 0x7ee0, 0xf67d, 0x3cf6, 0xa53c, 0x2ea5,
+ 0x472e, 0x5147, 0xcf51, 0x1bcf, 0x951c, 0x1e95, 0xc71e, 0xe4c7, 0xc3e4,
+ 0x3dc3, 0xee3d, 0xa4ed, 0xf9a4, 0xcbf8, 0x75cb, 0xb375, 0x50b4, 0x3551,
+ 0xf835, 0x19f8, 0x8c1a, 0x538c, 0xad52, 0xa3ac, 0xb0a3, 0x5cb0, 0x6c5c,
+ 0x5b6c, 0xc05a, 0x92c0, 0x4792, 0xbe47, 0x53be, 0x1554, 0x5715, 0x4b57,
+ 0xe54a, 0x20e5, 0x21, 0xd500, 0xa1d4, 0xa8a1, 0x57a9, 0xca57, 0x5ca,
+ 0x1c06, 0x4f1c, 0xe24e, 0xd9e2, 0xf0d9, 0x4af1, 0x474b, 0x8146, 0xe81,
+ 0xfd0e, 0x84fd, 0x7c85, 0xba7c, 0x17ba, 0x4a17, 0x964a, 0xf595, 0xff5,
+ 0x5310, 0x3253, 0x6432, 0x4263, 0x2242, 0xe121, 0x32e1, 0xf632, 0xc5f5,
+ 0x21c6, 0x7d22, 0x8e7c, 0x418e, 0x5641, 0x3156, 0x7c31, 0x737c, 0x373,
+ 0x2503, 0xc22a, 0x3c2, 0x4a04, 0x8549, 0x5285, 0xa352, 0xe8a3, 0x6fe8,
+ 0x1a6f, 0x211a, 0xe021, 0x38e0, 0x7638, 0xf575, 0x9df5, 0x169e, 0xf116,
+ 0x23f1, 0xcd23, 0xece, 0x660f, 0x4866, 0x6a48, 0x716a, 0xee71, 0xa2ee,
+ 0xb8a2, 0x61b9, 0xa361, 0xf7a2, 0x26f7, 0x1127, 0x6611, 0xe065, 0x36e0,
+ 0x1837, 0x3018, 0x1c30, 0x721b, 0x3e71, 0xe43d, 0x99e4, 0x9e9a, 0xb79d,
+ 0xa9b7, 0xcaa, 0xeb0c, 0x4eb, 0x1305, 0x8813, 0xb687, 0xa9b6, 0xfba9,
+ 0xd7fb, 0xccd8, 0x2ecd, 0x652f, 0xae65, 0x3fae, 0x3a40, 0x563a, 0x7556,
+ 0x2776, 0x1228, 0xef12, 0xf9ee, 0xcef9, 0x56cf, 0xa956, 0x24a9, 0xba24,
+ 0x5fba, 0x665f, 0xf465, 0x8ff4, 0x6d8f, 0x346d, 0x5f34, 0x385f, 0xd137,
+ 0xb8d0, 0xacb8, 0x55ac, 0x7455, 0xe874, 0x89e8, 0xd189, 0xa0d1, 0xb2a0,
+ 0xb8b2, 0x36b8, 0x5636, 0xd355, 0x8d3, 0x1908, 0x2118, 0xc21, 0x990c,
+ 0x8b99, 0x158c, 0x7815, 0x9e78, 0x6f9e, 0x4470, 0x1d44, 0x341d, 0x2634,
+ 0x3f26, 0x793e, 0xc79, 0xcc0b, 0x26cc, 0xd126, 0x1fd1, 0xb41f, 0xb6b4,
+ 0x22b7, 0xa122, 0xa1, 0x7f01, 0x837e, 0x3b83, 0xaf3b, 0x6fae, 0x916f,
+ 0xb490, 0xffb3, 0xceff, 0x50cf, 0x7550, 0x7275, 0x1272, 0x2613, 0xaa26,
+ 0xd5aa, 0x7d5, 0x9607, 0x96, 0xb100, 0xf8b0, 0x4bf8, 0xdd4c, 0xeddd,
+ 0x98ed, 0x2599, 0x9325, 0xeb92, 0x8feb, 0xcc8f, 0x2acd, 0x392b, 0x3b39,
+ 0xcb3b, 0x6acb, 0xd46a, 0xb8d4, 0x6ab8, 0x106a, 0x2f10, 0x892f, 0x789,
+ 0xc806, 0x45c8, 0x7445, 0x3c74, 0x3a3c, 0xcf39, 0xd7ce, 0x58d8, 0x6e58,
+ 0x336e, 0x1034, 0xee10, 0xe9ed, 0xc2e9, 0x3fc2, 0xd53e, 0xd2d4, 0xead2,
+ 0x8fea, 0x2190, 0x1162, 0xbe11, 0x8cbe, 0x6d8c, 0xfb6c, 0x6dfb, 0xd36e,
+ 0x3ad3, 0xf3a, 0x870e, 0xc287, 0x53c3, 0xc54, 0x5b0c, 0x7d5a, 0x797d,
+ 0xec79, 0x5dec, 0x4d5e, 0x184e, 0xd618, 0x60d6, 0xb360, 0x98b3, 0xf298,
+ 0xb1f2, 0x69b1, 0xf969, 0xef9, 0xab0e, 0x21ab, 0xe321, 0x24e3, 0x8224,
+ 0x5481, 0x5954, 0x7a59, 0xff7a, 0x7dff, 0x1a7d, 0xa51a, 0x46a5, 0x6b47,
+ 0xe6b, 0x830e, 0xa083, 0xff9f, 0xd0ff, 0xffd0, 0xe6ff, 0x7de7, 0xc67d,
+ 0xd0c6, 0x61d1, 0x3a62, 0xc3b, 0x150c, 0x1715, 0x4517, 0x5345, 0x3954,
+ 0xdd39, 0xdadd, 0x32db, 0x6a33, 0xd169, 0x86d1, 0xb687, 0x3fb6, 0x883f,
+ 0xa487, 0x39a4, 0x2139, 0xbe20, 0xffbe, 0xedfe, 0x8ded, 0x368e, 0xc335,
+ 0x51c3, 0x9851, 0xf297, 0xd6f2, 0xb9d6, 0x95ba, 0x2096, 0xea1f, 0x76e9,
+ 0x4e76, 0xe04d, 0xd0df, 0x80d0, 0xa280, 0xfca2, 0x75fc, 0xef75, 0x32ef,
+ 0x6833, 0xdf68, 0xc4df, 0x76c4, 0xb77, 0xb10a, 0xbfb1, 0x58bf, 0x5258,
+ 0x4d52, 0x6c4d, 0x7e6c, 0xb67e, 0xccb5, 0x8ccc, 0xbe8c, 0xc8bd, 0x9ac8,
+ 0xa99b, 0x52a9, 0x2f53, 0xc30, 0x3e0c, 0xb83d, 0x83b7, 0x5383, 0x7e53,
+ 0x4f7e, 0xe24e, 0xb3e1, 0x8db3, 0x618e, 0xc861, 0xfcc8, 0x34fc, 0x9b35,
+ 0xaa9b, 0xb1aa, 0x5eb1, 0x395e, 0x8639, 0xd486, 0x8bd4, 0x558b, 0x2156,
+ 0xf721, 0x4ef6, 0x14f, 0x7301, 0xdd72, 0x49de, 0x894a, 0x9889, 0x8898,
+ 0x7788, 0x7b77, 0x637b, 0xb963, 0xabb9, 0x7cab, 0xc87b, 0x21c8, 0xcb21,
+ 0xdfca, 0xbfdf, 0xf2bf, 0x6af2, 0x626b, 0xb261, 0x3cb2, 0xc63c, 0xc9c6,
+ 0xc9c9, 0xb4c9, 0xf9b4, 0x91f9, 0x4091, 0x3a40, 0xcc39, 0xd1cb, 0x7ed1,
+ 0x537f, 0x6753, 0xa167, 0xba49, 0x88ba, 0x7789, 0x3877, 0xf037, 0xd3ef,
+ 0xb5d4, 0x55b6, 0xa555, 0xeca4, 0xa1ec, 0xb6a2, 0x7b7, 0x9507, 0xfd94,
+ 0x82fd, 0x5c83, 0x765c, 0x9676, 0x3f97, 0xda3f, 0x6fda, 0x646f, 0x3064,
+ 0x5e30, 0x655e, 0x6465, 0xcb64, 0xcdca, 0x4ccd, 0x3f4c, 0x243f, 0x6f24,
+ 0x656f, 0x6065, 0x3560, 0x3b36, 0xac3b, 0x4aac, 0x714a, 0x7e71, 0xda7e,
+ 0x7fda, 0xda7f, 0x6fda, 0xff6f, 0xc6ff, 0xedc6, 0xd4ed, 0x70d5, 0xeb70,
+ 0xa3eb, 0x80a3, 0xca80, 0x3fcb, 0x2540, 0xf825, 0x7ef8, 0xf87e, 0x73f8,
+ 0xb474, 0xb4b4, 0x92b5, 0x9293, 0x93, 0x3500, 0x7134, 0x9071, 0xfa8f,
+ 0x51fa, 0x1452, 0xba13, 0x7ab9, 0x957a, 0x8a95, 0x6e8a, 0x6d6e, 0x7c6d,
+ 0x447c, 0x9744, 0x4597, 0x8945, 0xef88, 0x8fee, 0x3190, 0x4831, 0x8447,
+ 0xa183, 0x1da1, 0xd41d, 0x2dd4, 0x4f2e, 0xc94e, 0xcbc9, 0xc9cb, 0x9ec9,
+ 0x319e, 0xd531, 0x20d5, 0x4021, 0xb23f, 0x29b2, 0xd828, 0xecd8, 0x5ded,
+ 0xfc5d, 0x4dfc, 0xd24d, 0x6bd2, 0x5f6b, 0xb35e, 0x7fb3, 0xee7e, 0x56ee,
+ 0xa657, 0x68a6, 0x8768, 0x7787, 0xb077, 0x4cb1, 0x764c, 0xb175, 0x7b1,
+ 0x3d07, 0x603d, 0x3560, 0x3e35, 0xb03d, 0xd6b0, 0xc8d6, 0xd8c8, 0x8bd8,
+ 0x3e8c, 0x303f, 0xd530, 0xf1d4, 0x42f1, 0xca42, 0xddca, 0x41dd, 0x3141,
+ 0x132, 0xe901, 0x8e9, 0xbe09, 0xe0bd, 0x2ce0, 0x862d, 0x3986, 0x9139,
+ 0x6d91, 0x6a6d, 0x8d6a, 0x1b8d, 0xac1b, 0xedab, 0x54ed, 0xc054, 0xcebf,
+ 0xc1ce, 0x5c2, 0x3805, 0x6038, 0x5960, 0xd359, 0xdd3, 0xbe0d, 0xafbd,
+ 0x6daf, 0x206d, 0x2c20, 0x862c, 0x8e86, 0xec8d, 0xa2ec, 0xa3a2, 0x51a3,
+ 0x8051, 0xfd7f, 0x91fd, 0xa292, 0xaf14, 0xeeae, 0x59ef, 0x535a, 0x8653,
+ 0x3986, 0x9539, 0xb895, 0xa0b8, 0x26a0, 0x2227, 0xc022, 0x77c0, 0xad77,
+ 0x46ad, 0xaa46, 0x60aa, 0x8560, 0x4785, 0xd747, 0x45d7, 0x2346, 0x5f23,
+ 0x25f, 0x1d02, 0x71d, 0x8206, 0xc82, 0x180c, 0x3018, 0x4b30, 0x4b,
+ 0x3001, 0x1230, 0x2d12, 0x8c2d, 0x148d, 0x4015, 0x5f3f, 0x3d5f, 0x6b3d,
+ 0x396b, 0x473a, 0xf746, 0x44f7, 0x8945, 0x3489, 0xcb34, 0x84ca, 0xd984,
+ 0xf0d9, 0xbcf0, 0x63bd, 0x3264, 0xf332, 0x45f3, 0x7346, 0x5673, 0xb056,
+ 0xd3b0, 0x4ad4, 0x184b, 0x7d18, 0x6c7d, 0xbb6c, 0xfeba, 0xe0fe, 0x10e1,
+ 0x5410, 0x2954, 0x9f28, 0x3a9f, 0x5a3a, 0xdb59, 0xbdc, 0xb40b, 0x1ab4,
+ 0x131b, 0x5d12, 0x6d5c, 0xe16c, 0xb0e0, 0x89b0, 0xba88, 0xbb, 0x3c01,
+ 0xe13b, 0x6fe1, 0x446f, 0xa344, 0x81a3, 0xfe81, 0xc7fd, 0x38c8, 0xb38,
+ 0x1a0b, 0x6d19, 0xf36c, 0x47f3, 0x6d48, 0xb76d, 0xd3b7, 0xd8d2, 0x52d9,
+ 0x4b53, 0xa54a, 0x34a5, 0xc534, 0x9bc4, 0xed9b, 0xbeed, 0x3ebe, 0x233e,
+ 0x9f22, 0x4a9f, 0x774b, 0x4577, 0xa545, 0x64a5, 0xb65, 0x870b, 0x487,
+ 0x9204, 0x5f91, 0xd55f, 0x35d5, 0x1a35, 0x71a, 0x7a07, 0x4e7a, 0xfc4e,
+ 0x1efc, 0x481f, 0x7448, 0xde74, 0xa7dd, 0x1ea7, 0xaa1e, 0xcfaa, 0xfbcf,
+ 0xedfb, 0x6eee, 0x386f, 0x4538, 0x6e45, 0xd96d, 0x11d9, 0x7912, 0x4b79,
+ 0x494b, 0x6049, 0xac5f, 0x65ac, 0x1366, 0x5913, 0xe458, 0x7ae4, 0x387a,
+ 0x3c38, 0xb03c, 0x76b0, 0x9376, 0xe193, 0x42e1, 0x7742, 0x6476, 0x3564,
+ 0x3c35, 0x6a3c, 0xcc69, 0x94cc, 0x5d95, 0xe5e, 0xee0d, 0x4ced, 0xce4c,
+ 0x52ce, 0xaa52, 0xdaaa, 0xe4da, 0x1de5, 0x4530, 0x5445, 0x3954, 0xb639,
+ 0x81b6, 0x7381, 0x1574, 0xc215, 0x10c2, 0x3f10, 0x6b3f, 0xe76b, 0x7be7,
+ 0xbc7b, 0xf7bb, 0x41f7, 0xcc41, 0x38cc, 0x4239, 0xa942, 0x4a9, 0xc504,
+ 0x7cc4, 0x437c, 0x6743, 0xea67, 0x8dea, 0xe88d, 0xd8e8, 0xdcd8, 0x17dd,
+ 0x5718, 0x958, 0xa609, 0x41a5, 0x5842, 0x159, 0x9f01, 0x269f, 0x5a26,
+ 0x405a, 0xc340, 0xb4c3, 0xd4b4, 0xf4d3, 0xf1f4, 0x39f2, 0xe439, 0x67e4,
+ 0x4168, 0xa441, 0xdda3, 0xdedd, 0x9df, 0xab0a, 0xa5ab, 0x9a6, 0xba09,
+ 0x9ab9, 0xad9a, 0x5ae, 0xe205, 0xece2, 0xecec, 0x14ed, 0xd614, 0x6bd5,
+ 0x916c, 0x3391, 0x6f33, 0x206f, 0x8020, 0x780, 0x7207, 0x2472, 0x8a23,
+ 0xb689, 0x3ab6, 0xf739, 0x97f6, 0xb097, 0xa4b0, 0xe6a4, 0x88e6, 0x2789,
+ 0xb28, 0x350b, 0x1f35, 0x431e, 0x1043, 0xc30f, 0x79c3, 0x379, 0x5703,
+ 0x3256, 0x4732, 0x7247, 0x9d72, 0x489d, 0xd348, 0xa4d3, 0x7ca4, 0xbf7b,
+ 0x45c0, 0x7b45, 0x337b, 0x4034, 0x843f, 0xd083, 0x35d0, 0x6335, 0x4d63,
+ 0xe14c, 0xcce0, 0xfecc, 0x35ff, 0x5636, 0xf856, 0xeef8, 0x2def, 0xfc2d,
+ 0x4fc, 0x6e04, 0xb66d, 0x78b6, 0xbb78, 0x3dbb, 0x9a3d, 0x839a, 0x9283,
+ 0x593, 0xd504, 0x23d5, 0x5424, 0xd054, 0x61d0, 0xdb61, 0x17db, 0x1f18,
+ 0x381f, 0x9e37, 0x679e, 0x1d68, 0x381d, 0x8038, 0x917f, 0x491, 0xbb04,
+ 0x23bb, 0x4124, 0xd41, 0xa30c, 0x8ba3, 0x8b8b, 0xc68b, 0xd2c6, 0xebd2,
+ 0x93eb, 0xbd93, 0x99bd, 0x1a99, 0xea19, 0x58ea, 0xcf58, 0x73cf, 0x1073,
+ 0x9e10, 0x139e, 0xea13, 0xcde9, 0x3ecd, 0x883f, 0xf89, 0x180f, 0x2a18,
+ 0x212a, 0xce20, 0x73ce, 0xf373, 0x60f3, 0xad60, 0x4093, 0x8e40, 0xb98e,
+ 0xbfb9, 0xf1bf, 0x8bf1, 0x5e8c, 0xe95e, 0x14e9, 0x4e14, 0x1c4e, 0x7f1c,
+ 0xe77e, 0x6fe7, 0xf26f, 0x13f2, 0x8b13, 0xda8a, 0x5fda, 0xea5f, 0x4eea,
+ 0xa84f, 0x88a8, 0x1f88, 0x2820, 0x9728, 0x5a97, 0x3f5b, 0xb23f, 0x70b2,
+ 0x2c70, 0x232d, 0xf623, 0x4f6, 0x905, 0x7509, 0xd675, 0x28d7, 0x9428,
+ 0x3794, 0xf036, 0x2bf0, 0xba2c, 0xedb9, 0xd7ed, 0x59d8, 0xed59, 0x4ed,
+ 0xe304, 0x18e3, 0x5c19, 0x3d5c, 0x753d, 0x6d75, 0x956d, 0x7f95, 0xc47f,
+ 0x83c4, 0xa84, 0x2e0a, 0x5f2e, 0xb95f, 0x77b9, 0x6d78, 0xf46d, 0x1bf4,
+ 0xed1b, 0xd6ed, 0xe0d6, 0x5e1, 0x3905, 0x5638, 0xa355, 0x99a2, 0xbe99,
+ 0xb4bd, 0x85b4, 0x2e86, 0x542e, 0x6654, 0xd765, 0x73d7, 0x3a74, 0x383a,
+ 0x2638, 0x7826, 0x7677, 0x9a76, 0x7e99, 0x2e7e, 0xea2d, 0xa6ea, 0x8a7,
+ 0x109, 0x3300, 0xad32, 0x5fad, 0x465f, 0x2f46, 0xc62f, 0xd4c5, 0xad5,
+ 0xcb0a, 0x4cb, 0xb004, 0x7baf, 0xe47b, 0x92e4, 0x8e92, 0x638e, 0x1763,
+ 0xc17, 0xf20b, 0x1ff2, 0x8920, 0x5889, 0xcb58, 0xf8cb, 0xcaf8, 0x84cb,
+ 0x9f84, 0x8a9f, 0x918a, 0x4991, 0x8249, 0xff81, 0x46ff, 0x5046, 0x5f50,
+ 0x725f, 0xf772, 0x8ef7, 0xe08f, 0xc1e0, 0x1fc2, 0x9e1f, 0x8b9d, 0x108b,
+ 0x411, 0x2b04, 0xb02a, 0x1fb0, 0x1020, 0x7a0f, 0x587a, 0x8958, 0xb188,
+ 0xb1b1, 0x49b2, 0xb949, 0x7ab9, 0x917a, 0xfc91, 0xe6fc, 0x47e7, 0xbc47,
+ 0x8fbb, 0xea8e, 0x34ea, 0x2635, 0x1726, 0x9616, 0xc196, 0xa6c1, 0xf3a6,
+ 0x11f3, 0x4811, 0x3e48, 0xeb3e, 0xf7ea, 0x1bf8, 0xdb1c, 0x8adb, 0xe18a,
+ 0x42e1, 0x9d42, 0x5d9c, 0x6e5d, 0x286e, 0x4928, 0x9a49, 0xb09c, 0xa6b0,
+ 0x2a7, 0xe702, 0xf5e6, 0x9af5, 0xf9b, 0x810f, 0x8080, 0x180, 0x1702,
+ 0x5117, 0xa650, 0x11a6, 0x1011, 0x550f, 0xd554, 0xbdd5, 0x6bbe, 0xc66b,
+ 0xfc7, 0x5510, 0x5555, 0x7655, 0x177, 0x2b02, 0x6f2a, 0xb70, 0x9f0b,
+ 0xcf9e, 0xf3cf, 0x3ff4, 0xcb40, 0x8ecb, 0x768e, 0x5277, 0x8652, 0x9186,
+ 0x9991, 0x5099, 0xd350, 0x93d3, 0x6d94, 0xe6d, 0x530e, 0x3153, 0xa531,
+ 0x64a5, 0x7964, 0x7c79, 0x467c, 0x1746, 0x3017, 0x3730, 0x538, 0x5,
+ 0x1e00, 0x5b1e, 0x955a, 0xae95, 0x3eaf, 0xff3e, 0xf8ff, 0xb2f9, 0xa1b3,
+ 0xb2a1, 0x5b2, 0xad05, 0x7cac, 0x2d7c, 0xd32c, 0x80d2, 0x7280, 0x8d72,
+ 0x1b8e, 0x831b, 0xac82, 0xfdac, 0xa7fd, 0x15a8, 0xd614, 0xe0d5, 0x7be0,
+ 0xb37b, 0x61b3, 0x9661, 0x9d95, 0xc79d, 0x83c7, 0xd883, 0xead7, 0xceb,
+ 0xf60c, 0xa9f5, 0x19a9, 0xa019, 0x8f9f, 0xd48f, 0x3ad5, 0x853a, 0x985,
+ 0x5309, 0x6f52, 0x1370, 0x6e13, 0xa96d, 0x98a9, 0x5198, 0x9f51, 0xb69f,
+ 0xa1b6, 0x2ea1, 0x672e, 0x2067, 0x6520, 0xaf65, 0x6eaf, 0x7e6f, 0xee7e,
+ 0x17ef, 0xa917, 0xcea8, 0x9ace, 0xff99, 0x5dff, 0xdf5d, 0x38df, 0xa39,
+ 0x1c0b, 0xe01b, 0x46e0, 0xcb46, 0x90cb, 0xba90, 0x4bb, 0x9104, 0x9d90,
+ 0xc89c, 0xf6c8, 0x6cf6, 0x886c, 0x1789, 0xbd17, 0x70bc, 0x7e71, 0x17e,
+ 0x1f01, 0xa01f, 0xbaa0, 0x14bb, 0xfc14, 0x7afb, 0xa07a, 0x3da0, 0xbf3d,
+ 0x48bf, 0x8c48, 0x968b, 0x9d96, 0xfd9d, 0x96fd, 0x9796, 0x6b97, 0xd16b,
+ 0xf4d1, 0x3bf4, 0x253c, 0x9125, 0x6691, 0xc166, 0x34c1, 0x5735, 0x1a57,
+ 0xdc19, 0x77db, 0x8577, 0x4a85, 0x824a, 0x9182, 0x7f91, 0xfd7f, 0xb4c3,
+ 0xb5b4, 0xb3b5, 0x7eb3, 0x617e, 0x4e61, 0xa4f, 0x530a, 0x3f52, 0xa33e,
+ 0x34a3, 0x9234, 0xf091, 0xf4f0, 0x1bf5, 0x311b, 0x9631, 0x6a96, 0x386b,
+ 0x1d39, 0xe91d, 0xe8e9, 0x69e8, 0x426a, 0xee42, 0x89ee, 0x368a, 0x2837,
+ 0x7428, 0x5974, 0x6159, 0x1d62, 0x7b1d, 0xf77a, 0x7bf7, 0x6b7c, 0x696c,
+ 0xf969, 0x4cf9, 0x714c, 0x4e71, 0x6b4e, 0x256c, 0x6e25, 0xe96d, 0x94e9,
+ 0x8f94, 0x3e8f, 0x343e, 0x4634, 0xb646, 0x97b5, 0x8997, 0xe8a, 0x900e,
+ 0x8090, 0xfd80, 0xa0fd, 0x16a1, 0xf416, 0xebf4, 0x95ec, 0x1196, 0x8911,
+ 0x3d89, 0xda3c, 0x9fd9, 0xd79f, 0x4bd7, 0x214c, 0x3021, 0x4f30, 0x994e,
+ 0x5c99, 0x6f5d, 0x326f, 0xab31, 0x6aab, 0xe969, 0x90e9, 0x1190, 0xff10,
+ 0xa2fe, 0xe0a2, 0x66e1, 0x4067, 0x9e3f, 0x2d9e, 0x712d, 0x8170, 0xd180,
+ 0xffd1, 0x25ff, 0x3826, 0x2538, 0x5f24, 0xc45e, 0x1cc4, 0xdf1c, 0x93df,
+ 0xc793, 0x80c7, 0x2380, 0xd223, 0x7ed2, 0xfc7e, 0x22fd, 0x7422, 0x1474,
+ 0xb714, 0x7db6, 0x857d, 0xa85, 0xa60a, 0x88a6, 0x4289, 0x7842, 0xc278,
+ 0xf7c2, 0xcdf7, 0x84cd, 0xae84, 0x8cae, 0xb98c, 0x1aba, 0x4d1a, 0x884c,
+ 0x4688, 0xcc46, 0xd8cb, 0x2bd9, 0xbe2b, 0xa2be, 0x72a2, 0xf772, 0xd2f6,
+ 0x75d2, 0xc075, 0xa3c0, 0x63a3, 0xae63, 0x8fae, 0x2a90, 0x5f2a, 0xef5f,
+ 0x5cef, 0xa05c, 0x89a0, 0x5e89, 0x6b5e, 0x736b, 0x773, 0x9d07, 0xe99c,
+ 0x27ea, 0x2028, 0xc20, 0x980b, 0x4797, 0x2848, 0x9828, 0xc197, 0x48c2,
+ 0x2449, 0x7024, 0x570, 0x3e05, 0xd3e, 0xf60c, 0xbbf5, 0x69bb, 0x3f6a,
+ 0x740, 0xf006, 0xe0ef, 0xbbe0, 0xadbb, 0x56ad, 0xcf56, 0xbfce, 0xa9bf,
+ 0x205b, 0x6920, 0xae69, 0x50ae, 0x2050, 0xf01f, 0x27f0, 0x9427, 0x8993,
+ 0x8689, 0x4087, 0x6e40, 0xb16e, 0xa1b1, 0xe8a1, 0x87e8, 0x6f88, 0xfe6f,
+ 0x4cfe, 0xe94d, 0xd5e9, 0x47d6, 0x3148, 0x5f31, 0xc35f, 0x13c4, 0xa413,
+ 0x5a5, 0x2405, 0xc223, 0x66c2, 0x3667, 0x5e37, 0x5f5e, 0x2f5f, 0x8c2f,
+ 0xe48c, 0xd0e4, 0x4d1, 0xd104, 0xe4d0, 0xcee4, 0xfcf, 0x480f, 0xa447,
+ 0x5ea4, 0xff5e, 0xbefe, 0x8dbe, 0x1d8e, 0x411d, 0x1841, 0x6918, 0x5469,
+ 0x1155, 0xc611, 0xaac6, 0x37ab, 0x2f37, 0xca2e, 0x87ca, 0xbd87, 0xabbd,
+ 0xb3ab, 0xcb4, 0xce0c, 0xfccd, 0xa5fd, 0x72a5, 0xf072, 0x83f0, 0xfe83,
+ 0x97fd, 0xc997, 0xb0c9, 0xadb0, 0xe6ac, 0x88e6, 0x1088, 0xbe10, 0x16be,
+ 0xa916, 0xa3a8, 0x46a3, 0x5447, 0xe953, 0x84e8, 0x2085, 0xa11f, 0xfa1,
+ 0xdd0f, 0xbedc, 0x5abe, 0x805a, 0xc97f, 0x6dc9, 0x826d, 0x4a82, 0x934a,
+ 0x5293, 0xd852, 0xd3d8, 0xadd3, 0xf4ad, 0xf3f4, 0xfcf3, 0xfefc, 0xcafe,
+ 0xb7ca, 0x3cb8, 0xa13c, 0x18a1, 0x1418, 0xea13, 0x91ea, 0xf891, 0x53f8,
+ 0xa254, 0xe9a2, 0x87ea, 0x4188, 0x1c41, 0xdc1b, 0xf5db, 0xcaf5, 0x45ca,
+ 0x6d45, 0x396d, 0xde39, 0x90dd, 0x1e91, 0x1e, 0x7b00, 0x6a7b, 0xa46a,
+ 0xc9a3, 0x9bc9, 0x389b, 0x1139, 0x5211, 0x1f52, 0xeb1f, 0xabeb, 0x48ab,
+ 0x9348, 0xb392, 0x17b3, 0x1618, 0x5b16, 0x175b, 0xdc17, 0xdedb, 0x1cdf,
+ 0xeb1c, 0xd1ea, 0x4ad2, 0xd4b, 0xc20c, 0x24c2, 0x7b25, 0x137b, 0x8b13,
+ 0x618b, 0xa061, 0xff9f, 0xfffe, 0x72ff, 0xf572, 0xe2f5, 0xcfe2, 0xd2cf,
+ 0x75d3, 0x6a76, 0xc469, 0x1ec4, 0xfc1d, 0x59fb, 0x455a, 0x7a45, 0xa479,
+ 0xb7a4
+};
+
static u8 tmp_buf[TEST_BUFLEN];

#define full_csum(buff, len, sum) csum_fold(csum_partial(buff, len, sum))
@@ -338,10 +575,55 @@ static void test_csum_no_carry_inputs(struct kunit *test)
}
}

+static void test_ip_fast_csum(struct kunit *test)
+{
+ __sum16 csum_result, expected;
+
+ for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
+ for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
+ csum_result = ip_fast_csum(random_buf + index, len);
+ expected =
+ expected_fast_csum[(len - IPv4_MIN_WORDS) *
+ NUM_IP_FAST_CSUM_TESTS +
+ index];
+ CHECK_EQ(expected, csum_result);
+ }
+ }
+}
+
+static void test_csum_ipv6_magic(struct kunit *test)
+{
+ const struct in6_addr *saddr;
+ const struct in6_addr *daddr;
+ unsigned int len;
+ unsigned char proto;
+ unsigned int csum;
+
+ const int daddr_offset = sizeof(struct in6_addr);
+ const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
+ const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+ sizeof(int);
+ const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+ sizeof(int) + sizeof(char);
+
+ for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+ saddr = (const struct in6_addr *)(random_buf + i);
+ daddr = (const struct in6_addr *)(random_buf + i +
+ daddr_offset);
+ len = *(unsigned int *)(random_buf + i + len_offset);
+ proto = *(random_buf + i + proto_offset);
+ csum = *(unsigned int *)(random_buf + i + csum_offset);
+ CHECK_EQ(expected_csum_ipv6_magic[i],
+ csum_ipv6_magic(saddr, daddr, len, proto, csum));
+ }
+}
+
static struct kunit_case __refdata checksum_test_cases[] = {
KUNIT_CASE(test_csum_fixed_random_inputs),
KUNIT_CASE(test_csum_all_carry_inputs),
KUNIT_CASE(test_csum_no_carry_inputs),
+ KUNIT_CASE(test_ip_fast_csum),
+ KUNIT_CASE(test_csum_ipv6_magic),
{}
};


--
2.43.0


Subject: Re: [PATCH v15 0/5] riscv: Add fine-tuned checksum functions

Hello:

This series was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <[email protected]>:

On Mon, 08 Jan 2024 15:57:01 -0800 you wrote:
> Each architecture generally implements fine-tuned checksum functions to
> leverage the instruction set. This patch adds the main checksum
> functions that are used in networking. Tested on QEMU, this series
> allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
>
> This patch takes heavy use of the Zbb extension using alternatives
> patching.
>
> [...]

Here is the summary with links:
- [v15,1/5] asm-generic: Improve csum_fold
https://git.kernel.org/riscv/c/1e7196fa5b03
- [v15,2/5] riscv: Add static key for misaligned accesses
https://git.kernel.org/riscv/c/2ce5729fce8f
- [v15,3/5] riscv: Add checksum header
https://git.kernel.org/riscv/c/e11e367e9fe5
- [v15,4/5] riscv: Add checksum library
https://git.kernel.org/riscv/c/a04c192eabfb
- [v15,5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
https://git.kernel.org/riscv/c/6f4c45cbcb00

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



2024-01-22 17:34:20

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

Hi,

On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> Supplement existing checksum tests with tests for csum_ipv6_magic and
> ip_fast_csum.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> ---

With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.

[ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
[ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
[ 1.839948] Hardware name: Generic DT based system
[ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
[ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
[ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
[ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
[ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
[ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
[ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
[ 1.840942] xPSR: 0100020b

This translates to:

PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60 ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)

Obviously I can not say if this is a problem with qemu or a problem with
the Linux kernel. Given that, and the presumably low interest in
running mps2-an385 with Linux, I'll simply disable that test. Just take
it as a heads up that there _may_ be a problem with this on arm
nommu systems.

Thanks,
Guenter

2024-01-22 17:39:08

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

From: Guenter Roeck
> Sent: 22 January 2024 16:40
>
> Hi,
>
> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> > Supplement existing checksum tests with tests for csum_ipv6_magic and
> > ip_fast_csum.
> >
> > Signed-off-by: Charlie Jenkins <[email protected]>
> > ---
>
> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>
> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
> [ 1.839948] Hardware name: Generic DT based system
> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
> [ 1.840942] xPSR: 0100020b
>
> This translates to:
>
> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>
> Obviously I can not say if this is a problem with qemu or a problem with
> the Linux kernel. Given that, and the presumably low interest in
> running mps2-an385 with Linux, I'll simply disable that test. Just take
> it as a heads up that there _may_ be a problem with this on arm
> nommu systems.

Can you drop in a disassembly of __csum_ipv6_magic ?
Actually I think it is:
ENTRY(__csum_ipv6_magic)
str lr, [sp, #-4]!
adds ip, r2, r3
ldmia r1, {r1 - r3, lr}

So the fault is (probably) a misaligned ldmia ?
Are they ever supported?

David

adcs ip, ip, r1
adcs ip, ip, r2
adcs ip, ip, r3
adcs ip, ip, lr
ldmia r0, {r0 - r3}
adcs r0, ip, r0
adcs r0, r0, r1
adcs r0, r0, r2
ldr r2, [sp, #4]
adcs r0, r0, r3
adcs r0, r0, r2
adcs r0, r0, #0
ldmfd sp!, {pc}
ENDPROC(__csum_ipv6_magic)

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2024-01-22 17:49:34

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

On 1/22/24 08:52, David Laight wrote:
> From: Guenter Roeck
>> Sent: 22 January 2024 16:40
>>
>> Hi,
>>
>> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>>> ip_fast_csum.
>>>
>>> Signed-off-by: Charlie Jenkins <[email protected]>
>>> ---
>>
>> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>>
>> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
>> [ 1.839948] Hardware name: Generic DT based system
>> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
>> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
>> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
>> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
>> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
>> [ 1.840942] xPSR: 0100020b
>>
>> This translates to:
>>
>> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>>
>> Obviously I can not say if this is a problem with qemu or a problem with
>> the Linux kernel. Given that, and the presumably low interest in
>> running mps2-an385 with Linux, I'll simply disable that test. Just take
>> it as a heads up that there _may_ be a problem with this on arm
>> nommu systems.
>
> Can you drop in a disassembly of __csum_ipv6_magic ?
> Actually I think it is:

It is, as per the PC pointer above. I don't know anything about arm assembler,
much less about its behavior with THUMB code.

> ENTRY(__csum_ipv6_magic)
> str lr, [sp, #-4]!
> adds ip, r2, r3
> ldmia r1, {r1 - r3, lr}
>
> So the fault is (probably) a misaligned ldmia ?
> Are they ever supported?
>

Good question. My primary guess is that this never worked. As I said,
this was just intended to be informational, (probably) no reason to bother.

Of course one might ask if it makes sense to even keep the arm nommu code
in the kernel, but that is of course a different question. I do wonder though
if anyone but me is running it.

Thanks,
Guenter

> David
>
> adcs ip, ip, r1
> adcs ip, ip, r2
> adcs ip, ip, r3
> adcs ip, ip, lr
> ldmia r0, {r0 - r3}
> adcs r0, ip, r0
> adcs r0, r0, r1
> adcs r0, r0, r2
> ldr r2, [sp, #4]
> adcs r0, r0, r3
> adcs r0, r0, r2
> adcs r0, r0, #0
> ldmfd sp!, {pc}
> ENDPROC(__csum_ipv6_magic)
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>


2024-01-22 21:43:03

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

From: Guenter Roeck
> Sent: 22 January 2024 17:16
>
> On 1/22/24 08:52, David Laight wrote:
> > From: Guenter Roeck
> >> Sent: 22 January 2024 16:40
> >>
> >> Hi,
> >>
> >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
> >>> ip_fast_csum.
> >>>
> >>> Signed-off-by: Charlie Jenkins <[email protected]>
> >>> ---
> >>
> >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
> >>
> >> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> >> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
> >> [ 1.839948] Hardware name: Generic DT based system
> >> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> >> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> >> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
> >> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
> >> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
> >> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
> >> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
> >> [ 1.840942] xPSR: 0100020b
> >>
> >> This translates to:
> >>
> >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
> >>
> >> Obviously I can not say if this is a problem with qemu or a problem with
> >> the Linux kernel. Given that, and the presumably low interest in
> >> running mps2-an385 with Linux, I'll simply disable that test. Just take
> >> it as a heads up that there _may_ be a problem with this on arm
> >> nommu systems.
> >
> > Can you drop in a disassembly of __csum_ipv6_magic ?
> > Actually I think it is:
>
> It is, as per the PC pointer above. I don't know anything about arm assembler,
> much less about its behavior with THUMB code.

Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
the code I found looks like arm to me.
(I haven't written any arm asm since before they invented thumb!)

> > ENTRY(__csum_ipv6_magic)
> > str lr, [sp, #-4]!
> > adds ip, r2, r3
> > ldmia r1, {r1 - r3, lr}
> >
> > So the fault is (probably) a misaligned ldmia ?
> > Are they ever supported?
> >
>
> Good question. My primary guess is that this never worked. As I said,
> this was just intended to be informational, (probably) no reason to bother.
>
> Of course one might ask if it makes sense to even keep the arm nommu code
> in the kernel, but that is of course a different question. I do wonder though
> if anyone but me is running it.

If it is an alignment fault it isn't a 'nommu' bug.

And traditionally arm didn't support misaligned transfers (well not
in anyway any other cpu did!).
It might be that the kernel assumes that all ethernet packets are
aligned, but the test suite isn't aligning the buffer.
Which would make it a test suite bug.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2024-01-22 23:46:49

by Palmer Dabbelt

[permalink] [raw]
Subject: RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

On Mon, 22 Jan 2024 13:41:48 PST (-0800), [email protected] wrote:
> From: Guenter Roeck
>> Sent: 22 January 2024 17:16
>>
>> On 1/22/24 08:52, David Laight wrote:
>> > From: Guenter Roeck
>> >> Sent: 22 January 2024 16:40
>> >>
>> >> Hi,
>> >>
>> >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>> >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>> >>> ip_fast_csum.
>> >>>
>> >>> Signed-off-by: Charlie Jenkins <[email protected]>
>> >>> ---
>> >>
>> >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>> >>
>> >> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>> >> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
>> >> [ 1.839948] Hardware name: Generic DT based system
>> >> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>> >> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>> >> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
>> >> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
>> >> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
>> >> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
>> >> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
>> >> [ 1.840942] xPSR: 0100020b
>> >>
>> >> This translates to:
>> >>
>> >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>> >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>> >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>> >>
>> >> Obviously I can not say if this is a problem with qemu or a problem with
>> >> the Linux kernel. Given that, and the presumably low interest in
>> >> running mps2-an385 with Linux, I'll simply disable that test. Just take
>> >> it as a heads up that there _may_ be a problem with this on arm
>> >> nommu systems.
>> >
>> > Can you drop in a disassembly of __csum_ipv6_magic ?
>> > Actually I think it is:
>>
>> It is, as per the PC pointer above. I don't know anything about arm assembler,
>> much less about its behavior with THUMB code.
>
> Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
> the code I found looks like arm to me.
> (I haven't written any arm asm since before they invented thumb!)
>
>> > ENTRY(__csum_ipv6_magic)
>> > str lr, [sp, #-4]!
>> > adds ip, r2, r3
>> > ldmia r1, {r1 - r3, lr}
>> >
>> > So the fault is (probably) a misaligned ldmia ?
>> > Are they ever supported?
>> >
>>
>> Good question. My primary guess is that this never worked. As I said,
>> this was just intended to be informational, (probably) no reason to bother.
>>
>> Of course one might ask if it makes sense to even keep the arm nommu code
>> in the kernel, but that is of course a different question. I do wonder though
>> if anyone but me is running it.
>
> If it is an alignment fault it isn't a 'nommu' bug.
>
> And traditionally arm didn't support misaligned transfers (well not
> in anyway any other cpu did!).
> It might be that the kernel assumes that all ethernet packets are
> aligned, but the test suite isn't aligning the buffer.
> Which would make it a test suite bug.

From talking to Evan and Vineet, I think you're right and this is a test
suite bug: specifically the tests weren't respecting NET_IP_ALIGN. That
didn't crop up for ip_fast_csum() as it just uses ldr which supports
misaligned accesses on the M3 (at least as far as I can tell).

So I think the right fix is something like

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..2dd282e27dd4 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -5,6 +5,7 @@

#include <kunit/test.h>
#include <asm/checksum.h>
+#include <asm/checksum.h>
#include <net/ip6_checksum.h>

#define MAX_LEN 512
@@ -15,6 +16,7 @@
#define IPv4_MAX_WORDS 15
#define NUM_IPv6_TESTS 200
#define NUM_IP_FAST_CSUM_TESTS 181
+#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)

/* Values for a little endian CPU. Byte swap each half on big endian CPU. */
static const u32 random_init_sum = 0x2847aab;
@@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
__sum16 result, expec;

assert_setup_correct(test);
- for (align = 0; align < TEST_BUFLEN; ++align) {
+ for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
memcpy(&tmp_buf[align], random_buf,
min(MAX_LEN, TEST_BUFLEN - align));
for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
@@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)

assert_setup_correct(test);
memset(tmp_buf, 0xff, TEST_BUFLEN);
- for (align = 0; align < TEST_BUFLEN; ++align) {
+ for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
++len) {
/*
@@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)

assert_setup_correct(test);
memset(tmp_buf, 0x4, TEST_BUFLEN);
- for (align = 0; align < TEST_BUFLEN; ++align) {
+ for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
++len) {
/*

but I haven't even build tested it...

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2024-01-23 01:10:22

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

On Mon, Jan 22, 2024 at 03:39:16PM -0800, Palmer Dabbelt wrote:
> On Mon, 22 Jan 2024 13:41:48 PST (-0800), [email protected] wrote:
> > From: Guenter Roeck
> > > Sent: 22 January 2024 17:16
> > >
> > > On 1/22/24 08:52, David Laight wrote:
> > > > From: Guenter Roeck
> > > >> Sent: 22 January 2024 16:40
> > > >>
> > > >> Hi,
> > > >>
> > > >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> > > >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
> > > >>> ip_fast_csum.
> > > >>>
> > > >>> Signed-off-by: Charlie Jenkins <[email protected]>
> > > >>> ---
> > > >>
> > > >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
> > > >>
> > > >> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> > > >> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
> > > >> [ 1.839948] Hardware name: Generic DT based system
> > > >> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> > > >> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> > > >> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
> > > >> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
> > > >> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
> > > >> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
> > > >> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
> > > >> [ 1.840942] xPSR: 0100020b
> > > >>
> > > >> This translates to:
> > > >>
> > > >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> > > >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> > > >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
> > > >>
> > > >> Obviously I can not say if this is a problem with qemu or a problem with
> > > >> the Linux kernel. Given that, and the presumably low interest in
> > > >> running mps2-an385 with Linux, I'll simply disable that test. Just take
> > > >> it as a heads up that there _may_ be a problem with this on arm
> > > >> nommu systems.
> > > >
> > > > Can you drop in a disassembly of __csum_ipv6_magic ?
> > > > Actually I think it is:
> > >
> > > It is, as per the PC pointer above. I don't know anything about arm assembler,
> > > much less about its behavior with THUMB code.
> >
> > Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
> > the code I found looks like arm to me.
> > (I haven't written any arm asm since before they invented thumb!)
> >
> > > > ENTRY(__csum_ipv6_magic)
> > > > str lr, [sp, #-4]!
> > > > adds ip, r2, r3
> > > > ldmia r1, {r1 - r3, lr}
> > > >
> > > > So the fault is (probably) a misaligned ldmia ?
> > > > Are they ever supported?
> > > >
> > >
> > > Good question. My primary guess is that this never worked. As I said,
> > > this was just intended to be informational, (probably) no reason to bother.
> > >
> > > Of course one might ask if it makes sense to even keep the arm nommu code
> > > in the kernel, but that is of course a different question. I do wonder though
> > > if anyone but me is running it.
> >
> > If it is an alignment fault it isn't a 'nommu' bug.
> >
> > And traditionally arm didn't support misaligned transfers (well not
> > in anyway any other cpu did!).
> > It might be that the kernel assumes that all ethernet packets are
> > aligned, but the test suite isn't aligning the buffer.
> > Which would make it a test suite bug.
>
> From talking to Evan and Vineet, I think you're right and this is a test
> suite bug: specifically the tests weren't respecting NET_IP_ALIGN. That
> didn't crop up for ip_fast_csum() as it just uses ldr which supports
> misaligned accesses on the M3 (at least as far as I can tell).
>
> So I think the right fix is something like
>
> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
> index 225bb7701460..2dd282e27dd4 100644
> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c
> @@ -5,6 +5,7 @@
> #include <kunit/test.h>
> #include <asm/checksum.h>
> +#include <asm/checksum.h>
> #include <net/ip6_checksum.h>
> #define MAX_LEN 512
> @@ -15,6 +16,7 @@
> #define IPv4_MAX_WORDS 15
> #define NUM_IPv6_TESTS 200
> #define NUM_IP_FAST_CSUM_TESTS 181
> +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
> /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
> static const u32 random_init_sum = 0x2847aab;
> @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
> __sum16 result, expec;
> assert_setup_correct(test);
> - for (align = 0; align < TEST_BUFLEN; ++align) {
> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
> memcpy(&tmp_buf[align], random_buf,
> min(MAX_LEN, TEST_BUFLEN - align));
> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
> @@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)
> assert_setup_correct(test);
> memset(tmp_buf, 0xff, TEST_BUFLEN);
> - for (align = 0; align < TEST_BUFLEN; ++align) {
> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
> ++len) {
> /*
> @@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)
> assert_setup_correct(test);
> memset(tmp_buf, 0x4, TEST_BUFLEN);
> - for (align = 0; align < TEST_BUFLEN; ++align) {
> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
> ++len) {
> /*
>
> but I haven't even build tested it...

This doesn't fix the test_csum_ipv6_magic test case that was causing the
initial problem, but the same trick can be done in that test.

- Charlie

>
> >
> > David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)

2024-01-23 02:04:14

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

On 1/22/24 15:55, Charlie Jenkins wrote:
> On Mon, Jan 22, 2024 at 03:39:16PM -0800, Palmer Dabbelt wrote:
>> On Mon, 22 Jan 2024 13:41:48 PST (-0800), [email protected] wrote:
>>> From: Guenter Roeck
>>>> Sent: 22 January 2024 17:16
>>>>
>>>> On 1/22/24 08:52, David Laight wrote:
>>>>> From: Guenter Roeck
>>>>>> Sent: 22 January 2024 16:40
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>>>>>>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>>>>>>> ip_fast_csum.
>>>>>>>
>>>>>>> Signed-off-by: Charlie Jenkins <[email protected]>
>>>>>>> ---
>>>>>>
>>>>>> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>>>>>>
>>>>>> [ 1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>>>>>> [ 1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G N 6.8.0-rc1 #1
>>>>>> [ 1.839948] Hardware name: Generic DT based system
>>>>>> [ 1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>>>>>> [ 1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>>>>>> [ 1.840493] pc : [<21212f34>] lr : [<21117fd5>] psr: 0100020b
>>>>>> [ 1.840586] sp : 2180bebc ip : 46c7f0d2 fp : 21275b38
>>>>>> [ 1.840664] r10: 21276b60 r9 : 21275b28 r8 : 21465cfc
>>>>>> [ 1.840751] r7 : 00003085 r6 : 21275b4e r5 : 2138702c r4 : 00000001
>>>>>> [ 1.840847] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 21275b39 r0 : 21275b29
>>>>>> [ 1.840942] xPSR: 0100020b
>>>>>>
>>>>>> This translates to:
>>>>>>
>>>>>> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>>>>>> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>>>>>> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>>>>>>
>>>>>> Obviously I can not say if this is a problem with qemu or a problem with
>>>>>> the Linux kernel. Given that, and the presumably low interest in
>>>>>> running mps2-an385 with Linux, I'll simply disable that test. Just take
>>>>>> it as a heads up that there _may_ be a problem with this on arm
>>>>>> nommu systems.
>>>>>
>>>>> Can you drop in a disassembly of __csum_ipv6_magic ?
>>>>> Actually I think it is:
>>>>
>>>> It is, as per the PC pointer above. I don't know anything about arm assembler,
>>>> much less about its behavior with THUMB code.
>>>
>>> Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
>>> the code I found looks like arm to me.
>>> (I haven't written any arm asm since before they invented thumb!)
>>>
>>>>> ENTRY(__csum_ipv6_magic)
>>>>> str lr, [sp, #-4]!
>>>>> adds ip, r2, r3
>>>>> ldmia r1, {r1 - r3, lr}
>>>>>
>>>>> So the fault is (probably) a misaligned ldmia ?
>>>>> Are they ever supported?
>>>>>
>>>>
>>>> Good question. My primary guess is that this never worked. As I said,
>>>> this was just intended to be informational, (probably) no reason to bother.
>>>>
>>>> Of course one might ask if it makes sense to even keep the arm nommu code
>>>> in the kernel, but that is of course a different question. I do wonder though
>>>> if anyone but me is running it.
>>>
>>> If it is an alignment fault it isn't a 'nommu' bug.
>>>
>>> And traditionally arm didn't support misaligned transfers (well not
>>> in anyway any other cpu did!).
>>> It might be that the kernel assumes that all ethernet packets are
>>> aligned, but the test suite isn't aligning the buffer.
>>> Which would make it a test suite bug.
>>
>> From talking to Evan and Vineet, I think you're right and this is a test
>> suite bug: specifically the tests weren't respecting NET_IP_ALIGN. That
>> didn't crop up for ip_fast_csum() as it just uses ldr which supports
>> misaligned accesses on the M3 (at least as far as I can tell).
>>
>> So I think the right fix is something like
>>
>> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
>> index 225bb7701460..2dd282e27dd4 100644
>> --- a/lib/checksum_kunit.c
>> +++ b/lib/checksum_kunit.c
>> @@ -5,6 +5,7 @@
>> #include <kunit/test.h>
>> #include <asm/checksum.h>
>> +#include <asm/checksum.h>
>> #include <net/ip6_checksum.h>
>> #define MAX_LEN 512
>> @@ -15,6 +16,7 @@
>> #define IPv4_MAX_WORDS 15
>> #define NUM_IPv6_TESTS 200
>> #define NUM_IP_FAST_CSUM_TESTS 181
>> +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
>> /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
>> static const u32 random_init_sum = 0x2847aab;
>> @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
>> __sum16 result, expec;
>> assert_setup_correct(test);
>> - for (align = 0; align < TEST_BUFLEN; ++align) {
>> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>> memcpy(&tmp_buf[align], random_buf,
>> min(MAX_LEN, TEST_BUFLEN - align));
>> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>> @@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)
>> assert_setup_correct(test);
>> memset(tmp_buf, 0xff, TEST_BUFLEN);
>> - for (align = 0; align < TEST_BUFLEN; ++align) {
>> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>> ++len) {
>> /*
>> @@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)
>> assert_setup_correct(test);
>> memset(tmp_buf, 0x4, TEST_BUFLEN);
>> - for (align = 0; align < TEST_BUFLEN; ++align) {
>> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>> for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>> ++len) {
>> /*
>>
>> but I haven't even build tested it...
>
> This doesn't fix the test_csum_ipv6_magic test case that was causing the
> initial problem, but the same trick can be done in that test.
>

The above didn't (and still doesn't) fail for me. The following fixes the problem.
So, yes, I guess the problem has to do with alignment. I don't know if NET_IP_ALIGN
would do the trick, though - it works, but it seems to me that the definition of
NET_IP_ALIGN is supposed to address potential performance issues, not mandatory
IP header alignment.

Thanks,
Guenter

---
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..c8730af2a474 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -591,6 +591,8 @@ static void test_ip_fast_csum(struct kunit *test)
}
}

+#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
+
static void test_csum_ipv6_magic(struct kunit *test)
{
#if defined(CONFIG_NET)
@@ -607,7 +609,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
sizeof(int) + sizeof(char);

- for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+ for (int i = 0; i < NUM_IPv6_TESTS; i+=SUPPORTED_ALIGNMENT) {
saddr = (const struct in6_addr *)(random_buf + i);
daddr = (const struct in6_addr *)(random_buf + i +
daddr_offset);



2024-01-23 10:16:35

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum

From: Guenter Roeck
> Sent: 23 January 2024 01:06
...
> >> +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
> >> /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
> >> static const u32 random_init_sum = 0x2847aab;
> >> @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
> >> __sum16 result, expec;
> >> assert_setup_correct(test);
> >> - for (align = 0; align < TEST_BUFLEN; ++align) {
> >> + for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
...

That is all wrong.
NET_IP_ALIGN is the offset for the base of ethernet frames.
If zero the IP header will (usually) be misaligned.
If two the mac addresses are misaligned in order to align
the IP header (6+6+2 bytes in).
I don't think any other values are actually valid, but
there is always that possibility.

So the definition should really be:
#define SUPPORTED_ALIGNMENT (NET_IP_ALIGN ? 4 : 1)

(Which might happen to be the same values :-)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)