2021-07-19 12:53:00

by Akira Tsukamoto

[permalink] [raw]
Subject: [PATCH v4 0/1] riscv: improving uaccess with logs from network bench

Hi Guenter, Geert and Qiu,

I fixed the bug which was overrunning the copy when the size was in the
between 8*SZREG to 9*SZREG. The SZREG holds the bytes per register size
which is 4 for RV32 and 8 for RV64.

Do you mind trying this patch? It works OK at my place.

Since I had to respin the patch I added word copy without unrolling when
the size is in the between 2*SZREG to 9*SZREG to reduce the number of byte
copies which has heavy overhead as Palmer has mentioned when he included
this patch to riscv/for-next.


I rewrote the functions but heavily influenced by Garry's memcpy
function [1]. It must be written in assembler to handle page faults
manually inside the function unlike other memcpy functions.

This patch will reduce cpu usage dramatically in kernel space especially
for applications which use sys-call with large buffer size, such as network
applications. The main reason behind this is that every unaligned memory
access will raise exceptions and switch between s-mode and m-mode causing
large overhead.

---
v3 -> v4:
- Fixed overrun copy
- Added word copy without unrolling to reduce byte copy for left over

v2 -> v3:
- Merged all patches

v1 -> v2:
- Added shift copy
- Separated patches for readability of changes in assembler
- Using perf results

[1] https://lkml.org/lkml/2021/2/16/778

Akira Tsukamoto (1):
riscv: __asm_copy_to-from_user: Optimize unaligned memory access and
pipeline stall

arch/riscv/lib/uaccess.S | 218 ++++++++++++++++++++++++++++++++-------
1 file changed, 183 insertions(+), 35 deletions(-)

--
2.17.1


2021-07-19 12:54:06

by Akira Tsukamoto

[permalink] [raw]
Subject: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall


This patch will reduce cpu usage dramatically in kernel space especially
for application which use sys-call with large buffer size, such as
network applications. The main reason behind this is that every
unaligned memory access will raise exceptions and switch between s-mode
and m-mode causing large overhead.

First copy in bytes until reaches the first word aligned boundary in
destination memory address. This is the preparation before the bulk
aligned word copy.

The destination address is aligned now, but oftentimes the source
address is not in an aligned boundary. To reduce the unaligned memory
access, it reads the data from source in aligned boundaries, which will
cause the data to have an offset, and then combines the data in the next
iteration by fixing offset with shifting before writing to destination.
The majority of the improving copy speed comes from this shift copy.

In the lucky situation that the both source and destination address are
on the aligned boundary, perform load and store with register size to
copy the data. Without the unrolling, it will reduce the speed since the
next store instruction for the same register using from the load will
stall the pipeline. If the size of copy is too small for unrolled copy
perform regular word copy.

At last, copying the remainder in one byte at a time.


The motivation to create the patch was to improve network performance on
beaglev beta board. By observing with perf, the memcpy and
__asm_copy_to_user had heavy cpu usage and the network speed was limited
at around 680Mbps on 1Gbps lan.

Typical network applications use system calls with a large buffer on
send/recv() and sendto/recvfrom() for the optimization.

The bench result, when patching only copy_user. The memcpy is without
Matteo's patches but listing the both since they are the top two largest
overhead.

All results are from the same base kernel, same rootfs and same BeagleV
beta board.

Results of iperf3 have speedup on UDP with the copy_user patch alone.

--- UDP send ---
306 Mbits/sec 362 Mbits/sec
305 Mbits/sec 362 Mbits/sec

--- UDP recv ---
772 Mbits/sec 787 Mbits/sec
773 Mbits/sec 784 Mbits/sec

Comparison by "perf top -Ue task-clock" while running iperf3.

--- TCP recv ---
* Before
40.40% [kernel] [k] memcpy
33.09% [kernel] [k] __asm_copy_to_user
* With patch
50.35% [kernel] [k] memcpy
13.76% [kernel] [k] __asm_copy_to_user

--- TCP send ---
* Before
19.96% [kernel] [k] memcpy
9.84% [kernel] [k] __asm_copy_to_user
* With patch
14.27% [kernel] [k] memcpy
7.37% [kernel] [k] __asm_copy_to_user

--- UDP recv ---
* Before
44.45% [kernel] [k] memcpy
31.04% [kernel] [k] __asm_copy_to_user
* With patch
55.62% [kernel] [k] memcpy
11.22% [kernel] [k] __asm_copy_to_user

--- UDP send ---
* Before
25.18% [kernel] [k] memcpy
22.50% [kernel] [k] __asm_copy_to_user
* With patch
28.90% [kernel] [k] memcpy
9.49% [kernel] [k] __asm_copy_to_user


Signed-off-by: Akira Tsukamoto <[email protected]>
---
arch/riscv/lib/uaccess.S | 218 ++++++++++++++++++++++++++++++++-------
1 file changed, 183 insertions(+), 35 deletions(-)

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index fceaeb18cc64..f1518fd3be99 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -19,50 +19,198 @@ ENTRY(__asm_copy_from_user)
li t6, SR_SUM
csrs CSR_STATUS, t6

- add a3, a1, a2
- /* Use word-oriented copy only if low-order bits match */
- andi t0, a0, SZREG-1
- andi t1, a1, SZREG-1
- bne t0, t1, 2f
+ /* Save for return value */
+ mv t5, a2

- addi t0, a1, SZREG-1
- andi t1, a3, ~(SZREG-1)
- andi t0, t0, ~(SZREG-1)
/*
- * a3: terminal address of source region
- * t0: lowest XLEN-aligned address in source
- * t1: highest XLEN-aligned address in source
+ * Register allocation for code below:
+ * a0 - start of uncopied dst
+ * a1 - start of uncopied src
+ * a2 - size
+ * t0 - end of uncopied dst
*/
- bgeu t0, t1, 2f
- bltu a1, t0, 4f
+ add t0, a0, a2
+
+ /*
+ * Use byte copy only if too small.
+ * SZREG holds 4 for RV32 and 8 for RV64
+ * a3 - 2*SZREG is minimum size for word_copy
+ * SZREG for aligning dst + SZREG for word_copy
+ */
+ li a3, 2*SZREG
+ bltu a2, a3, .Lbyte_copy_tail
+
+ /*
+ * Copy first bytes until dst is aligned to word boundary.
+ * a0 - start of dst
+ * t1 - start of aligned dst
+ */
+ addi t1, a0, SZREG-1
+ andi t1, t1, ~(SZREG-1)
+ /* dst is already aligned, skip */
+ beq a0, t1, .Lskip_align_dst
1:
- fixup REG_L, t2, (a1), 10f
- fixup REG_S, t2, (a0), 10f
- addi a1, a1, SZREG
- addi a0, a0, SZREG
- bltu a1, t1, 1b
+ /* a5 - one byte for copying data */
+ fixup lb a5, 0(a1), 10f
+ addi a1, a1, 1 /* src */
+ fixup sb a5, 0(a0), 10f
+ addi a0, a0, 1 /* dst */
+ bltu a0, t1, 1b /* t1 - start of aligned dst */
+
+.Lskip_align_dst:
+ /*
+ * Now dst is aligned.
+ * Use shift-copy if src is misaligned.
+ * Use word-copy if both src and dst are aligned because
+ * can not use shift-copy which do not require shifting
+ */
+ /* a1 - start of src */
+ andi a3, a1, SZREG-1
+ bnez a3, .Lshift_copy
+
+.Lcheck_size_bulk:
+ /*
+ * Evaluate the size to choose word_copy or word_copy_unlrolled
+ * The word_copy_unlrolled requires larger than 8*SZREG
+ */
+ li a3, 8*SZREG
+ add a4, a0, a3
+ bltu a4, t0, .Lword_copy_unlrolled
+
+.Lword_copy:
+ /*
+ * Both src and dst are aligned
+ * word copy with every SZREG iteration
+ *
+ * a0 - start of aligned dst
+ * a1 - start of aligned src
+ * t0 - end of aligned dst
+ */
+ bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
+ addi t0, t0, -(SZREG) /* not to over run */
+1:
+ REG_L a5, 0(a1)
+ addi a1, a1, SZREG
+ REG_S a5, 0(a0)
+ addi a0, a0, SZREG
+ bltu a0, t0, 1b
+
+ addi t0, t0, SZREG /* revert to original value */
+ j .Lbyte_copy_tail
+
+.Lword_copy_unlrolled:
+ /*
+ * Both src and dst are aligned
+ * unrolled word copy with every 8*SZREG iteration
+ *
+ * a0 - start of aligned dst
+ * a1 - start of aligned src
+ * t0 - end of aligned dst
+ */
+ addi t0, t0, -(8*SZREG) /* not to over run */
2:
- bltu a1, a3, 5f
+ fixup REG_L a4, 0(a1), 10f
+ fixup REG_L a5, SZREG(a1), 10f
+ fixup REG_L a6, 2*SZREG(a1), 10f
+ fixup REG_L a7, 3*SZREG(a1), 10f
+ fixup REG_L t1, 4*SZREG(a1), 10f
+ fixup REG_L t2, 5*SZREG(a1), 10f
+ fixup REG_L t3, 6*SZREG(a1), 10f
+ fixup REG_L t4, 7*SZREG(a1), 10f
+ fixup REG_S a4, 0(a0), 10f
+ fixup REG_S a5, SZREG(a0), 10f
+ fixup REG_S a6, 2*SZREG(a0), 10f
+ fixup REG_S a7, 3*SZREG(a0), 10f
+ fixup REG_S t1, 4*SZREG(a0), 10f
+ fixup REG_S t2, 5*SZREG(a0), 10f
+ fixup REG_S t3, 6*SZREG(a0), 10f
+ fixup REG_S t4, 7*SZREG(a0), 10f
+ addi a0, a0, 8*SZREG
+ addi a1, a1, 8*SZREG
+ bltu a0, t0, 2b
+
+ addi t0, t0, 8*SZREG /* revert to original value */
+
+ /*
+ * Remaining might large enough for word_copy to reduce slow byte
+ * copy
+ */
+ j .Lcheck_size_bulk
+
+.Lshift_copy:
+
+ /*
+ * Word copy with shifting.
+ * For misaligned copy we still perform aligned word copy, but
+ * we need to use the value fetched from the previous iteration and
+ * do some shifts.
+ * This is safe because reading is less than a word size.
+ *
+ * a0 - start of aligned dst
+ * a1 - start of src
+ * a3 - a1 & mask:(SZREG-1)
+ * t0 - end of uncopied dst
+ * t1 - end of aligned dst
+ */
+ /* calculating aligned word boundary for dst */
+ andi t1, t0, ~(SZREG-1)
+ /* Converting unaligned src to aligned arc */
+ andi a1, a1, ~(SZREG-1)
+
+ /*
+ * Calculate shifts
+ * t3 - prev shift
+ * t4 - current shift
+ */
+ slli t3, a3, LGREG
+ li a5, SZREG*8
+ sub t4, a5, t3
+
+ /* Load the first word to combine with second word */
+ fixup REG_L a5, 0(a1), 10f

3:
+ /* Main shifting copy
+ *
+ * a0 - start of aligned dst
+ * a1 - start of aligned src
+ * t1 - end of aligned dst
+ */
+
+ /* At least one iteration will be executed */
+ srl a4, a5, t3
+ fixup REG_L a5, SZREG(a1), 10f
+ addi a1, a1, SZREG
+ sll a2, a5, t4
+ or a2, a2, a4
+ fixup REG_S a2, 0(a0), 10f
+ addi a0, a0, SZREG
+ bltu a0, t1, 3b
+
+ /* Revert src to original unaligned value */
+ add a1, a1, a3
+
+.Lbyte_copy_tail:
+ /*
+ * Byte copy anything left.
+ *
+ * a0 - start of remaining dst
+ * a1 - start of remaining src
+ * t0 - end of remaining dst
+ */
+ bgeu a0, t0, .Lend_copy_user /* check if end of copy */
+4:
+ fixup lb a5, 0(a1), 10f
+ addi a1, a1, 1 /* src */
+ fixup sb a5, 0(a0), 10f
+ addi a0, a0, 1 /* dst */
+ bltu a0, t0, 4b /* t0 - end of dst */
+
+.Lend_copy_user:
/* Disable access to user memory */
csrc CSR_STATUS, t6
- li a0, 0
+ li a0, 0
ret
-4: /* Edge case: unalignment */
- fixup lbu, t2, (a1), 10f
- fixup sb, t2, (a0), 10f
- addi a1, a1, 1
- addi a0, a0, 1
- bltu a1, t0, 4b
- j 1b
-5: /* Edge case: remainder */
- fixup lbu, t2, (a1), 10f
- fixup sb, t2, (a0), 10f
- addi a1, a1, 1
- addi a0, a0, 1
- bltu a1, a3, 5b
- j 3b
ENDPROC(__asm_copy_to_user)
ENDPROC(__asm_copy_from_user)
EXPORT_SYMBOL(__asm_copy_to_user)
@@ -117,7 +265,7 @@ EXPORT_SYMBOL(__clear_user)
10:
/* Disable access to user memory */
csrs CSR_STATUS, t6
- mv a0, a2
+ mv a0, t5
ret
11:
csrs CSR_STATUS, t6
--
2.17.1


2021-07-19 13:54:25

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall

Hi,

On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>
> This patch will reduce cpu usage dramatically in kernel space especially
> for application which use sys-call with large buffer size, such as
> network applications. The main reason behind this is that every
> unaligned memory access will raise exceptions and switch between s-mode
> and m-mode causing large overhead.
>

I had to revert the original patch from the mainline kernel prior to applying
this patch. Obviously that means that there may be other changes affecting the
outcome.

riscv64 images work, but riscv32 images still fail with this patch applied.
The error is a bit different than before, though.

...
[ 11.899979] Run /sbin/init as init process
[ 12.152666] random: fast init done
moun: applet not found
"�����V�t: applet not found
/bi�����V�F-: applet not found
moun: applet not found
swaon: applet not found
hostname-F: applet not found

After this, the image hangs.

For comparison, the mainline kernel (v5.14-rc2) fails as follows.

[ 10.788105] Run /sbin/init as init process
Starting syslogd: OK
Starting klogd: OK
/etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
/etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
Starting network: /bin/sh: syntax error: unterminated quoted string
sed: unmatched '/'
/bin/sh: syntax error: unterminated quoted string
FAIL
/etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution

I'll be happy to provide information on how to reproduce the problem
if needed. Please let me know.

Thanks,
Guenter

2021-07-19 14:01:25

by Akira Tsukamoto

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall


On 7/19/2021 10:51 PM, Guenter Roeck wrote:
> Hi,
>
> On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>>
>> This patch will reduce cpu usage dramatically in kernel space especially
>> for application which use sys-call with large buffer size, such as
>> network applications. The main reason behind this is that every
>> unaligned memory access will raise exceptions and switch between s-mode
>> and m-mode causing large overhead.
>>
>
> I had to revert the original patch from the mainline kernel prior to applying
> this patch. Obviously that means that there may be other changes affecting the
> outcome.
>
> riscv64 images work, but riscv32 images still fail with this patch applied.
> The error is a bit different than before, though.
>
> ...
> [   11.899979] Run /sbin/init as init process
> [   12.152666] random: fast init done
> moun: applet not found
> "�����V�t: applet not found
> /bi�����V�F-: applet not found
> moun: applet not found
> swaon: applet not found
> hostname-F: applet not found
>
> After this, the image hangs.
>
> For comparison, the mainline kernel (v5.14-rc2) fails as follows.
>
> [   10.788105] Run /sbin/init as init process
> Starting syslogd: OK
> Starting klogd: OK
> /etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
> /etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
> Starting network: /bin/sh: syntax error: unterminated quoted string
> sed: unmatched '/'
> /bin/sh: syntax error: unterminated quoted string
> FAIL
> /etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution
>
> I'll be happy to provide information on how to reproduce the problem
> if needed. Please let me know.

Yes, I do would like to know the procedure of build instruction of your rv32 image.
Then I would reproduce the error and look into how to fix it.

Akira

2021-07-19 14:27:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall

On 7/19/21 7:00 AM, Akira Tsukamoto wrote:
>
> On 7/19/2021 10:51 PM, Guenter Roeck wrote:
>> Hi,
>>
>> On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>>>
>>> This patch will reduce cpu usage dramatically in kernel space especially
>>> for application which use sys-call with large buffer size, such as
>>> network applications. The main reason behind this is that every
>>> unaligned memory access will raise exceptions and switch between s-mode
>>> and m-mode causing large overhead.
>>>
>>
>> I had to revert the original patch from the mainline kernel prior to applying
>> this patch. Obviously that means that there may be other changes affecting the
>> outcome.
>>
>> riscv64 images work, but riscv32 images still fail with this patch applied.
>> The error is a bit different than before, though.
>>
>> ...
>> [   11.899979] Run /sbin/init as init process
>> [   12.152666] random: fast init done
>> moun: applet not found
>> "�����V�t: applet not found
>> /bi�����V�F-: applet not found
>> moun: applet not found
>> swaon: applet not found
>> hostname-F: applet not found
>>
>> After this, the image hangs.
>>
>> For comparison, the mainline kernel (v5.14-rc2) fails as follows.
>>
>> [   10.788105] Run /sbin/init as init process
>> Starting syslogd: OK
>> Starting klogd: OK
>> /etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
>> /etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
>> Starting network: /bin/sh: syntax error: unterminated quoted string
>> sed: unmatched '/'
>> /bin/sh: syntax error: unterminated quoted string
>> FAIL
>> /etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution
>>
>> I'll be happy to provide information on how to reproduce the problem
>> if needed. Please let me know.
>
> Yes, I do would like to know the procedure of build instruction of your rv32 image.
> Then I would reproduce the error and look into how to fix it.
>
Please have a look at http://server.roeck-us.net/qemu/riscv32/
and let me know if you need anything else.

Thanks,
Guenter

2021-07-19 14:53:04

by Akira Tsukamoto

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall


On 7/19/2021 11:24 PM, Guenter Roeck wrote:
> On 7/19/21 7:00 AM, Akira Tsukamoto wrote:
>>
>> On 7/19/2021 10:51 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>>>>
>>>> This patch will reduce cpu usage dramatically in kernel space especially
>>>> for application which use sys-call with large buffer size, such as
>>>> network applications. The main reason behind this is that every
>>>> unaligned memory access will raise exceptions and switch between s-mode
>>>> and m-mode causing large overhead.
>>>>
>>>
>>> I had to revert the original patch from the mainline kernel prior to applying
>>> this patch. Obviously that means that there may be other changes affecting the
>>> outcome.
>>>
>>> riscv64 images work, but riscv32 images still fail with this patch applied.
>>> The error is a bit different than before, though.
>>>
>>> ...
>>> [   11.899979] Run /sbin/init as init process
>>> [   12.152666] random: fast init done
>>> moun: applet not found
>>> "�����V�t: applet not found
>>> /bi�����V�F-: applet not found
>>> moun: applet not found
>>> swaon: applet not found
>>> hostname-F: applet not found
>>>
>>> After this, the image hangs.
>>>
>>> For comparison, the mainline kernel (v5.14-rc2) fails as follows.
>>>
>>> [   10.788105] Run /sbin/init as init process
>>> Starting syslogd: OK
>>> Starting klogd: OK
>>> /etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
>>> /etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
>>> Starting network: /bin/sh: syntax error: unterminated quoted string
>>> sed: unmatched '/'
>>> /bin/sh: syntax error: unterminated quoted string
>>> FAIL
>>> /etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution
>>>
>>> I'll be happy to provide information on how to reproduce the problem
>>> if needed. Please let me know.
>>
>> Yes, I do would like to know the procedure of build instruction of your rv32 image.
>> Then I would reproduce the error and look into how to fix it.
>>
> Please have a look at http://server.roeck-us.net/qemu/riscv32/
> and let me know if you need anything else.

Thanks for the link. I will work on it tomorrow in my time.
Building the kernel with defconfig looks much quicker than the config.
I will try it from defconfig and then config.

Also I would like to use the same rv32 toolchain. Are you using prebuilt
riscv32-linux-gcc? Should not make differences but just in case.

Is there a public reference for the way of building your qemu and opensbi?
Not sure which version of qemu with which rv32 to build them, rv32i, rv32imad or else.

I really appreciate your help.

Akira

2021-07-19 14:53:24

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall

Hi Tsukamoto-san,

On Mon, Jul 19, 2021 at 2:53 PM Akira Tsukamoto
<[email protected]> wrote:
> This patch will reduce cpu usage dramatically in kernel space especially
> for application which use sys-call with large buffer size, such as
> network applications. The main reason behind this is that every
> unaligned memory access will raise exceptions and switch between s-mode
> and m-mode causing large overhead.

[...]

> Signed-off-by: Akira Tsukamoto <[email protected]>

Thanks for your patch!

As v3 is part of v5.14-rc1, all fixes and improvements need to be
send as incremental patches.

After reverting ca6eaaa210deec0e ("riscv: __asm_copy_to-from_user:
Optimize unaligned memory access and pipeline stall") and applying
v4, booting linux-on-litex-vexriscv still fails, but now differently
(real crash):

/bi�����V�F-: applet not found
2'�����t: applet not found
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
CPU: 0 PID: 1 Comm: init Not tainted
5.14.0-rc2-orangecrab-01933-g5c9574869017 #357
Call Trace:
Unable to handle kernel NULL pointer dereference at virtual address 00000af0
Oops [#1]
CPU: 0 PID: 1 Comm: init Not tainted
5.14.0-rc2-orangecrab-01933-g5c9574869017 #357
epc : walk_stackframe+0x11c/0x13c
ra : dump_backtrace+0x2c/0x3c
epc : c0003970 ra : c00039bc sp : c1835e20
gp : c06a7690 tp : c1838000 t0 : 00000000
t1 : 00000000 t2 : 00000000 s0 : c1835e50
s1 : c05d8180 a0 : 00001000 a1 : 00000000
a2 : c04dfd68 a3 : c05d8180 a4 : ab1d4cdc
a5 : 00001000 a6 : c067d204 a7 : ffffefff
s2 : 00000000 s3 : c05cc9f4 s4 : 00000000
s5 : c05d8180 s6 : c04dfd68 s7 : 00000001
s8 : 00000000 s9 : 95b6f158 s10: 00000000
s11: 00000001 t3 : 00000000 t4 : 00000001
t5 : 00000000 t6 : 00000000
status: 00000100 badaddr: 00000af0 cause: 0000000d
[<c0003970>] walk_stackframe+0x11c/0x13c
[<c00039bc>] dump_backtrace+0x2c/0x3c
[<c04dfde8>] show_stack+0x44/0x5c
[<c04e4c98>] dump_stack_lvl+0x2c/0x40
[<c04e4cc8>] dump_stack+0x1c/0x2c
[<c04dff3c>] panic+0x13c/0x330
[<c000c774>] do_exit+0x830/0x8b8
[<c000c888>] do_group_exit+0x40/0xac
[<c000c918>] __wake_up_parent+0x0/0x34
[<c0002128>] ret_from_syscall+0x0/0x4
---[ end trace d147f0f146982b08 ]---
note: init[1] exited with preempt_count 1
Fixing recursive fault but reboot is needed!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-07-19 14:57:28

by Akira Tsukamoto

[permalink] [raw]
Subject: Re: [PATCH v4 0/1] riscv: improving uaccess with logs from network bench

Hi Palmer,

Please do not bather with this patch.
It still have bug for rv32 reported by Guenter and I will regenerate the
patch against v5.14-rc* since I made the patch against v5.13.x for this
patch.

Akira

On 7/19/2021 9:51 PM, Akira Tsukamoto wrote:
> Hi Guenter, Geert and Qiu,
>
> I fixed the bug which was overrunning the copy when the size was in the
> between 8*SZREG to 9*SZREG. The SZREG holds the bytes per register size
> which is 4 for RV32 and 8 for RV64.
>
> Do you mind trying this patch? It works OK at my place.
>
> Since I had to respin the patch I added word copy without unrolling when
> the size is in the between 2*SZREG to 9*SZREG to reduce the number of byte
> copies which has heavy overhead as Palmer has mentioned when he included
> this patch to riscv/for-next.
>
>
> I rewrote the functions but heavily influenced by Garry's memcpy
> function [1]. It must be written in assembler to handle page faults
> manually inside the function unlike other memcpy functions.
>
> This patch will reduce cpu usage dramatically in kernel space especially
> for applications which use sys-call with large buffer size, such as network
> applications. The main reason behind this is that every unaligned memory
> access will raise exceptions and switch between s-mode and m-mode causing
> large overhead.
>
> ---
> v3 -> v4:
> - Fixed overrun copy
> - Added word copy without unrolling to reduce byte copy for left over
>
> v2 -> v3:
> - Merged all patches
>
> v1 -> v2:
> - Added shift copy
> - Separated patches for readability of changes in assembler
> - Using perf results
>
> [1] https://lkml.org/lkml/2021/2/16/778
>
> Akira Tsukamoto (1):
> riscv: __asm_copy_to-from_user: Optimize unaligned memory access and
> pipeline stall
>
> arch/riscv/lib/uaccess.S | 218 ++++++++++++++++++++++++++++++++-------
> 1 file changed, 183 insertions(+), 35 deletions(-)
>

2021-07-19 16:21:58

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall

On 7/19/21 7:49 AM, Akira Tsukamoto wrote:
>
> On 7/19/2021 11:24 PM, Guenter Roeck wrote:
>> On 7/19/21 7:00 AM, Akira Tsukamoto wrote:
>>>
>>> On 7/19/2021 10:51 PM, Guenter Roeck wrote:
>>>> Hi,
>>>>
>>>> On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>>>>>
>>>>> This patch will reduce cpu usage dramatically in kernel space especially
>>>>> for application which use sys-call with large buffer size, such as
>>>>> network applications. The main reason behind this is that every
>>>>> unaligned memory access will raise exceptions and switch between s-mode
>>>>> and m-mode causing large overhead.
>>>>>
>>>>
>>>> I had to revert the original patch from the mainline kernel prior to applying
>>>> this patch. Obviously that means that there may be other changes affecting the
>>>> outcome.
>>>>
>>>> riscv64 images work, but riscv32 images still fail with this patch applied.
>>>> The error is a bit different than before, though.
>>>>
>>>> ...
>>>> [   11.899979] Run /sbin/init as init process
>>>> [   12.152666] random: fast init done
>>>> moun: applet not found
>>>> "�����V�t: applet not found
>>>> /bi�����V�F-: applet not found
>>>> moun: applet not found
>>>> swaon: applet not found
>>>> hostname-F: applet not found
>>>>
>>>> After this, the image hangs.
>>>>
>>>> For comparison, the mainline kernel (v5.14-rc2) fails as follows.
>>>>
>>>> [   10.788105] Run /sbin/init as init process
>>>> Starting syslogd: OK
>>>> Starting klogd: OK
>>>> /etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
>>>> /etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
>>>> Starting network: /bin/sh: syntax error: unterminated quoted string
>>>> sed: unmatched '/'
>>>> /bin/sh: syntax error: unterminated quoted string
>>>> FAIL
>>>> /etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution
>>>>
>>>> I'll be happy to provide information on how to reproduce the problem
>>>> if needed. Please let me know.
>>>
>>> Yes, I do would like to know the procedure of build instruction of your rv32 image.
>>> Then I would reproduce the error and look into how to fix it.
>>>
>> Please have a look at http://server.roeck-us.net/qemu/riscv32/
>> and let me know if you need anything else.
>
> Thanks for the link. I will work on it tomorrow in my time.
> Building the kernel with defconfig looks much quicker than the config.
> I will try it from defconfig and then config.
>
> Also I would like to use the same rv32 toolchain. Are you using prebuilt
> riscv32-linux-gcc? Should not make differences but just in case.
>

I use a toolchain built with either buildroot (when building the root file system)
or with the buildall scripts from https://github.com/jmesmon/buildall.git
(for the kernel). Either case, it is the default riscv32 toolchain from both.

> Is there a public reference for the way of building your qemu and opensbi?
> Not sure which version of qemu with which rv32 to build them, rv32i, rv32imad or else.

I don't know about rv32i or rv32imad, sorry. I build qemu from source using
--disable-user --disable-gnutls --disable-docs \
--disable-nettle --disable-gcrypt --disable-vnc-png \
--disable-xen --disable-xen-pci-passthrough \
--disable-libssh
configuration options. The firmware image (opensbi) is built as part of that.
You should be able to use a pre-built version of qemu v6.0 if that is available
somewhere.

Hope this helps,
Guenter

2021-07-20 01:59:26

by Akira Tsukamoto

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall

Hi Geert,

On 7/19/2021 11:49 PM, Geert Uytterhoeven wrote:
> Hi Tsukamoto-san,
>
> On Mon, Jul 19, 2021 at 2:53 PM Akira Tsukamoto
> <[email protected]> wrote:
>> This patch will reduce cpu usage dramatically in kernel space especially
>> for application which use sys-call with large buffer size, such as
>> network applications. The main reason behind this is that every
>> unaligned memory access will raise exceptions and switch between s-mode
>> and m-mode causing large overhead.
>
> [...]
>
>> Signed-off-by: Akira Tsukamoto <[email protected]>
>
> Thanks for your patch!

Thanks for trying!

>
> As v3 is part of v5.14-rc1, all fixes and improvements need to be
> send as incremental patches.

Yeah, I was not paying attention to v5.14-rc1. Will regenerate it
when I spot the remaining bug on rv32.

>
> After reverting ca6eaaa210deec0e ("riscv: __asm_copy_to-from_user:
> Optimize unaligned memory access and pipeline stall") and applying
> v4, booting linux-on-litex-vexriscv still fails, but now differently
> (real crash):

This time it looks like under copy instead of overrun copy last time.
The SZREG, LGREG, REG_L and REG_S should been taking care of the
difference of rv32 and rv64, but probably still forgetting applying
some places.

Akira

>
> /bi�����V�F-: applet not found
> 2'�����t: applet not found
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> CPU: 0 PID: 1 Comm: init Not tainted
> 5.14.0-rc2-orangecrab-01933-g5c9574869017 #357
> Call Trace:
> Unable to handle kernel NULL pointer dereference at virtual address 00000af0
> Oops [#1]
> CPU: 0 PID: 1 Comm: init Not tainted
> 5.14.0-rc2-orangecrab-01933-g5c9574869017 #357
> epc : walk_stackframe+0x11c/0x13c
> ra : dump_backtrace+0x2c/0x3c
> epc : c0003970 ra : c00039bc sp : c1835e20
> gp : c06a7690 tp : c1838000 t0 : 00000000
> t1 : 00000000 t2 : 00000000 s0 : c1835e50
> s1 : c05d8180 a0 : 00001000 a1 : 00000000
> a2 : c04dfd68 a3 : c05d8180 a4 : ab1d4cdc
> a5 : 00001000 a6 : c067d204 a7 : ffffefff
> s2 : 00000000 s3 : c05cc9f4 s4 : 00000000
> s5 : c05d8180 s6 : c04dfd68 s7 : 00000001
> s8 : 00000000 s9 : 95b6f158 s10: 00000000
> s11: 00000001 t3 : 00000000 t4 : 00000001
> t5 : 00000000 t6 : 00000000
> status: 00000100 badaddr: 00000af0 cause: 0000000d
> [<c0003970>] walk_stackframe+0x11c/0x13c
> [<c00039bc>] dump_backtrace+0x2c/0x3c
> [<c04dfde8>] show_stack+0x44/0x5c
> [<c04e4c98>] dump_stack_lvl+0x2c/0x40
> [<c04e4cc8>] dump_stack+0x1c/0x2c
> [<c04dff3c>] panic+0x13c/0x330
> [<c000c774>] do_exit+0x830/0x8b8
> [<c000c888>] do_group_exit+0x40/0xac
> [<c000c918>] __wake_up_parent+0x0/0x34
> [<c0002128>] ret_from_syscall+0x0/0x4
> ---[ end trace d147f0f146982b08 ]---
> note: init[1] exited with preempt_count 1
> Fixing recursive fault but reboot is needed!
>
> Gr{oetje,eeting}s,
>
> Geert
>

2021-07-20 06:56:47

by Akira Tsukamoto

[permalink] [raw]
Subject: Re: [PATCH v4 1/1] riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall



On 7/20/21 00:16, Guenter Roeck wrote:
> On 7/19/21 7:49 AM, Akira Tsukamoto wrote:
>>
>> On 7/19/2021 11:24 PM, Guenter Roeck wrote:
>>> On 7/19/21 7:00 AM, Akira Tsukamoto wrote:
>>>>
>>>> On 7/19/2021 10:51 PM, Guenter Roeck wrote:
>>>>> Hi,
>>>>>
>>>>> On 7/19/21 5:53 AM, Akira Tsukamoto wrote:
>>>>>>
>>>>>> This patch will reduce cpu usage dramatically in kernel space especially
>>>>>> for application which use sys-call with large buffer size, such as
>>>>>> network applications. The main reason behind this is that every
>>>>>> unaligned memory access will raise exceptions and switch between s-mode
>>>>>> and m-mode causing large overhead.
>>>>>>
>>>>>
>>>>> I had to revert the original patch from the mainline kernel prior to applying
>>>>> this patch. Obviously that means that there may be other changes affecting the
>>>>> outcome.
>>>>>
>>>>> riscv64 images work, but riscv32 images still fail with this patch applied.
>>>>> The error is a bit different than before, though.
>>>>>
>>>>> ...
>>>>> [   11.899979] Run /sbin/init as init process
>>>>> [   12.152666] random: fast init done
>>>>> moun: applet not found
>>>>> "�����V�t: applet not found
>>>>> /bi�����V�F-: applet not found
>>>>> moun: applet not found
>>>>> swaon: applet not found
>>>>> hostname-F: applet not found
>>>>>
>>>>> After this, the image hangs.
>>>>>
>>>>> For comparison, the mainline kernel (v5.14-rc2) fails as follows.
>>>>>
>>>>> [   10.788105] Run /sbin/init as init process
>>>>> Starting syslogd: OK
>>>>> Starting klogd: OK
>>>>> /etc/init.d/S02sysctl: line 68: syntax error: EOF in backquote substitution
>>>>> /etc/init.d/S20urandom: line 1: syntax error: unterminated quoted string
>>>>> Starting network: /bin/sh: syntax error: unterminated quoted string
>>>>> sed: unmatched '/'
>>>>> /bin/sh: syntax error: unterminated quoted string
>>>>> FAIL
>>>>> /etc/init.d/S55runtest: line 48: syntax error: EOF in backquote substitution
>>>>>
>>>>> I'll be happy to provide information on how to reproduce the problem
>>>>> if needed. Please let me know.
>>>>
>>>> Yes, I do would like to know the procedure of build instruction of your rv32 image.
>>>> Then I would reproduce the error and look into how to fix it.
>>>>
>>> Please have a look at http://server.roeck-us.net/qemu/riscv32/
>>> and let me know if you need anything else.
>>
>> Thanks for the link. I will work on it tomorrow in my time.
>> Building the kernel with defconfig looks much quicker than the config.
>> I will try it from defconfig and then config.
>>
>> Also I would like to use the same rv32 toolchain. Are you using prebuilt
>> riscv32-linux-gcc? Should not make differences but just in case.
>>
>
> I use a toolchain built with either buildroot (when building the root file system)
> or with the buildall scripts from https://github.com/jmesmon/buildall.git
> (for the kernel). Either case, it is the default riscv32 toolchain from both.
>
>> Is there a public reference for the way of building your qemu and opensbi?
>> Not sure which version of qemu with which rv32 to build them, rv32i, rv32imad or else.
>
> I don't know about rv32i or rv32imad, sorry. I build qemu from source using
>     --disable-user --disable-gnutls --disable-docs \
>         --disable-nettle --disable-gcrypt --disable-vnc-png \
>         --disable-xen --disable-xen-pci-passthrough \
>         --disable-libssh
> configuration options. The firmware image (opensbi) is built as part of that.
> You should be able to use a pre-built version of qemu v6.0 if that is available
> somewhere.
>
> Hope this helps,

Super, I was perfectly able to reproduce the exactly the same error messages
with the toolchain in buildroot and your rootfs.

There were two bugs, one is overrun copy and second is wrong shifting in .Lshift_copy
for rv32.

Please wait for my fix.

Akira