2024-02-23 05:29:17

by Fei Wu

Subject: riscv syscall performance regression

Hi All,

I am doing some performance regression testing on a Sophgo machine, and
the UnixBench syscall benchmark drops 14% from v6.1 to v6.6. This change
appears to be due to commit f0bddf50 ("riscv: entry: Convert to generic
entry"). I know it's a tradeoff; I'm just checking whether it has been
discussed already and whether any improvement can be made.

The unixbench benchmark I used is:
$ ./syscall 10 getpid

The dynamic instruction count per syscall increased from ~200 to ~250,
which should be the key factor. To avoid porting different kernel
versions to the Sophgo machine, I switched to testing under QEMU system
emulation, using the libinsn.so plugin to count instructions. There is
some background noise during the test, but its impact should be limited.
This is the dynamic instruction count per syscall I got:

* commit d0db02c6 (right before the change): ~200
* commit f0bddf50 (the change): ~250
* commit ffd2cb6b (latest upstream): ~250

Any comments?

Thanks,
Fei.


2024-02-27 01:17:48

by Guo Ren

Subject: Re: riscv syscall performance regression

On Fri, Feb 23, 2024 at 1:29 PM Wu, Fei wrote:
>
> Hi All,
>
> I am doing some performance regression testing on a sophgo machine, the
> unixbench syscall benchmark drops 14% from 6.1 to 6.6. This change
> should be due to commit f0bddf50 riscv: entry: Convert to generic entry.
> I know it's a tradeoff, just checking if it's been discussed already and
> any improvement can be done.
>
> The unixbench benchmark I used is:
> $ ./syscall 10 getpid
>
> The dynamic instruction count per syscall is increased from ~200 to
> ~250, this should be the key factor so I switch to test it on system
> QEMU to avoid porting different versions on sophgo, and use plugin
> libinsn.so to count the instructions. There are a few background noises
> during test but the impact should be limited. This is dyninst count per
> syscall I got:
>
> * commit d0db02c6 (right before the change): ~200
> * commit f0bddf50 (the change): ~250
> * commit ffd2cb6b (latest upstream): ~250
>
> Any comment?

1. I think this is about generic entry performance. All architectures
should move to that framework and improve generic entry performance
together.

2. Another point is that the generic entry code adds sched functions,
so a simple empty syscall can't show the benefit of generic entry.

3. Could we use the vDSO to improve getpid?

PS:
The syscall arguments now use pt_regs instead of syscall_wrapper,
which broke rv32 syscalls. Ref:
https://github.com/T-head-Semi/linux/pull/5

>
> Thanks,
> Fei.



--
Best Regards
Guo Ren