2019-12-23 14:33:06

by Christophe Leroy

[permalink] [raw]
Subject: [RFC PATCH v2 00/10] powerpc/32: switch VDSO to C implementation.

This is a second tentative to switch powerpc/32 vdso to generic C implementation.

It will likely not work on 64 bits or even build properly at the moment.

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit and fallback needs to be performed in ASM.

To allow that, the fallback calls are moved out of the common code
and left to the arches.

A few other changes in the common code have allowed performance improvement.

The performance has improved since first RFC, but it is still lower than
current assembly VDSO.

On a powerpc8xx, with current powerpc/32 ASM VDSO:

gettimeofday: vdso: 737 nsec/call
clock-getres-realtime: vdso: 475 nsec/call
clock-gettime-realtime: vdso: 892 nsec/call
clock-getres-monotonic: vdso: 475 nsec/call
clock-gettime-monotonic: vdso: 1014 nsec/call

First try of C implementation:

gettimeofday: vdso: 1533 nsec/call
clock-getres-realtime: vdso: 853 nsec/call
clock-gettime-realtime: vdso: 1570 nsec/call
clock-getres-monotonic: vdso: 835 nsec/call
clock-gettime-monotonic: vdso: 1605 nsec/call

With this series:

gettimeofday: vdso: 1016 nsec/call
clock-getres-realtime: vdso: 560 nsec/call
clock-gettime-realtime: vdso: 1192 nsec/call
clock-getres-monotonic: vdso: 560 nsec/call
clock-gettime-monotonic: vdso: 1192 nsec/call


Changes made to other arches are untested, not even compiled.


Christophe Leroy (10):
lib: vdso: ensure all arches have 32bit fallback
lib: vdso: move call to fallback out of common code.
lib: vdso: Change __cvdso_clock_gettime/getres_common() to
__cvdso_clock_gettime/getres()
lib: vdso: get pointer to vdso data from the arch
lib: vdso: inline do_hres()
lib: vdso: make do_coarse() return 0
lib: vdso: don't use READ_ONCE() in __c_kernel_time()
lib: vdso: Avoid duplication in __cvdso_clock_getres()
powerpc/vdso32: inline __get_datapage()
powerpc/32: Switch VDSO to C implementation.

arch/arm/include/asm/vdso/gettimeofday.h | 26 +++
arch/arm/vdso/vgettimeofday.c | 32 ++-
arch/arm64/include/asm/vdso/compat_gettimeofday.h | 2 -
arch/arm64/include/asm/vdso/gettimeofday.h | 26 +++
arch/arm64/kernel/vdso/vgettimeofday.c | 24 +-
arch/arm64/kernel/vdso32/vgettimeofday.c | 39 +++-
arch/mips/include/asm/vdso/gettimeofday.h | 28 ++-
arch/mips/vdso/vgettimeofday.c | 56 ++++-
arch/powerpc/Kconfig | 2 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 45 ++++
arch/powerpc/include/asm/vdso/vsyscall.h | 27 +++
arch/powerpc/include/asm/vdso_datapage.h | 28 +--
arch/powerpc/kernel/asm-offsets.c | 23 +-
arch/powerpc/kernel/time.c | 92 +-------
arch/powerpc/kernel/vdso.c | 19 +-
arch/powerpc/kernel/vdso32/Makefile | 19 +-
arch/powerpc/kernel/vdso32/cacheflush.S | 9 +-
arch/powerpc/kernel/vdso32/datapage.S | 28 +--
arch/powerpc/kernel/vdso32/gettimeofday.S | 265 +++-------------------
arch/powerpc/kernel/vdso32/vgettimeofday.c | 32 +++
arch/x86/entry/vdso/vclock_gettime.c | 52 ++++-
arch/x86/include/asm/vdso/gettimeofday.h | 28 ++-
lib/vdso/gettimeofday.c | 100 +++-----
23 files changed, 505 insertions(+), 497 deletions(-)
create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c

--
2.13.3


2019-12-23 14:33:16

by Christophe Leroy

[permalink] [raw]
Subject: [RFC PATCH v2 08/10] lib: vdso: Avoid duplication in __cvdso_clock_getres()

VDSO_HRES and VDSO_RAW clocks are handled the same way.

Don't duplicate code.

Signed-off-by: Christophe Leroy <[email protected]>
---
lib/vdso/gettimeofday.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 5a17a9d2e6cd..aa4a167bf1e0 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -172,7 +172,7 @@ int __cvdso_clock_getres(const struct vdso_data *vd, clockid_t clock,
* clocks are handled in the VDSO directly.
*/
msk = 1U << clock;
- if (msk & VDSO_HRES) {
+ if (msk & (VDSO_HRES | VDSO_RAW)) {
/*
* Preserves the behaviour of posix_get_hrtimer_res().
*/
@@ -182,11 +182,6 @@ int __cvdso_clock_getres(const struct vdso_data *vd, clockid_t clock,
* Preserves the behaviour of posix_get_coarse_res().
*/
ns = LOW_RES_NSEC;
- } else if (msk & VDSO_RAW) {
- /*
- * Preserves the behaviour of posix_get_hrtimer_res().
- */
- ns = hrtimer_res;
} else {
return -1;
}
--
2.13.3

2019-12-24 02:01:47

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH v2 08/10] lib: vdso: Avoid duplication in __cvdso_clock_getres()

On Mon, Dec 23, 2019 at 6:31 AM Christophe Leroy
<[email protected]> wrote:
>
> VDSO_HRES and VDSO_RAW clocks are handled the same way.
>
> Don't duplicate code.
>
> Signed-off-by: Christophe Leroy <[email protected]>

Reviewed-by: Andy Lutomirski <[email protected]>

2020-01-09 21:03:03

by Christophe Leroy

[permalink] [raw]
Subject: Surprising code generated for vdso_read_begin()

Wondering why we get something so complicated/redundant for
vdso_read_begin() <include/vdso/helpers.h>

static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
u32 seq;

while ((seq = READ_ONCE(vd->seq)) & 1)
cpu_relax();

smp_rmb();
return seq;
}


6e0: 81 05 00 f0 lwz r8,240(r5)
6e4: 71 09 00 01 andi. r9,r8,1
6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158>
6ec: 81 05 00 f0 lwz r8,240(r5)
6f0: 71 0a 00 01 andi. r10,r8,1
6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c>
6f8:

r5 being vd pointer

Why the first triplet, not only the second triplet ? Something wrong
with using READ_ONCE() for that ?

Christophe

2020-01-09 21:04:30

by Segher Boessenkool

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()

On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote:
> Wondering why we get something so complicated/redundant for
> vdso_read_begin() <include/vdso/helpers.h>
>
> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
> {
> u32 seq;
>
> while ((seq = READ_ONCE(vd->seq)) & 1)
> cpu_relax();
>
> smp_rmb();
> return seq;
> }
>
>
> 6e0: 81 05 00 f0 lwz r8,240(r5)
> 6e4: 71 09 00 01 andi. r9,r8,1
> 6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158>
> 6ec: 81 05 00 f0 lwz r8,240(r5)
> 6f0: 71 0a 00 01 andi. r10,r8,1
> 6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c>
> 6f8:
>
> r5 being vd pointer
>
> Why the first triplet, not only the second triplet ? Something wrong
> with using READ_ONCE() for that ?

It looks like the compiler did loop peeling. What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?


Segher

2020-01-10 06:46:57

by Christophe Leroy

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()



Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote:
>> Wondering why we get something so complicated/redundant for
>> vdso_read_begin() <include/vdso/helpers.h>
>>
>> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
>> {
>> u32 seq;
>>
>> while ((seq = READ_ONCE(vd->seq)) & 1)
>> cpu_relax();
>>
>> smp_rmb();
>> return seq;
>> }
>>
>>
>> 6e0: 81 05 00 f0 lwz r8,240(r5)
>> 6e4: 71 09 00 01 andi. r9,r8,1
>> 6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158>
>> 6ec: 81 05 00 f0 lwz r8,240(r5)
>> 6f0: 71 0a 00 01 andi. r10,r8,1
>> 6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c>
>> 6f8:
>>
>> r5 being vd pointer
>>
>> Why the first triplet, not only the second triplet ? Something wrong
>> with using READ_ONCE() for that ?
>
> It looks like the compiler did loop peeling. What GCC version is this?
> Please try current trunk (to become GCC 10), or at least GCC 9?
>

It is with GCC 5.5

https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
recent than 8.1

With 8.1, the problem doesn't show up.

Thanks
Christophe

2020-01-11 11:36:20

by Segher Boessenkool

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()

On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> Le 09/01/2020 ? 21:07, Segher Boessenkool a ?crit?:
> >It looks like the compiler did loop peeling. What GCC version is this?
> >Please try current trunk (to become GCC 10), or at least GCC 9?
>
> It is with GCC 5.5
>
> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> recent than 8.1

Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is
this hard and/or painful to do?

> With 8.1, the problem doesn't show up.

Good to hear that. Hrm, I wonder when it was fixed...


Segher

2020-02-16 18:12:02

by Arnd Bergmann

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()

On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
<[email protected]> wrote:
>
> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> > >It looks like the compiler did loop peeling. What GCC version is this?
> > >Please try current trunk (to become GCC 10), or at least GCC 9?
> >
> > It is with GCC 5.5
> >
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > recent than 8.1
>
> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is
> this hard and/or painful to do?

To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
binaries, as well as a recent 10.0 snapshot.

I hope these work, let me know if there are problems.

Arnd

2020-02-19 08:46:16

by Christophe Leroy

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()



Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> <[email protected]> wrote:
>>
>> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
>>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
>>>> It looks like the compiler did loop peeling. What GCC version is this?
>>>> Please try current trunk (to become GCC 10), or at least GCC 9?
>>>
>>> It is with GCC 5.5
>>>
>>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
>>> recent than 8.1
>>
>> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is
>> this hard and/or painful to do?
>
> To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> binaries, as well as a recent 10.0 snapshot.
>

Thanks Arnd,

I have built the VDSO with 9.2, I get less performant result than with
8.2 (same performance as with 5.5).

After a quick look, I see:
- Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
avoid that.
- A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
8.1 don't need that, all VDSO functions are frameless with 8.1

Christophe

2020-02-19 09:54:13

by Arnd Bergmann

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()

On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
<[email protected]> wrote:
> Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > <[email protected]> wrote:
> >>
> >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >>>> It looks like the compiler did loop peeling. What GCC version is this?
> >>>> Please try current trunk (to become GCC 10), or at least GCC 9?
> >>>
> >>> It is with GCC 5.5
> >>>
> >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> >>> recent than 8.1
> >>
> >> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is
> >> this hard and/or painful to do?
> >
> > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > binaries, as well as a recent 10.0 snapshot.
> >
>
> Thanks Arnd,
>
> I have built the VDSO with 9.2, I get less performant result than with
> 8.2 (same performance as with 5.5).
>
> After a quick look, I see:
> - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> avoid that.
> - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> 8.1 don't need that, all VDSO functions are frameless with 8.1

If you think it should be fixed in gcc, maybe try to reproduce it in
https://godbolt.org/ and open a gcc bug against that.

Also, please try the gcc-10 snapshot, which has the highest chance
of getting fixes if it shows the same issue (or worse).

Arnd

2020-02-19 13:09:20

by Segher Boessenkool

[permalink] [raw]
Subject: Re: Surprising code generated for vdso_read_begin()

On Wed, Feb 19, 2020 at 10:52:16AM +0100, Arnd Bergmann wrote:
> On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
> <[email protected]> wrote:
> > Le 16/02/2020 ? 19:10, Arnd Bergmann a ?crit :
> > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > > <[email protected]> wrote:
> > >>
> > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > >>> Le 09/01/2020 ? 21:07, Segher Boessenkool a ?crit :
> > >>>> It looks like the compiler did loop peeling. What GCC version is this?
> > >>>> Please try current trunk (to become GCC 10), or at least GCC 9?
> > >>>
> > >>> It is with GCC 5.5
> > >>>
> > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > >>> recent than 8.1
> > >>
> > >> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is
> > >> this hard and/or painful to do?
> > >
> > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > > binaries, as well as a recent 10.0 snapshot.
> > >
> >
> > Thanks Arnd,
> >
> > I have built the VDSO with 9.2, I get less performant result than with
> > 8.2 (same performance as with 5.5).
> >
> > After a quick look, I see:
> > - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> > avoid that.
> > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> > 8.1 don't need that, all VDSO functions are frameless with 8.1
>
> If you think it should be fixed in gcc, maybe try to reproduce it in
> https://godbolt.org/

(Feel free to skip this step; and don't put links to godbolt (or anything
else external) in our bugzilla, please; such links go stale before you
know it.)

> and open a gcc bug against that.

Yes please :-)

> Also, please try the gcc-10 snapshot, which has the highest chance
> of getting fixes if it shows the same issue (or worse).

If it is a regression, chances are it will be backported. (But not to
9.3, which is due in just a few weeks, just like 8.4). If it is just a
side effect of some other change, it will probably *not* be undone, not
on trunk (GCC 10) either. It depends.

But sure, always test trunk if you can.


Segher