2020-01-16 18:00:18

by Christophe Leroy

Subject: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

On powerpc/32, GCC (8.1) generates pretty bad code for the
ns >>= vd->shift operation, given that the shift is always
< 32 and the upper part of the result is likely to be zero.
GCC makes the reverse assumptions, treating the shift as
likely >= 32 and the upper part as likely non-zero.

unsigned long long shift(unsigned long long x, unsigned char s)
{
	return x >> s;
}

results in:

00000018 <shift>:
18: 35 25 ff e0 addic. r9,r5,-32
1c: 41 80 00 10 blt 2c <shift+0x14>
20: 7c 64 4c 30 srw r4,r3,r9
24: 38 60 00 00 li r3,0
28: 4e 80 00 20 blr
2c: 54 69 08 3c rlwinm r9,r3,1,0,30
30: 21 45 00 1f subfic r10,r5,31
34: 7c 84 2c 30 srw r4,r4,r5
38: 7d 29 50 30 slw r9,r9,r10
3c: 7c 63 2c 30 srw r3,r3,r5
40: 7d 24 23 78 or r4,r9,r4
44: 4e 80 00 20 blr

Even when the shift is forced into range with an &= 31, GCC
still assumes the shift is likely >= 32.

Define a vdso_shift_ns() macro that can be overridden by
arches.
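
As an illustration of what an arch override can buy (this sketch is
not part of the patch; it assumes 1 <= shift <= 31, as is the case
for the vdso), the shift can be done on 32-bit halves with a hint
that the high word is usually zero:

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	u32 hi = ns >> 32;
	u32 lo = ns;

	/* Shift on 32-bit halves; valid for 1 <= shift <= 31. */
	lo >>= shift;
	lo |= hi << (32 - shift);
	hi >>= shift;

	/* The shifted nanoseconds value almost always fits in 32 bits. */
	if (likely(hi == 0))
		return lo;

	return ((u64)hi << 32) | lo;
}
#define vdso_shift_ns vdso_shift_ns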

Signed-off-by: Christophe Leroy <[email protected]>
---
lib/vdso/gettimeofday.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 724b45c3e8ac..9ba92058cfd7 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -39,6 +39,13 @@ u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 }
 #endif
 
+#ifndef vdso_shift_ns
+static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
+{
+	return ns >> shift;
+}
+#endif
+
 #ifndef __arch_vdso_hres_capable
 static inline bool __arch_vdso_hres_capable(void)
 {
@@ -148,7 +155,7 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 		ns = vdso_ts->nsec;
 		last = vd->cycle_last;
 		ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
-		ns >>= vd->shift;
+		ns = vdso_shift_ns(ns, vd->shift);
 		sec = vdso_ts->sec;
 	} while (unlikely(vdso_read_retry(vd, seq)));
 
--
2.13.3


2020-01-16 23:23:54

by Andy Lutomirski

Subject: Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
<[email protected]> wrote:
>
> On powerpc/32, GCC (8.1) generates pretty bad code for the
> ns >>= vd->shift operation, given that the shift is always
> < 32 and the upper part of the result is likely to be zero.
> GCC makes the reverse assumptions, treating the shift as
> likely >= 32 and the upper part as likely non-zero.
>
> unsigned long long shift(unsigned long long x, unsigned char s)
> {
> 	return x >> s;
> }
>
> results in:
>
> 00000018 <shift>:
> 18: 35 25 ff e0 addic. r9,r5,-32
> 1c: 41 80 00 10 blt 2c <shift+0x14>
> 20: 7c 64 4c 30 srw r4,r3,r9
> 24: 38 60 00 00 li r3,0
> 28: 4e 80 00 20 blr
> 2c: 54 69 08 3c rlwinm r9,r3,1,0,30
> 30: 21 45 00 1f subfic r10,r5,31
> 34: 7c 84 2c 30 srw r4,r4,r5
> 38: 7d 29 50 30 slw r9,r9,r10
> 3c: 7c 63 2c 30 srw r3,r3,r5
> 40: 7d 24 23 78 or r4,r9,r4
> 44: 4e 80 00 20 blr
>
> Even when the shift is forced into range with an &= 31, GCC
> still assumes the shift is likely >= 32.
>
> Define a vdso_shift_ns() macro that can be overridden by
> arches.

Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
assume the shift is less than 32? That function exists to benefit
32-bit arches.

--Andy
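
For reference, both helpers live in include/linux/math64.h and
compute (a * mul) >> shift while keeping the full intermediate
product; only the type of mul differs:

	u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift);
	u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift);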

2020-01-16 23:24:01

by Thomas Gleixner

Subject: Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

Andy Lutomirski <[email protected]> writes:
> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>
> Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
> assume the shift is less than 32? That function exists to benefit
> 32-bit arches.

We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:

1 <= mult <= U32_MAX

1 <= shift <= 32

If we want to enforce a shift < 32 we need to limit that conditionally
in the calculation/registration function.
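
For reference, the generic fallback is close to the following
(simplified from include/linux/math64.h; arches may provide a faster
version). It splits the 64-bit operand so that 32-bit machines only
need 32x32->64 multiplies:

static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	u32 ah = a >> 32, al = a;
	u64 ret;

	/* Low half: fraction bits beyond the shift are dropped here. */
	ret = ((u64)al * mul) >> shift;
	/* High half: carries 32 trailing zero bits, so with shift <= 32
	 * no precision is lost in this term.
	 */
	if (ah)
		ret += ((u64)ah * mul) << (32 - shift);

	return ret;
}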

Thanks,

tglx

2020-01-17 01:00:00

by Andy Lutomirski

Subject: Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <[email protected]> wrote:
>
> Andy Lutomirski <[email protected]> writes:
> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
> >
> > Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
> > assume the shift is less than 32? That function exists to benefit
> > 32-bit arches.
>
> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>

That's what I meant to type...

> 1 <= mult <= U32_MAX
>
> 1 <= shift <= 32
>
> If we want to enforce a shift < 32 we need to limit that conditionally
> in the calculation/registration function.
>
> Thanks,
>
> tglx
>

2020-01-29 07:18:21

by Thomas Gleixner

Subject: Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

Andy Lutomirski <[email protected]> writes:

> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <[email protected]> wrote:
>>
>> Andy Lutomirski <[email protected]> writes:
>> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>> >
>> > Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>> > assume the shift is less than 32? That function exists to benefit
>> > 32-bit arches.
>>
>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>
> That's what I meant to type...

Just that it does not work. The math is:

ns = d->nsecs; // That's the nsec value shifted left by d->shift

ns += ((cur - d->last) & d->mask) * mult;

ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of
d->nsecs. Been there, done that, got sporadic time going backwards
problems as a reward. Need to look at that again as stuff has changed
over time.
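
Spelled out with hypothetical helper names (illustration only, not
proposed code): the addend must be folded in before the shift,
because (a + b) >> s is not (a >> s) + (b >> s) once the low bits of
a can carry into the result:

/* Exact: the fraction bits of nsecs take part in the shift. */
static u64 ns_exact(u64 nsecs, u64 delta, u32 mult, u32 shift)
{
	return (nsecs + delta * mult) >> shift;
}

/* Lossy: the low 'shift' fraction bits of nsecs are silently
 * dropped, which is where the sporadic time-going-backwards
 * observations come from.
 */
static u64 ns_lossy(u64 nsecs, u64 delta, u32 mult, u32 shift)
{
	return (nsecs >> shift) + mul_u64_u32_shr(delta, mult, shift);
}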

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

	if (cur > last)
		return (cur - last) * mult;
	return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.
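
A mask-free delta on such an arch could reuse the same #ifndef
override hook that lib/vdso already provides for vdso_calc_delta();
a sketch, assuming every VDSO-capable clocksource on the arch really
has a U64_MAX mask:

static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
{
	/* mask is known to be U64_MAX here, so the & can be dropped. */
	return (cycles - last) * mult;
}
#define vdso_calc_delta vdso_calc_delta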

Thanks,

tglx



2020-01-29 07:28:06

by Christophe Leroy

Subject: Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation



On 29/01/2020 at 08:14, Thomas Gleixner wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <[email protected]> wrote:
>>>
>>> Andy Lutomirski <[email protected]> writes:
>>>> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>>>>
>>>> Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>>>> assume the shift is less than 32? That function exists to benefit
>>>> 32-bit arches.
>>>
>>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>>
>>
>> That's what I meant to type...
>
> Just that it does not work. The math is:
>
> ns = d->nsecs; // That's the nsec value shifted left by d->shift
>
> ns += ((cur - d->last) & d->mask) * mult;
>
> ns >>= d->shift;
>
> So we cannot use mul_u64_u32_shr() because we need the addition there
> before shifting. And no, we can't drop the fractional part of
> d->nsecs. Been there, done that, got sporadic time going backwards
> problems as a reward. Need to look at that again as stuff has changed
> over time.
>
> On x86 we enforce that mask is 64bit, so the & operation is not there,
> but due to the nasties of TSC we have that conditional
>
> 	if (cur > last)
> 		return (cur - last) * mult;
> 	return 0;
>
> Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
> well, so you can spare that & operation there too.
>

Yes, I did it already. It spares reading d->mask, that's the main advantage.

Christophe