2020-01-30 13:27:15

by Wen Yang

Subject: [PATCH] x86/tsc: improve arithmetic division

do_div() does a 64-by-32 division. Use div64_u64() or div64_ul()
instead if the divisor is 'u64' or 'unsigned long', to avoid
truncating the divisor to its lower 32 bits.
As a nice side effect, this also cleans up the functions a bit.

Signed-off-by: Wen Yang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/tsc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7e322e2daaf5..4c0320e68699 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -357,9 +357,7 @@ static unsigned long calc_pmtimer_ref(u64 deltatsc, u64 pm1, u64 pm2)
 	pm2 -= pm1;
 	tmp = pm2 * 1000000000LL;
 	do_div(tmp, PMTMR_TICKS_PER_SEC);
-	do_div(deltatsc, tmp);
-
-	return (unsigned long) deltatsc;
+	return (unsigned long) div64_u64(deltatsc, tmp);
 }

#define CAL_MS 10
@@ -778,8 +776,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
 		tsc_ref_min = min(tsc_ref_min, (unsigned long) tsc2);

 		/* Check the reference deviation */
-		delta = ((u64) tsc_pit_min) * 100;
-		do_div(delta, tsc_ref_min);
+		delta = div64_ul(((u64) tsc_pit_min) * 100, tsc_ref_min);

 		/*
 		 * If both calibration results are inside a 10% window
--
2.23.0
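
The truncation the commit message refers to can be shown with a small userspace model. This is only a sketch: model_do_div() and model_div64_u64() are hypothetical stand-ins that mimic the divisor widths of the kernel helpers, not the kernel implementations themselves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 64-by-32 division in the style of do_div():
 * the divisor is narrowed to 32 bits before dividing, so any upper
 * bits of a wider divisor are silently dropped.
 */
static uint64_t model_do_div(uint64_t n, uint64_t base)
{
	uint32_t base32 = (uint32_t)base;	/* upper 32 bits are lost here */

	assert(base32 != 0);
	return n / base32;
}

/* Hypothetical model of a full 64-by-64 division (div64_u64() style). */
static uint64_t model_div64_u64(uint64_t n, uint64_t base)
{
	assert(base != 0);
	return n / base;
}
```

With n = 2^40 and a divisor of 2^32 + 2, the narrowed version divides by 2 and returns 2^39, while the full-width version returns 255.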


2020-02-01 22:31:25

by H. Peter Anvin

Subject: Re: [PATCH] x86/tsc: improve arithmetic division

On January 30, 2020 5:08:38 AM PST, Wen Yang <[email protected]> wrote:
>do_div() does a 64-by-32 division. Use div64_u64() or div64_ul()
>instead if the divisor is 'u64' or 'unsigned long', to avoid
>truncating the divisor to its lower 32 bits.
>As a nice side effect, this also cleans up the functions a bit.
>
>Signed-off-by: Wen Yang <[email protected]>
>Cc: Thomas Gleixner <[email protected]>
>Cc: Ingo Molnar <[email protected]>
>Cc: Borislav Petkov <[email protected]>
>Cc: "H. Peter Anvin" <[email protected]>
>Cc: [email protected]
>Cc: [email protected]
>---
> arch/x86/kernel/tsc.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>index 7e322e2daaf5..4c0320e68699 100644
>--- a/arch/x86/kernel/tsc.c
>+++ b/arch/x86/kernel/tsc.c
>@@ -357,9 +357,7 @@ static unsigned long calc_pmtimer_ref(u64 deltatsc, u64 pm1, u64 pm2)
> pm2 -= pm1;
> tmp = pm2 * 1000000000LL;
> do_div(tmp, PMTMR_TICKS_PER_SEC);
>- do_div(deltatsc, tmp);
>-
>- return (unsigned long) deltatsc;
>+ return (unsigned long) div64_u64(deltatsc, tmp);
> }
>
> #define CAL_MS 10
>@@ -778,8 +776,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
> tsc_ref_min = min(tsc_ref_min, (unsigned long) tsc2);
>
> /* Check the reference deviation */
>- delta = ((u64) tsc_pit_min) * 100;
>- do_div(delta, tsc_ref_min);
>+ delta = div64_ul(((u64) tsc_pit_min) * 100, tsc_ref_min);
>
> /*
> * If both calibration results are inside a 10% window

This is a *lot* more expensive on 32 bits (something like 10x) and as the output is truncated to unsigned long anyway, it is also unnecessary.

We don't use the remainder, so using do_div() is not merely unnecessary but almost certainly generates worse code: we are multiplying and then dividing by a constant, and most of the time gcc can optimize that into a single multiply/shift operation; otherwise we can do that optimization for it (see timeconst.bc).
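
As an illustration of the multiply/shift idea, here is a minimal userspace sketch that replaces a division by a constant with one multiplication and one shift. The divisor 1000, the helper name div1000_mulshift(), and the magic constant are chosen for this example only and do not come from the patch; the constant is ceil(2^38 / 1000), which is exact for every 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^38 / 1000) = 274877907; illustrative constant, not from tsc.c */
#define DIV1000_MAGIC	0x10624DD3ULL
#define DIV1000_SHIFT	38

/*
 * Divide a 32-bit value by 1000 without a divide instruction.
 * The product is below 2^61, so it cannot overflow 64 bits.
 */
static uint32_t div1000_mulshift(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * DIV1000_MAGIC) >> DIV1000_SHIFT);
}
```

Whether the compiler picks a reciprocal like this on its own depends on the operand widths and the target, so inspecting the generated code is still the way to find out.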

The one thing that gcc can't necessarily do automatically is to know when a 64/32 → 32 division is safe; C semantics are truncation, but the CPU will trap if the quotient does not fit in 32 bits. If it can turn it into a multiply then that problem obviously goes away.
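
The safety condition can be written down explicitly. The sketch below is a userspace illustration with hypothetical helper names: the quotient of a 64-bit dividend and a 32-bit divisor fits in 32 bits exactly when the upper half of the dividend is smaller than the divisor, which is the case where a hardware 64/32 divide would not fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when n / d fits in 32 bits: n / d < 2^32 is equivalent to
 * n < d * 2^32, which is equivalent to (n >> 32) < d.
 */
static bool quotient_fits_u32(uint64_t n, uint32_t d)
{
	return d != 0 && (uint32_t)(n >> 32) < d;
}

/* Plain C division followed by narrowing: extra bits are discarded. */
static uint32_t c_truncating_div(uint64_t n, uint32_t d)
{
	assert(d != 0);
	return (uint32_t)(n / d);
}
```

For n = 2^32 and d = 1 the check fails, and the C expression quietly yields 0 because the true quotient 2^32 does not fit in 32 bits.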

So first I would test with regular / operators and see what code comes out.
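
A userspace sketch of what that experiment might look like for the lines touched in calc_pmtimer_ref() follows. It covers only the arithmetic visible in the diff; the function name calc_pmtimer_ref_plain() is hypothetical, the PMTMR_TICKS_PER_SEC value is assumed to match the kernel's ACPI PM timer header, and the zero check is added for the sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ACPI PM timer frequency in Hz (as in the kernel header) */
#define PMTMR_TICKS_PER_SEC	3579545

/* Same arithmetic as the hunk above, using plain C operators only. */
static unsigned long calc_pmtimer_ref_plain(uint64_t deltatsc,
					    uint64_t pm1, uint64_t pm2)
{
	uint64_t tmp;

	pm2 -= pm1;
	/* PM timer ticks converted to nanoseconds */
	tmp = pm2 * 1000000000ULL / PMTMR_TICKS_PER_SEC;
	assert(tmp != 0);	/* guard added for this sketch */

	return (unsigned long)(deltatsc / tmp);
}
```

Compiling this for both 32-bit and 64-bit targets and reading the generated assembly shows whether the compiler emits real divisions, a library call, or a multiply/shift sequence.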

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.