LinuxLists.cc - [PATCH] x86/apic: Use div64_ul() instead of do

2024-02-27 11:48:09

Subject: [PATCH] x86/apic: Use div64_ul() instead of do_div()

Fixes Coccinelle/coccicheck warnings reported by do_div.cocci.

Change deltapm to unsigned long and replace do_div() with div64_ul()
which doesn't implicitly cast the divisor and doesn't unnecessarily
calculate the remainder.

Signed-off-by: Thorsten Blum <[email protected]>
---
arch/x86/kernel/apic/apic.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4667bc4b00ab..facfb03ef5c8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -699,7 +699,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
}

static int __init
-calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
+calibrate_by_pmtimer(unsigned long deltapm, long *delta, long *deltatsc)
{
const long pm_100ms = PMTMR_TICKS_PER_SEC / 10;
const long pm_thresh = pm_100ms / 100;
@@ -710,7 +710,7 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
return -1;
#endif

- apic_printk(APIC_VERBOSE, "... PM-Timer delta = %ld\n", deltapm);
+ apic_printk(APIC_VERBOSE, "... PM-Timer delta = %lu\n", deltapm);

/* Check, if the PM timer is available */
if (!deltapm)
@@ -724,14 +724,14 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
return 0;
}

- res = (((u64)deltapm) * mult) >> 22;
+ res = (((u64)deltapm) * mult) >> 22;
do_div(res, 1000000);
pr_warn("APIC calibration not consistent "
"with PM-Timer: %ldms instead of 100ms\n", (long)res);

/* Correct the lapic counter value */
res = (((u64)(*delta)) * pm_100ms);
- do_div(res, deltapm);
+ res = div64_ul(res, deltapm);
pr_info("APIC delta adjusted to PM-Timer: "
"%lu (%ld)\n", (unsigned long)res, *delta);
*delta = (long)res;
@@ -739,7 +739,7 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
/* Correct the tsc counter value */
if (boot_cpu_has(X86_FEATURE_TSC)) {
res = (((u64)(*deltatsc)) * pm_100ms);
- do_div(res, deltapm);
+ res = div64_ul(res, deltapm);
apic_printk(APIC_VERBOSE, "TSC delta adjusted to "
"PM-Timer: %lu (%ld)\n",
(unsigned long)res, *deltatsc);
--
2.43.2

2024-02-29 22:13:43

by David Laight

[permalink] [raw]

Subject: RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()

From: Thorsten Blum
> Sent: 27 February 2024 11:44
>
> Fixes Coccinelle/coccicheck warnings reported by do_div.cocci.
>
> Change deltapm to unsigned long and replace do_div() with div64_ul()
> which doesn't implicitly cast the divisor and doesn't unnecessarily
> calculate the remainder.

Eh? they are entirely different beasts.

do_div() does a 64 by 32 divide that gives a 32bit quotient.
div64_ul() does a much more expensive 64 by 64 divide that
can generate a 64bit quotient.

The remainder is pretty much free in both cases.
If a cpu has a divide instruction it will almost certainly
put the result in one register and the quotient in another.

64 by 64 divides are horribly expensive on 32bit - they typically
have to be done in software even if the cpu has a divide instruction
which will (typically) do a 64 by 32 divide.

Even on Intel 64 bit x86 the 128 by 64 divide (needed to get
the 64 bit quotient) takes twice as long as the 64 by 32 one.
Even when the values are small.

The entire reason that the 'libc' divide function isn't in
the kernel is because it is slow and you really don't want
to be doing it unless the VALUES not the types require it.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2024-03-01 01:03:15

by H. Peter Anvin

[permalink] [raw]

Subject: RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()

>>
>> Change deltapm to unsigned long and replace do_div() with div64_ul()
>> which doesn't implicitly cast the divisor and doesn't unnecessarily
>> calculate the remainder.
>
>Eh? they are entirely different beasts.
>
>do_div() does a 64 by 32 divide that gives a 32bit quotient.
>div64_ul() does a much more expensive 64 by 64 divide that
>can generate a 64bit quotient.
>
>The remainder is pretty much free in both cases.
>If a cpu has a divide instruction it will almost certainly
>put the result in one register and the quotient in another.
>

Not on e.g. RISC-V.

2024-03-01 08:53:13

by David Laight

[permalink] [raw]

Subject: RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()

From: H. Peter Anvin
> Sent: 01 March 2024 01:02
>
> >>
> >> Change deltapm to unsigned long and replace do_div() with div64_ul()
> >> which doesn't implicitly cast the divisor and doesn't unnecessarily
> >> calculate the remainder.
> >
> >Eh? they are entirely different beasts.
> >
> >do_div() does a 64 by 32 divide that gives a 32bit quotient.
> >div64_ul() does a much more expensive 64 by 64 divide that
> >can generate a 64bit quotient.
> >
> >The remainder is pretty much free in both cases.
> >If a cpu has a divide instruction it will almost certainly
> >put the result in one register and the quotient in another.
> >
>
> Not on e.g. RISC-V.

If the remainder isn't used the compiler should optimise
away any code used to generate it.

gcc is also generating rather sub-optimal code.
On x86 it only does one divide for code that uses 'a / b' and
'a % b', but for riscv it does separate divide and remainder
instructions.
clang does a multiply and subtract for the remainder.

Compared to any form of divide, the extra multiply is noise.

gcc also pessimises attempts to calculate the remainder:
https://godbolt.org/z/Tojh1qcvs

Are the instruction weights set correctly for divide/remainder?
It is almost as though gcc thinks remainder is fast.

Actually I suspect even the 64 by 32 divide is a software loop
on riscv (32bit).
Not checked but I suspect the implementations (esp fpga ones) won't
allow 3 inputs to the ALU.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2024-03-01 15:14:57

by Thorsten Blum

[permalink] [raw]

Subject: Re: [PATCH] x86/apic: Use div64_ul() instead of do_div()

> On Feb 29, 2024, at 23:13, David Laight <[email protected]> wrote:
>
> do_div() does a 64 by 32 divide that gives a 32bit quotient.
> div64_ul() does a much more expensive 64 by 64 divide that
> can generate a 64bit quotient.

Since the dividend and the divisor could (according to the types at least) both
be 64-bit values and do_div() does a 64-by-32 division, the quotient could
potentially be wrong.

However, if the values don't require a 64-by-64 division (not even on a 64-bit
system) and the divisor (deltapm) is guaranteed to fit into 32 bits, wouldn't it
make sense to change its type from long to int?

Given that acpi_pm_read_early() returns a u32 and is then assigned to unsigned
long pm, this might be the source of the issue. Changing pm and deltapm from
long to u32 and keeping do_div() as is, would improve the types and remove the
Coccinelle warnings...but maybe I'm missing something?

Thanks,
Thorsten

2024-03-01 20:44:25

by Thorsten Blum

[permalink] [raw]

Subject: [PATCH v2] x86/apic: Use u32 instead of unsigned long

Use u32 to fix Coccinelle/coccicheck warnings reported by do_div.cocci.

Signed-off-by: Thorsten Blum <[email protected]>
---
Changes in v2:
- Revert changing do_div() to div64_ul() after feedback from David Laight
- Use u32 instead of unsigned long to fix Coccinelle warnings
- Update patch title and description
---
arch/x86/kernel/apic/apic.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4667bc4b00ab..184e1843620d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -664,7 +664,7 @@ void lapic_update_tsc_freq(void)
static __initdata int lapic_cal_loops = -1;
static __initdata long lapic_cal_t1, lapic_cal_t2;
static __initdata unsigned long long lapic_cal_tsc1, lapic_cal_tsc2;
-static __initdata unsigned long lapic_cal_pm1, lapic_cal_pm2;
+static __initdata u32 lapic_cal_pm1, lapic_cal_pm2;
static __initdata unsigned long lapic_cal_j1, lapic_cal_j2;

/*
@@ -674,7 +674,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
{
unsigned long long tsc = 0;
long tapic = apic_read(APIC_TMCCT);
- unsigned long pm = acpi_pm_read_early();
+ u32 pm = acpi_pm_read_early();

if (boot_cpu_has(X86_FEATURE_TSC))
tsc = rdtsc();
@@ -699,7 +699,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
}

static int __init
-calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
+calibrate_by_pmtimer(u32 deltapm, long *delta, long *deltatsc)
{
const long pm_100ms = PMTMR_TICKS_PER_SEC / 10;
const long pm_thresh = pm_100ms / 100;
@@ -710,7 +710,7 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
return -1;
#endif

- apic_printk(APIC_VERBOSE, "... PM-Timer delta = %ld\n", deltapm);
+ apic_printk(APIC_VERBOSE, "... PM-Timer delta = %u\n", deltapm);

/* Check, if the PM timer is available */
if (!deltapm)
--
2.44.0

2024-03-08 12:46:36

by Thorsten Blum

[permalink] [raw]

Subject: [RESEND PATCH v2] x86/apic: Use u32 instead of unsigned long

Improve types by using u32 instead of unsigned long. Fixes two
Coccinelle/coccicheck warnings reported by do_div.cocci.

Signed-off-by: Thorsten Blum <[email protected]>
---
Changes in v2:
- Revert changing do_div() to div64_ul() after feedback from David Laight
- Use u32 instead of unsigned long to fix Coccinelle warnings
- Update patch title and description
---
arch/x86/kernel/apic/apic.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4667bc4b00ab..184e1843620d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -664,7 +664,7 @@ void lapic_update_tsc_freq(void)
static __initdata int lapic_cal_loops = -1;
static __initdata long lapic_cal_t1, lapic_cal_t2;
static __initdata unsigned long long lapic_cal_tsc1, lapic_cal_tsc2;
-static __initdata unsigned long lapic_cal_pm1, lapic_cal_pm2;
+static __initdata u32 lapic_cal_pm1, lapic_cal_pm2;
static __initdata unsigned long lapic_cal_j1, lapic_cal_j2;

/*
@@ -674,7 +674,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
{
unsigned long long tsc = 0;
long tapic = apic_read(APIC_TMCCT);
- unsigned long pm = acpi_pm_read_early();
+ u32 pm = acpi_pm_read_early();

if (boot_cpu_has(X86_FEATURE_TSC))
tsc = rdtsc();
@@ -699,7 +699,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
}

static int __init
-calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
+calibrate_by_pmtimer(u32 deltapm, long *delta, long *deltatsc)
{
const long pm_100ms = PMTMR_TICKS_PER_SEC / 10;
const long pm_thresh = pm_100ms / 100;
@@ -710,7 +710,7 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
return -1;
#endif

- apic_printk(APIC_VERBOSE, "... PM-Timer delta = %ld\n", deltapm);
+ apic_printk(APIC_VERBOSE, "... PM-Timer delta = %u\n", deltapm);

/* Check, if the PM timer is available */
if (!deltapm)
--
2.44.0