The clock framework handles clock rates as "unsigned long", so u32 on
32-bit architectures and u64 on 64-bit architectures.
The current code pointlessly casts the dividend to u64 on 32-bit
architectures, which needlessly reduces performance.
On the other hand, on 64-bit architectures the divisor is masked and only
the lower 32 bits are used. Thus requesting a frequency >= 4.3 GHz results
in incorrect values. For example requesting 4300000000 (4.3 GHz) will
effectively request ca. 5 MHz. Requesting clk_round_rate(clk, ULONG_MAX)
is a bit of a special case, since that still returns correct values as
long as the parent clock is below 8.5 GHz.
Signed-off-by: Sebastian Reichel <[email protected]>
---
drivers/clk/clk-divider.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index a2c2b5203b0a..c38e8aa60e54 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -220,7 +220,7 @@ static int _div_round_up(const struct clk_div_table *table,
unsigned long parent_rate, unsigned long rate,
unsigned long flags)
{
- int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+ int div = DIV_ROUND_UP(parent_rate, rate);
if (flags & CLK_DIVIDER_POWER_OF_TWO)
div = __roundup_pow_of_two(div);
@@ -237,7 +237,7 @@ static int _div_round_closest(const struct clk_div_table *table,
int up, down;
unsigned long up_rate, down_rate;
- up = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+ up = DIV_ROUND_UP(parent_rate, rate);
down = parent_rate / rate;
if (flags & CLK_DIVIDER_POWER_OF_TWO) {
@@ -473,7 +473,7 @@ int divider_get_val(unsigned long rate, unsigned long parent_rate,
{
unsigned int div, value;
- div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+ div = DIV_ROUND_UP(parent_rate, rate);
if (!_is_valid_div(table, div, flags))
return -EINVAL;
--
2.39.2
On 26/05/23 19:10, Sebastian Reichel wrote:
> The clock framework handles clock rates as "unsigned long", so u32 on
> 32-bit architectures and u64 on 64-bit architectures.
>
> The current code pointlessly casts the dividend to u64 on 32-bit
> architectures, which needlessly reduces performance.
>
> On the other hand, on 64-bit architectures the divisor is masked and only
> the lower 32 bits are used. Thus requesting a frequency >= 4.3 GHz results
> in incorrect values. For example requesting 4300000000 (4.3 GHz) will
> effectively request ca. 5 MHz. Requesting clk_round_rate(clk, ULONG_MAX)
> is a bit of a special case, since that still returns correct values as
> long as the parent clock is below 8.5 GHz.
>
> Signed-off-by: Sebastian Reichel <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Quoting Sebastian Reichel (2023-05-26 10:10:57)
> The clock framework handles clock rates as "unsigned long", so u32 on
> 32-bit architectures and u64 on 64-bit architectures.
>
> The current code pointlessly casts the dividend to u64 on 32-bit
> architectures, which needlessly reduces performance.
It looks like that was done to make the DIV_ROUND_UP() macro not
overflow the dividend on 32-bit machines (from 9556f9dad8f5):
DIV_ROUND_UP(3000000000, 1500000000) = (3.0G + 1.5G - 1) / 1.5G
= OVERFLOW / 1.5G
but I agree, the u64 cast is not necessary if DIV_ROUND_UP_ULL() is
used as that macro casts the dividend to unsigned long long anyway.
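
The wraparound described above can be reproduced in user space. The
following is only a minimal sketch: it models a 32-bit "unsigned long"
with uint32_t, and the helper names div_round_up_u32() and
div_round_up_u64() are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of DIV_ROUND_UP() when "unsigned long" is 32 bits.
 * The intermediate sum n + d - 1 is computed modulo 2^32 and may wrap.
 */
static uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (uint32_t)(n + d - 1) / d;
}

/*
 * Same formula with a 64-bit intermediate, which is what widening the
 * dividend to unsigned long long achieves for these input sizes.
 */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}
```

With n = 3000000000 and d = 1500000000, the 32-bit sum wraps to
205032703, so the 32-bit helper yields 0 instead of the expected 2, while
the 64-bit helper yields 2.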
>
> On the other hand, on 64-bit architectures the divisor is masked and only
> the lower 32 bits are used. Thus requesting a frequency >= 4.3 GHz results
> in incorrect values. For example requesting 4300000000 (4.3 GHz) will
> effectively request ca. 5 MHz.
Nice catch. But I'm concerned that the case above is broken by changing
to DIV_ROUND_UP(). As this code is generic, I fear we'll have to change
this code that divides rates to use DIV64_U64_ROUND_UP() because we
don't know how large the rate is (i.e. it could be larger than 32-bits
on a 64-bit machine).
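
To make the truncation concrete, here is a small user-space sketch. It
assumes, as the commit message states, that the divisor is reduced to its
low 32 bits before the division; low32() and div_round_up_trunc_model()
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit rate (assumed divisor masking). */
static uint32_t low32(uint64_t rate)
{
	return (uint32_t)rate;
}

/*
 * Hypothetical model of a round-up division whose divisor is truncated
 * to 32 bits while the dividend stays 64 bits wide.
 */
static uint64_t div_round_up_trunc_model(uint64_t parent_rate, uint64_t rate)
{
	uint32_t base = low32(rate);

	assert(base != 0);
	return (parent_rate + rate - 1) / base;
}

/* Reference: the same division with a full 64-bit divisor. */
static uint64_t div_round_up_full(uint64_t parent_rate, uint64_t rate)
{
	assert(rate != 0);
	return (parent_rate + rate - 1) / rate;
}
```

Here low32(4300000000) is 5032704, i.e. roughly 5 MHz, matching the
example in the commit message. For a parent rate of 8.6 GHz and a
requested rate of 4.3 GHz, the full-width division gives 2, whereas the
truncated model gives a much larger divider.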
> Requesting clk_round_rate(clk, ULONG_MAX)
> is a bit of a special case, since that still returns correct values as
> long as the parent clock is below 8.5 GHz.
>
> Signed-off-by: Sebastian Reichel <[email protected]>
> ---
> drivers/clk/clk-divider.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index a2c2b5203b0a..c38e8aa60e54 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -220,7 +220,7 @@ static int _div_round_up(const struct clk_div_table *table,
> unsigned long parent_rate, unsigned long rate,
> unsigned long flags)
> {
> - int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> + int div = DIV_ROUND_UP(parent_rate, rate);
>
> if (flags & CLK_DIVIDER_POWER_OF_TWO)
> div = __roundup_pow_of_two(div);
> @@ -237,7 +237,7 @@ static int _div_round_closest(const struct clk_div_table *table,
> int up, down;
> unsigned long up_rate, down_rate;
>
> - up = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> + up = DIV_ROUND_UP(parent_rate, rate);
> down = parent_rate / rate;
>
> if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> @@ -473,7 +473,7 @@ int divider_get_val(unsigned long rate, unsigned long parent_rate,
> {
> unsigned int div, value;
>
> - div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> + div = DIV_ROUND_UP(parent_rate, rate);
>
> if (!_is_valid_div(table, div, flags))
> return -EINVAL;
This is undoing parts of commit 9556f9dad8f5 ("clk: divider: handle
integer overflow when dividing large clock rates"). Please pair this
patch with extensive kunit tests in a new test suite clk-divider_test.c
file. I don't know if UML supports changing sizeof(long), but that would
be a cool feature to tease out these sorts of issues. I suppose we'll
just have to run the kunit tests on various architectures to cover the
possibilities.
From: Stephen Boyd
> Sent: 13 June 2023 01:42
>
> Quoting Sebastian Reichel (2023-05-26 10:10:57)
> > The clock framework handles clock rates as "unsigned long", so u32 on
> > 32-bit architectures and u64 on 64-bit architectures.
> >
> > The current code pointlessly casts the dividend to u64 on 32-bit
> > architectures, which needlessly reduces performance.
>
> It looks like that was done to make the DIV_ROUND_UP() macro not
> overflow the dividend on 32-bit machines (from 9556f9dad8f5):
>
> DIV_ROUND_UP(3000000000, 1500000000) = (3.0G + 1.5G - 1) / 1.5G
> = OVERFLOW / 1.5G
Maybe add:
#define DIV_ROUND_UP_NZ(x, y) (((x) - 1)/(y) + 1)
which doesn't overflow but requires x != 0.
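
A quick user-space check of that formula, as a sketch only: uint32_t
stands in for a 32-bit "unsigned long", and div_round_up_nz_u32() is a
hypothetical wrapper around the proposed macro.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed macro from this mail: no x + y intermediate, needs x != 0. */
#define DIV_ROUND_UP_NZ(x, y) (((x) - 1)/(y) + 1)

/* Hypothetical typed wrapper modelling a 32-bit "unsigned long". */
static uint32_t div_round_up_nz_u32(uint32_t x, uint32_t y)
{
	assert(x != 0);	/* x == 0 would wrap x - 1 to UINT32_MAX */
	assert(y != 0);
	return DIV_ROUND_UP_NZ(x, y);
}
```

For x = 3000000000 and y = 1500000000 this evaluates to
2999999999 / 1500000000 + 1 = 2 without wrapping in 32 bits.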
David