2023-06-05 19:16:20

by Frank Oltmanns

[permalink] [raw]
Subject: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

In case the CLK_SET_RATE_PARENT flag is set, consider using a different
parent rate when determining a new rate.

To find the best match for the requested rate, perform the following
steps for each NKM combination:
- calculate the optimal parent rate,
- find the best parent rate that the parent clock actually supports
- use that parent rate to calculate the effective rate.

In case the clk does not support setting the parent rate, use the same
algorithm as before.

Signed-off-by: Frank Oltmanns <[email protected]>
---
drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
index a0978a50edae..c71e237226f2 100644
--- a/drivers/clk/sunxi-ng/ccu_nkm.c
+++ b/drivers/clk/sunxi-ng/ccu_nkm.c
@@ -16,10 +16,10 @@ struct _ccu_nkm {
unsigned long m, min_m, max_m;
};

-static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
- struct _ccu_nkm *nkm)
+static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
+ struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
{
- unsigned long best_rate = 0;
+ unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
unsigned long best_n = 0, best_k = 0, best_m = 0;
unsigned long _n, _k, _m;

@@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
unsigned long tmp_rate;

- tmp_rate = parent * _n * _k / _m;
+ if (parent_hw) {
+ // We must round up the desired parent rate, because the
+ // rounding down happens when calculating tmp_rate. If we
+ // round down also here, we'd round down twice.
+ unsigned long optimal_parent =
+ (rate * _m + (_n * _k - 1)) / _n / _k;
+
+ tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
+
+ // if the optimal parent rate is below the minimum rate of
+ // the parent ignore this combination
+ if (tmp_parent > optimal_parent)
+ continue;
+ tmp_rate = tmp_parent * _n * _k / _m;
+ } else {
+ tmp_rate = tmp_parent * _n * _k / _m;
+ if (tmp_rate > rate)
+ continue;
+ }

- if (tmp_rate > rate)
- continue;
if ((rate - tmp_rate) < (rate - best_rate)) {
best_rate = tmp_rate;
+ best_parent_rate = tmp_parent;
best_n = _n;
best_k = _k;
best_m = _m;
@@ -46,6 +63,8 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
nkm->k = best_k;
nkm->m = best_m;

+ *parent = best_parent_rate;
+
return best_rate;
}

@@ -106,7 +125,7 @@ static unsigned long ccu_nkm_recalc_rate(struct clk_hw *hw,
}

static unsigned long ccu_nkm_round_rate(struct ccu_mux_internal *mux,
- struct clk_hw *hw,
+ struct clk_hw *parent_hw,
unsigned long *parent_rate,
unsigned long rate,
void *data)
@@ -124,7 +143,10 @@ static unsigned long ccu_nkm_round_rate(struct ccu_mux_internal *mux,
if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
rate *= nkm->fixed_post_div;

- rate = ccu_nkm_find_best(*parent_rate, rate, &_nkm);
+ if (!clk_hw_can_set_rate_parent(&nkm->common.hw))
+ rate = ccu_nkm_find_best(parent_rate, rate, &_nkm, NULL);
+ else
+ rate = ccu_nkm_find_best(parent_rate, rate, &_nkm, parent_hw);

if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
rate /= nkm->fixed_post_div;
@@ -159,7 +181,7 @@ static int ccu_nkm_set_rate(struct clk_hw *hw, unsigned long rate,
_nkm.min_m = 1;
_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;

- ccu_nkm_find_best(parent_rate, rate, &_nkm);
+ ccu_nkm_find_best(&parent_rate, rate, &_nkm, NULL);

spin_lock_irqsave(nkm->common.lock, flags);

--
2.40.1



2023-06-07 06:52:08

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
> parent rate when determining a new rate.
>
> To find the best match for the requested rate, perform the following
> steps for each NKM combination:
> - calculate the optimal parent rate,
> - find the best parent rate that the parent clock actually supports
> - use that parent rate to calculate the effective rate.
>
> In case the clk does not support setting the parent rate, use the same
> algorithm as before.
>
> Signed-off-by: Frank Oltmanns <[email protected]>
> ---
> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> index a0978a50edae..c71e237226f2 100644
> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
> @@ -16,10 +16,10 @@ struct _ccu_nkm {
> unsigned long m, min_m, max_m;
> };
>
> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> - struct _ccu_nkm *nkm)
> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
> {
> - unsigned long best_rate = 0;
> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
> unsigned long best_n = 0, best_k = 0, best_m = 0;
> unsigned long _n, _k, _m;
>
> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
> unsigned long tmp_rate;
>
> - tmp_rate = parent * _n * _k / _m;
> + if (parent_hw) {

NKM clocks always have a parent

You should test if the CLK_SET_RATE_PARENT flag is set.

> + // We must round up the desired parent rate, because the
> + // rounding down happens when calculating tmp_rate. If we
> + // round down also here, we'd round down twice.
> + unsigned long optimal_parent =
> + (rate * _m + (_n * _k - 1)) / _n / _k;

I assume the addition of n * k - 1 is to round up, but I'm not sure we
should hack around like that.

You should compute the ideal parent rate for a given set of timings, and
then just call round_rate on it. If the parent wants to round it one way
or another, that's the parent concern.

Maxime


Attachments:
(No filename) (2.42 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-08 09:50:55

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

On 2023-06-07 at 08:38:39 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
>> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
>> parent rate when determining a new rate.
>>
>> To find the best match for the requested rate, perform the following
>> steps for each NKM combination:
>> - calculate the optimal parent rate,
>> - find the best parent rate that the parent clock actually supports
>> - use that parent rate to calculate the effective rate.
>>
>> In case the clk does not support setting the parent rate, use the same
>> algorithm as before.
>>
>> Signed-off-by: Frank Oltmanns <[email protected]>
>> ---
>> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
>> 1 file changed, 31 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> index a0978a50edae..c71e237226f2 100644
>> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> @@ -16,10 +16,10 @@ struct _ccu_nkm {
>> unsigned long m, min_m, max_m;
>> };
>>
>> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> - struct _ccu_nkm *nkm)
>> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
>> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
>> {
>> - unsigned long best_rate = 0;
>> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
>> unsigned long best_n = 0, best_k = 0, best_m = 0;
>> unsigned long _n, _k, _m;
>>
>> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
>> unsigned long tmp_rate;
>>
>> - tmp_rate = parent * _n * _k / _m;
>> + if (parent_hw) {
>
> NKM clocks always have a parent
>
> You should test if the CLK_SET_RATE_PARENT flag is set.

ccu_nkm_find_best is called in the following two situations:
a. from ccu_nkm_set_rate when setting the rate
b. from ccu_nkm_round_rate when determining the rate

In situation a. we never want ccu_nkm_find_best to try different parent
rates because setting the parent rate is a done deal (at least that's my
understanding).

In situation b. we only want ccu_nkm_find_best to try different parent
rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.

So, what this patch does, it provides a NULL pointer as parent_hw when
we don't want ccu_nkm_find_best to try alternative parent rates.

Is it ok if I add a comment to ccu_nkm_find_best that explains the
function and explicitly also the parameters?

I also thought about using two different functions for the two
situations. I have no strong opinion which is better.

However, I don't think we should hand over the flags to this function,
because we'd still only need to provide the parent_hw if we want to find
the optimal parent rate, so having two parametes for the same purpose
seems redundant. Unless, there is a rule to not use NULL pointers.

>
>> + // We must round up the desired parent rate, because the
>> + // rounding down happens when calculating tmp_rate. If we
>> + // round down also here, we'd round down twice.
>> + unsigned long optimal_parent =
>> + (rate * _m + (_n * _k - 1)) / _n / _k;
>
> I assume the addition of n * k - 1 is to round up, but I'm not sure we
> should hack around like that.
>
> You should compute the ideal parent rate for a given set of timings, and
> then just call round_rate on it. If the parent wants to round it one way
> or another, that's the parent concern.

I admit that the comment explaining this is not doing the complexity of
this issue any justice. Let me try to explain:

Let's say for our panel the optimal rate for pll-mipi is 449064000. The
best closest we can get is 449035712 with a parent rate of 217714285
(n=11, k=3, m=16).

Eventually, ccu_nkm_find_best is going to be called with 449035712 as
the rate. If we don't round up, like I proposend, but instead calculate:
optimal_parent = rate * m / n / k
(which is, I think, what you you're proposing) leading to an optimal
parent of 217714284 (!). We can't get 217714284 from the parent (we
could get 217714285, but we're not asking for that) so the parent rounds
down.

To make things worse, this story continues for the new "best rate" as
well.

In the end, ccu_nkm_find_best claims:
- the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
k=3, m=16)
- but ccu_nkm_find_best would claim that the optimal rate for 449035712
is 449018181 (parent=235200000, n=7, k=3, m=11)
- and finally, the optimal rate for 449018181 is 449018180
(parent=213818181, n=7, k=3, m=10)

This doesn't seem right to me.

But you're also right, in that we can't just always round up. In a
hypothetical example that we request a parent rate of 450000000. With
rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
m=16. And let's now further claim that the parent could provide exactly
that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
which (currently) is not acceptable.

Hmm... I currently can't think of a clever way to solve this other than
this:

optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
tmp_rate = tmp_parent * _n * _k / _m;
if (tmp_rate > rate) {
optimal_parent = rate * m / n / k
tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
tmp_rate = tmp_parent * _n * _k / _m;
}
if (tmp_parent > optimal_parent)
continue;

This seems ugly, but at least it should work in all cases. Any opinions?

Cheers,
Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

2023-06-12 12:42:03

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Wed, Jun 07, 2023 at 09:39:35AM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> On 2023-06-07 at 08:38:39 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
> >> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
> >> parent rate when determining a new rate.
> >>
> >> To find the best match for the requested rate, perform the following
> >> steps for each NKM combination:
> >> - calculate the optimal parent rate,
> >> - find the best parent rate that the parent clock actually supports
> >> - use that parent rate to calculate the effective rate.
> >>
> >> In case the clk does not support setting the parent rate, use the same
> >> algorithm as before.
> >>
> >> Signed-off-by: Frank Oltmanns <[email protected]>
> >> ---
> >> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
> >> 1 file changed, 31 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> index a0978a50edae..c71e237226f2 100644
> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> @@ -16,10 +16,10 @@ struct _ccu_nkm {
> >> unsigned long m, min_m, max_m;
> >> };
> >>
> >> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> >> - struct _ccu_nkm *nkm)
> >> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
> >> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
> >> {
> >> - unsigned long best_rate = 0;
> >> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
> >> unsigned long best_n = 0, best_k = 0, best_m = 0;
> >> unsigned long _n, _k, _m;
> >>
> >> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> >> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
> >> unsigned long tmp_rate;
> >>
> >> - tmp_rate = parent * _n * _k / _m;
> >> + if (parent_hw) {
> >
> > NKM clocks always have a parent
> >
> > You should test if the CLK_SET_RATE_PARENT flag is set.
>
> ccu_nkm_find_best is called in the following two situations:
> a. from ccu_nkm_set_rate when setting the rate
> b. from ccu_nkm_round_rate when determining the rate
>
> In situation a. we never want ccu_nkm_find_best to try different parent
> rates because setting the parent rate is a done deal (at least that's my
> understanding).
>
> In situation b. we only want ccu_nkm_find_best to try different parent
> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.

It doesn't really matter though. The output of that function must be
stable and must return the same set of factors and parent rate for a
given target rate.

So you can call it as many times as you want, it doesn't really matter.

> So, what this patch does, it provides a NULL pointer as parent_hw when
> we don't want ccu_nkm_find_best to try alternative parent rates.

At best, the argument is misleading then. You're not passing a pointer
to the parent, you're telling it whether it should look for other
parents or not. And it's not a pointer, it's a boolean.

> Is it ok if I add a comment to ccu_nkm_find_best that explains the
> function and explicitly also the parameters?

Sure

> I also thought about using two different functions for the two
> situations. I have no strong opinion which is better.
>
> However, I don't think we should hand over the flags to this function,
> because we'd still only need to provide the parent_hw if we want to
> find the optimal parent rate, so having two parametes for the same
> purpose seems redundant. Unless, there is a rule to not use NULL
> pointers.

Again, the behaviour must be stable across all calling sites.

> >
> >> + // We must round up the desired parent rate, because the
> >> + // rounding down happens when calculating tmp_rate. If we
> >> + // round down also here, we'd round down twice.
> >> + unsigned long optimal_parent =
> >> + (rate * _m + (_n * _k - 1)) / _n / _k;
> >
> > I assume the addition of n * k - 1 is to round up, but I'm not sure we
> > should hack around like that.
> >
> > You should compute the ideal parent rate for a given set of timings, and
> > then just call round_rate on it. If the parent wants to round it one way
> > or another, that's the parent concern.
>
> I admit that the comment explaining this is not doing the complexity of
> this issue any justice. Let me try to explain:
>
> Let's say for our panel the optimal rate for pll-mipi is 449064000. The
> best closest we can get is 449035712 with a parent rate of 217714285
> (n=11, k=3, m=16).
>
> Eventually, ccu_nkm_find_best is going to be called with 449035712 as
> the rate. If we don't round up, like I proposend, but instead calculate:
> optimal_parent = rate * m / n / k
> (which is, I think, what you you're proposing) leading to an optimal
> parent of 217714284 (!). We can't get 217714284 from the parent (we
> could get 217714285, but we're not asking for that) so the parent rounds
> down.
>
> To make things worse, this story continues for the new "best rate" as
> well.
>
> In the end, ccu_nkm_find_best claims:
> - the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
> k=3, m=16)
> - but ccu_nkm_find_best would claim that the optimal rate for 449035712
> is 449018181 (parent=235200000, n=7, k=3, m=11)
> - and finally, the optimal rate for 449018181 is 449018180
> (parent=213818181, n=7, k=3, m=10)
>
> This doesn't seem right to me.
>
> But you're also right, in that we can't just always round up. In a
> hypothetical example that we request a parent rate of 450000000. With
> rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
> m=16. And let's now further claim that the parent could provide exactly
> that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
> which (currently) is not acceptable.
>
> Hmm... I currently can't think of a clever way to solve this other than
> this:
>
> optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
> tmp_rate = tmp_parent * _n * _k / _m;
> if (tmp_rate > rate) {
> optimal_parent = rate * m / n / k
> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
> tmp_rate = tmp_parent * _n * _k / _m;
> }
> if (tmp_parent > optimal_parent)
> continue;
>
> This seems ugly, but at least it should work in all cases. Any opinions?

Again, you shouldn't work around the issue.

It's very simple really: you already computed the optimal parent rate,
you ask the parent to compute whatever is closest to that optimal parent
rate.

It's the parent responsibility now. It's the parent decision to figure
out what "the closest" means, if it can change rate, if it has any range
limitation, etc. You can't work around that.

What you actually want there is the parent to actually provide the
closest rate, even if it means overshooting. That's fine, we have a flag
for that: CLK_(MUX|DIVIDER)_ROUND_CLOSEST. We just need to set it on the
parent and be done with it.

Maxime


Attachments:
(No filename) (7.26 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-12 17:10:04

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

I'm sorry that the following mail is a bit long. I'm not sure, there is
some kind of misunderstanding/miscommunication somewhere, I'm just not
sure, where. :)

This mail is aiming at finding the exact points where we apparently go
down different paths.

On 2023-06-12 at 14:19:05 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Wed, Jun 07, 2023 at 09:39:35AM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> On 2023-06-07 at 08:38:39 +0200, Maxime Ripard <[email protected]> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
>> >> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
>> >> parent rate when determining a new rate.
>> >>
>> >> To find the best match for the requested rate, perform the following
>> >> steps for each NKM combination:
>> >> - calculate the optimal parent rate,
>> >> - find the best parent rate that the parent clock actually supports
>> >> - use that parent rate to calculate the effective rate.
>> >>
>> >> In case the clk does not support setting the parent rate, use the same
>> >> algorithm as before.
>> >>
>> >> Signed-off-by: Frank Oltmanns <[email protected]>
>> >> ---
>> >> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
>> >> 1 file changed, 31 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> index a0978a50edae..c71e237226f2 100644
>> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> @@ -16,10 +16,10 @@ struct _ccu_nkm {
>> >> unsigned long m, min_m, max_m;
>> >> };
>> >>
>> >> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >> - struct _ccu_nkm *nkm)
>> >> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
>> >> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
>> >> {
>> >> - unsigned long best_rate = 0;
>> >> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
>> >> unsigned long best_n = 0, best_k = 0, best_m = 0;
>> >> unsigned long _n, _k, _m;
>> >>
>> >> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
>> >> unsigned long tmp_rate;
>> >>
>> >> - tmp_rate = parent * _n * _k / _m;
>> >> + if (parent_hw) {
>> >
>> > NKM clocks always have a parent
>> >
>> > You should test if the CLK_SET_RATE_PARENT flag is set.
>>
>> ccu_nkm_find_best is called in the following two situations:
>> a. from ccu_nkm_set_rate when setting the rate
>> b. from ccu_nkm_round_rate when determining the rate
>>
>> In situation a. we never want ccu_nkm_find_best to try different parent
>> rates because setting the parent rate is a done deal (at least that's my
>> understanding).
>>
>> In situation b. we only want ccu_nkm_find_best to try different parent
>> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.
>
> It doesn't really matter though. The output of that function must be
> stable and must return the same set of factors and parent rate for a
> given target rate.
>

I'm not sure if we're talking about the same thing here. Of course the
set of factors and parent rate for a given target rate will be different
depending on the fact if we can or cannot adjust the parent rate,
agreed?

Let me compare my implementation to ccu_mp.

ccu_mp_round_rate either calls the function ccu_mp_find_best or
ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.

I'm basically doing the same thing, but (!) ccu_nkm_find_best and
ccu_nkm_find_best_with_parent_adj would be almost identical. Therefore,
I opted to extend ccu_nkm_find_best to also support the parent
adjustment. If you look at V2 of this patch, you will see that the only
diffences are an if statement (if (parent_hw)) with two lines of code in
the if's body and the fact that we need to store the best parent rate.

If you prefer, I can split this into two separate functions like in
ccu_mp. I think all the confusion is coming from the fact that I didn't.
So apparently it was not a good idea to keep it as one function.

Should I introduce ccu_nkm_find_best_with_parent_adj instead of using
ccu_nkm_find_best for both cases?

>
> So you can call it as many times as you want, it doesn't really matter.

Of course! What did I write that made you think, I thought otherwise?

>
>> So, what this patch does, it provides a NULL pointer as parent_hw when
>> we don't want ccu_nkm_find_best to try alternative parent rates.
>
> At best, the argument is misleading then. You're not passing a pointer
> to the parent, you're telling it whether it should look for other
> parents or not. And it's not a pointer, it's a boolean.

No, I'm using parent_hw and as a pointer a few lines below when calling
clk_hw_round_rate. So I'd need a boolean AND a pointer. I always need
the pointer if the boolean is true. I never need the pointer when the
boolean is false. Therefore, I opted to choose to use the pointer for
first being a boolean (in the if) and then being a pointer (when calling
clk_hw_round_rate). This is the (hopefully easier to read) code from V2:

if (parent_hw) {
tmp_parent = optimal_parent_rate(rate, _n, _k, _m);
tmp_parent = clk_hw_round_rate(parent_hw, tmp_parent);
}


>
>> Is it ok if I add a comment to ccu_nkm_find_best that explains the
>> function and explicitly also the parameters?
>
> Sure

Done in V2.

>
>> I also thought about using two different functions for the two
>> situations. I have no strong opinion which is better.
>>
>> However, I don't think we should hand over the flags to this function,
>> because we'd still only need to provide the parent_hw if we want to
>> find the optimal parent rate, so having two parametes for the same
>> purpose seems redundant. Unless, there is a rule to not use NULL
>> pointers.
>
> Again, the behaviour must be stable across all calling sites.

No argument here.

>
>> >
>> >> + // We must round up the desired parent rate, because the
>> >> + // rounding down happens when calculating tmp_rate. If we
>> >> + // round down also here, we'd round down twice.
>> >> + unsigned long optimal_parent =
>> >> + (rate * _m + (_n * _k - 1)) / _n / _k;
>> >
>> > I assume the addition of n * k - 1 is to round up, but I'm not sure we
>> > should hack around like that.
>> >
>> > You should compute the ideal parent rate for a given set of timings, and
>> > then just call round_rate on it. If the parent wants to round it one way
>> > or another, that's the parent concern.
>>
>> I admit that the comment explaining this is not doing the complexity of
>> this issue any justice. Let me try to explain:
>>
>> Let's say for our panel the optimal rate for pll-mipi is 449064000. The
>> best closest we can get is 449035712 with a parent rate of 217714285
>> (n=11, k=3, m=16).
>>
>> Eventually, ccu_nkm_find_best is going to be called with 449035712 as
>> the rate. If we don't round up, like I proposend, but instead calculate:
>> optimal_parent = rate * m / n / k
>> (which is, I think, what you you're proposing) leading to an optimal
>> parent of 217714284 (!). We can't get 217714284 from the parent (we
>> could get 217714285, but we're not asking for that) so the parent rounds
>> down.
>>
>> To make things worse, this story continues for the new "best rate" as
>> well.
>>
>> In the end, ccu_nkm_find_best claims:
>> - the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
>> k=3, m=16)
>> - but ccu_nkm_find_best would claim that the optimal rate for 449035712
>> is 449018181 (parent=235200000, n=7, k=3, m=11)
>> - and finally, the optimal rate for 449018181 is 449018180
>> (parent=213818181, n=7, k=3, m=10)
>>
>> This doesn't seem right to me.
>>
>> But you're also right, in that we can't just always round up. In a
>> hypothetical example that we request a parent rate of 450000000. With
>> rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
>> m=16. And let's now further claim that the parent could provide exactly
>> that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
>> which (currently) is not acceptable.
>>
>> Hmm... I currently can't think of a clever way to solve this other than
>> this:
>>
>> optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
>> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> tmp_rate = tmp_parent * _n * _k / _m;
>> if (tmp_rate > rate) {
>> optimal_parent = rate * m / n / k
>> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> tmp_rate = tmp_parent * _n * _k / _m;
>> }
>> if (tmp_parent > optimal_parent)
>> continue;
>>
>> This seems ugly, but at least it should work in all cases. Any opinions?
>
> Again, you shouldn't work around the issue.
>
> It's very simple really: you already computed the optimal parent rate,

No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
best rate for rate = 449035712, it should try to include 449035712 in
its candidates, agreed?

Example 1:
==========
rate=449035712, n=11, k=3, m=16:
We should as for a parent rate of 217714285, because:
217714285 * 11 * 3 / 16 = 449035712

How do we get from 449035712 to 217714285, you ask?

DIV_ROUND_UP(rate * m, n * k)

Do you agree that we should ask the parent for 217714285 in case we want
a rate of 449035712 and we're currently evaluating the case n=11, k=3,
m=16?

We should not ask for a parent rate of 217714284, because:
217714284 * 11 * 3 / 16 = 449035710

Example 2:
==========
rate=500000000, n=11, k=3, m=16:
Here we should not ask the parent for
DIV_ROUND_UP(rate * m, n * k)
because that would be 242424243.

242424243 * 11 * 3 / 16 = 500000001

We (the NKM clock, not the parent!) would overshoot (please see at the
end of this mail, why (for now) I don't want to support overshooting in
the NKM clock).

Instead we should as for a parent rate of 242424242, because:
242424242 * 11 * 3 / 16 = 499999999

In conclusion, there are cases, where we (the NKM clock) have to ask the
parent for
DIV_ROUND_UP(rate * m, n * k)
And there are also cases, where we have to ask the parent for
rate * m / (n * k)

This is what the code is trying to do. Maybe it's easier to look at V2
because I extracted the calcultion of the optimal parent rate into a
separate function hoping that this makes things clearer.

Let me stress this: When calculating the optimal rate for the parent,
I'm not making any assumptions here about how the PARENT clock rounds.
In fact, I assume that the parent could be perfect and always provides
the rate it is asked for. I only take into account how the nkm clock
rounds.

> you ask the parent to compute whatever is closest to that optimal parent
> rate.
>
> It's the parent responsibility now. It's the parent decision to figure
> out what "the closest" means, if it can change rate, if it has any range
> limitation, etc. You can't work around that.
>
> What you actually want there is the parent to actually provide the
> closest rate, even if it means overshooting.
>

I want to ask the parent for a rate, that would actually result in the
rate that nkm_find_best was asked for. Are you asking me to instead ask
the parent for a rate that doesn't fit that criterion?

> That's fine, we have a flag
> for that: CLK_(MUX|DIVIDER)_ROUND_CLOSEST. We just need to set it on the
> parent and be done with it.

This is a totally different issue. If you carefully look at ccu_mp, you
will see that it would ignore cases when its parent had rounded up.
ccu_nkm is no different. Teaching all of sunxi-ng's clocks to respect
ROUND_CLOSEST is a totally different beast. For now, sunxi-ng always
expects rounding down.

I do not like mixing the two into one patchset. I hope that's a fair
request? I tried to mix it and it was a nightmare! If you want, I can
try tackling ROUND_CLOSEST afterwards. But I don't think it would help
with this patchset, because we'd need to support both the ROUND_CLOSEST
and ~ROUND_CLOSEST case. Covering one case seems already hard enough. :)

Thank you for your patience,
Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

2023-06-13 09:31:25

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi,

On Mon, Jun 12, 2023 at 06:29:36PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> I'm sorry that the following mail is a bit long. I'm not sure, there is
> some kind of misunderstanding/miscommunication somewhere, I'm just not
> sure, where. :)
>
> This mail is aiming at finding the exact points where we apparently go
> down different paths.
>
> On 2023-06-12 at 14:19:05 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Wed, Jun 07, 2023 at 09:39:35AM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> On 2023-06-07 at 08:38:39 +0200, Maxime Ripard <[email protected]> wrote:
> >> > [[PGP Signed Part:Undecided]]
> >> > On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
> >> >> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
> >> >> parent rate when determining a new rate.
> >> >>
> >> >> To find the best match for the requested rate, perform the following
> >> >> steps for each NKM combination:
> >> >> - calculate the optimal parent rate,
> >> >> - find the best parent rate that the parent clock actually supports
> >> >> - use that parent rate to calculate the effective rate.
> >> >>
> >> >> In case the clk does not support setting the parent rate, use the same
> >> >> algorithm as before.
> >> >>
> >> >> Signed-off-by: Frank Oltmanns <[email protected]>
> >> >> ---
> >> >> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
> >> >> 1 file changed, 31 insertions(+), 9 deletions(-)
> >> >>
> >> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> >> index a0978a50edae..c71e237226f2 100644
> >> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> >> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> >> @@ -16,10 +16,10 @@ struct _ccu_nkm {
> >> >> unsigned long m, min_m, max_m;
> >> >> };
> >> >>
> >> >> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> >> >> - struct _ccu_nkm *nkm)
> >> >> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
> >> >> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
> >> >> {
> >> >> - unsigned long best_rate = 0;
> >> >> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
> >> >> unsigned long best_n = 0, best_k = 0, best_m = 0;
> >> >> unsigned long _n, _k, _m;
> >> >>
> >> >> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> >> >> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
> >> >> unsigned long tmp_rate;
> >> >>
> >> >> - tmp_rate = parent * _n * _k / _m;
> >> >> + if (parent_hw) {
> >> >
> >> > NKM clocks always have a parent
> >> >
> >> > You should test if the CLK_SET_RATE_PARENT flag is set.
> >>
> >> ccu_nkm_find_best is called in the following two situations:
> >> a. from ccu_nkm_set_rate when setting the rate
> >> b. from ccu_nkm_round_rate when determining the rate
> >>
> >> In situation a. we never want ccu_nkm_find_best to try different parent
> >> rates because setting the parent rate is a done deal (at least that's my
> >> understanding).
> >>
> >> In situation b. we only want ccu_nkm_find_best to try different parent
> >> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.
> >
> > It doesn't really matter though. The output of that function must be
> > stable and must return the same set of factors and parent rate for a
> > given target rate.
> >
>
> I'm not sure if we're talking about the same thing here. Of course the
> set of factors and parent rate for a given target rate will be different
> depending on the fact if we can or cannot adjust the parent rate,
> agreed?

Yes, but here you also have a different behaviour in clk_round_rate()
and in clk_set_rate(), which isn't ok.

Basically, clk_set_rate() + clk_get_rate() must be equal to
clk_round_rate().

If you change if you look for parents depending on whether you're being
called in clk_round_rate() and clk_set_rate(), then you're breaking that
expectation.

> Let me compare my implementation to ccu_mp.
>
> ccu_mp_round_rate either calls the function ccu_mp_find_best or
> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.

Yes, and it's fine: the flag is per-clock, and the output is the same
depending on whether we're being called by clk_round_rate() and
clk_set_rate().

> I'm basically doing the same thing, but (!) ccu_nkm_find_best and
> ccu_nkm_find_best_with_parent_adj would be almost identical. Therefore,
> I opted to extend ccu_nkm_find_best to also support the parent
> adjustment. If you look at V2 of this patch, you will see that the only
> diffences are an if statement (if (parent_hw)) with two lines of code in
> the if's body and the fact that we need to store the best parent rate.
>
> If you prefer, I can split this into two separate functions like in
> ccu_mp. I think all the confusion is coming from the fact that I didn't.
> So apparently it was not a good idea to keep it as one function.
>
> Should I introduce ccu_nkm_find_best_with_parent_adj instead of using
> ccu_nkm_find_best for both cases?
>
> >
> > So you can call it as many times as you want, it doesn't really matter.
>
> Of course! What did I write that made you think, I thought otherwise?
>
> >
> >> So, what this patch does, it provides a NULL pointer as parent_hw when
> >> we don't want ccu_nkm_find_best to try alternative parent rates.
> >
> > At best, the argument is misleading then. You're not passing a pointer
> > to the parent, you're telling it whether it should look for other
> > parents or not. And it's not a pointer, it's a boolean.
>
> No, I'm using parent_hw and as a pointer a few lines below when calling
> clk_hw_round_rate. So I'd need a boolean AND a pointer. I always need
> the pointer if the boolean is true. I never need the pointer when the
> boolean is false. Therefore, I opted to choose to use the pointer for
> first being a boolean (in the if) and then being a pointer (when calling
> clk_hw_round_rate). This is the (hopefully easier to read) code from V2:
>
> if (parent_hw) {
> tmp_parent = optimal_parent_rate(rate, _n, _k, _m);
> tmp_parent = clk_hw_round_rate(parent_hw, tmp_parent);
> }

Again, that clock always has a parent. It's not the actual condition:
what you want to test is whether you want to forward the rate request to
the parent or not. So that condition is misleading.

> >> Is it ok if I add a comment to ccu_nkm_find_best that explains the
> >> function and explicitly also the parameters?
> >
> > Sure
>
> Done in V2.
>
> >
> >> I also thought about using two different functions for the two
> >> situations. I have no strong opinion which is better.
> >>
> >> However, I don't think we should hand over the flags to this function,
> >> because we'd still only need to provide the parent_hw if we want to
> >> find the optimal parent rate, so having two parametes for the same
> >> purpose seems redundant. Unless, there is a rule to not use NULL
> >> pointers.
> >
> > Again, the behaviour must be stable across all calling sites.
>
> No argument here.
>
> >
> >> >
> >> >> + // We must round up the desired parent rate, because the
> >> >> + // rounding down happens when calculating tmp_rate. If we
> >> >> + // round down also here, we'd round down twice.
> >> >> + unsigned long optimal_parent =
> >> >> + (rate * _m + (_n * _k - 1)) / _n / _k;
> >> >
> >> > I assume the addition of n * k - 1 is to round up, but I'm not sure we
> >> > should hack around like that.
> >> >
> >> > You should compute the ideal parent rate for a given set of timings, and
> >> > then just call round_rate on it. If the parent wants to round it one way
> >> > or another, that's the parent concern.
> >>
> >> I admit that the comment explaining this is not doing the complexity of
> >> this issue any justice. Let me try to explain:
> >>
> >> Let's say for our panel the optimal rate for pll-mipi is 449064000. The
> >> best closest we can get is 449035712 with a parent rate of 217714285
> >> (n=11, k=3, m=16).
> >>
> >> Eventually, ccu_nkm_find_best is going to be called with 449035712 as
> >> the rate. If we don't round up, like I proposend, but instead calculate:
> >> optimal_parent = rate * m / n / k
> >> (which is, I think, what you you're proposing) leading to an optimal
> >> parent of 217714284 (!). We can't get 217714284 from the parent (we
> >> could get 217714285, but we're not asking for that) so the parent rounds
> >> down.
> >>
> >> To make things worse, this story continues for the new "best rate" as
> >> well.
> >>
> >> In the end, ccu_nkm_find_best claims:
> >> - the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
> >> k=3, m=16)
> >> - but ccu_nkm_find_best would claim that the optimal rate for 449035712
> >> is 449018181 (parent=235200000, n=7, k=3, m=11)
> >> - and finally, the optimal rate for 449018181 is 449018180
> >> (parent=213818181, n=7, k=3, m=10)
> >>
> >> This doesn't seem right to me.
> >>
> >> But you're also right, in that we can't just always round up. In a
> >> hypothetical example that we request a parent rate of 450000000. With
> >> rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
> >> m=16. And let's now further claim that the parent could provide exactly
> >> that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
> >> which (currently) is not acceptable.
> >>
> >> Hmm... I currently can't think of a clever way to solve this other than
> >> this:
> >>
> >> optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
> >> tmp_rate = tmp_parent * _n * _k / _m;
> >> if (tmp_rate > rate) {
> >> optimal_parent = rate * m / n / k
> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
> >> tmp_rate = tmp_parent * _n * _k / _m;
> >> }
> >> if (tmp_parent > optimal_parent)
> >> continue;
> >>
> >> This seems ugly, but at least it should work in all cases. Any opinions?
> >
> > Again, you shouldn't work around the issue.
> >
> > It's very simple really: you already computed the optimal parent rate,
>
> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
> best rate for rate = 449035712, it should try to include 449035712 in
> its candidates, agreed?
>
> Example 1:
> ==========
> rate=449035712, n=11, k=3, m=16:
> We should as for a parent rate of 217714285, because:
> 217714285 * 11 * 3 / 16 = 449035712
>
> How do we get from 449035712 to 217714285, you ask?
>
> DIV_ROUND_UP(rate * m, n * k)

Why are we rounding up? I don't think the hardware will round up there.

> Do you agree that we should ask the parent for 217714285 in case we want
> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
> m=16?
>
> We should not ask for a parent rate of 217714284, because:
> 217714284 * 11 * 3 / 16 = 449035710
>
> Example 2:
> ==========
> rate=500000000, n=11, k=3, m=16:
> Here we should not ask the parent for
> DIV_ROUND_UP(rate * m, n * k)
> because that would be 242424243.
>
> 242424243 * 11 * 3 / 16 = 500000001
>
> We (the NKM clock, not the parent!) would overshoot (please see at the
> end of this mail, why (for now) I don't want to support overshooting in
> the NKM clock).
>
> Instead we should as for a parent rate of 242424242, because:
> 242424242 * 11 * 3 / 16 = 499999999
>
> In conclusion, there are cases, where we (the NKM clock) have to ask the
> parent for
> DIV_ROUND_UP(rate * m, n * k)
> And there are also cases, where we have to ask the parent for
> rate * m / (n * k)

I mean, I think you're overthinking this.

If you never round up and mimic how the hardware behaves, and test all
combination, then eventually you'll find the closest rate.

If you don't because the parent doesn't look for the closest rate, then
the parent should be changed too.

It really is that simple.

> This is what the code is trying to do. Maybe it's easier to look at V2
> because I extracted the calcultion of the optimal parent rate into a
> separate function hoping that this makes things clearer.
>
> Let me stress this: When calculating the optimal rate for the parent,
> I'm not making any assumptions here about how the PARENT clock rounds.
> In fact, I assume that the parent could be perfect and always provides
> the rate it is asked for. I only take into account how the nkm clock
> rounds.

At the very least, you assume that the parent rounding can be "wrong"
and try to work around that.

> > you ask the parent to compute whatever is closest to that optimal parent
> > rate.
> >
> > It's the parent responsibility now. It's the parent decision to figure
> > out what "the closest" means, if it can change rate, if it has any range
> > limitation, etc. You can't work around that.
> >
> > What you actually want there is the parent to actually provide the
> > closest rate, even if it means overshooting.
> >
>
> I want to ask the parent for a rate, that would actually result in the
> rate that nkm_find_best was asked for. Are you asking me to instead ask
> the parent for a rate that doesn't fit that criterion?

No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
and use whatever value it returned.

If it requires changing the parent clock to improve its round_rate
behaviour, then do that too.

> > That's fine, we have a flag
> > for that: CLK_(MUX|DIVIDER)_ROUND_CLOSEST. We just need to set it on the
> > parent and be done with it.
>
> This is a totally different issue.

Why?

> If you carefully look at ccu_mp, you will see that it would ignore
> cases when its parent had rounded up. ccu_nkm is no different.
> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
> totally different beast. For now, sunxi-ng always expects rounding
> down.

Then change that?

It doesn't look that bad to be honest, it's basically change the rate
comparison check by something like mux_is_better_rate() or
_is_best_div(). As far as I'm concerned, you can even do that only for
NKM and MP clocks if you don't want to change everything.

> I do not like mixing the two into one patchset. I hope that's a fair
> request? I tried to mix it and it was a nightmare! If you want, I can
> try tackling ROUND_CLOSEST afterwards. But I don't think it would help
> with this patchset, because we'd need to support both the ROUND_CLOSEST
> and ~ROUND_CLOSEST case. Covering one case seems already hard enough. :)

That's fair, but then remove the rate rounding handling entirely and
only deal with forwarding the rate to the parent if SET_RATE_PARENT is
set.

Maxime


Attachments:
(No filename) (14.79 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-13 10:31:02

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

I'll now only respond to one aspect of your mail, because it's the
foundation for the whole behaviour.

On 2023-06-13 at 11:10:08 +0200, Maxime Ripard <[email protected]> wrote:
[...]
>> >> ccu_nkm_find_best is called in the following two situations:
>> >> a. from ccu_nkm_set_rate when setting the rate
>> >> b. from ccu_nkm_round_rate when determining the rate
>> >>
>> >> In situation a. we never want ccu_nkm_find_best to try different parent
>> >> rates because setting the parent rate is a done deal (at least that's my
>> >> understanding).
>> >>
>> >> In situation b. we only want ccu_nkm_find_best to try different parent
>> >> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.
>> >
>> > It doesn't really matter though. The output of that function must be
>> > stable and must return the same set of factors and parent rate for a
>> > given target rate.
>> >
>>
>> I'm not sure if we're talking about the same thing here. Of course the
>> set of factors and parent rate for a given target rate will be different
>> depending on the fact if we can or cannot adjust the parent rate,
>> agreed?
>
> Yes, but here you also have a different behaviour in clk_round_rate()
> and in clk_set_rate(), which isn't ok.
>
> Basically, clk_set_rate() + clk_get_rate() must be equal to
> clk_round_rate().
>
> If you change if you look for parents depending on whether you're being
> called in clk_round_rate() and clk_set_rate(), then you're breaking that
> expectation.
>
>> Let me compare my implementation to ccu_mp.
>>
>> ccu_mp_round_rate either calls the function ccu_mp_find_best or
>> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.
>
> Yes, and it's fine: the flag is per-clock, and the output is the same
> depending on whether we're being called by clk_round_rate() and
> clk_set_rate().
>

The output is really not the same.

ccu_mp_set_rate() always calls ccu_mp_find_best(). It never (!) considers
changing the parent, independent of any flags.

ccu_mp_round_rate() is calling ccu_mp_find_best() OR
ccu_mp_find_best_with_parent_adj() depending on the flag.

If I understand you correctly, you consider that a bug.

I'm doing the same and therefore that is also a bug. Did I get that right?

Best regards,
Frank

>
>> I'm basically doing the same thing, but (!) ccu_nkm_find_best and
>> ccu_nkm_find_best_with_parent_adj would be almost identical. Therefore,
>> I opted to extend ccu_nkm_find_best to also support the parent
>> adjustment. If you look at V2 of this patch, you will see that the only
>> diffences are an if statement (if (parent_hw)) with two lines of code in
>> the if's body and the fact that we need to store the best parent rate.
>>
>> If you prefer, I can split this into two separate functions like in
>> ccu_mp. I think all the confusion is coming from the fact that I didn't.
>> So apparently it was not a good idea to keep it as one function.
>>
>> Should I introduce ccu_nkm_find_best_with_parent_adj instead of using
>> ccu_nkm_find_best for both cases?
>>
>> >
>> > So you can call it as many times as you want, it doesn't really matter.
>>
>> Of course! What did I write that made you think, I thought otherwise?
>>
>> >
>> >> So, what this patch does, it provides a NULL pointer as parent_hw when
>> >> we don't want ccu_nkm_find_best to try alternative parent rates.
>> >
>> > At best, the argument is misleading then. You're not passing a pointer
>> > to the parent, you're telling it whether it should look for other
>> > parents or not. And it's not a pointer, it's a boolean.
>>
>> No, I'm using parent_hw and as a pointer a few lines below when calling
>> clk_hw_round_rate. So I'd need a boolean AND a pointer. I always need
>> the pointer if the boolean is true. I never need the pointer when the
>> boolean is false. Therefore, I opted to choose to use the pointer for
>> first being a boolean (in the if) and then being a pointer (when calling
>> clk_hw_round_rate). This is the (hopefully easier to read) code from V2:
>>
>> if (parent_hw) {
>> tmp_parent = optimal_parent_rate(rate, _n, _k, _m);
>> tmp_parent = clk_hw_round_rate(parent_hw, tmp_parent);
>> }
>
> Again, that clock always has a parent. It's not the actual condition:
> what you want to test is whether you want to forward the rate request to
> the parent or not. So that condition is misleading.
>
>> >> Is it ok if I add a comment to ccu_nkm_find_best that explains the
>> >> function and explicitly also the parameters?
>> >
>> > Sure
>>
>> Done in V2.
>>
>> >
>> >> I also thought about using two different functions for the two
>> >> situations. I have no strong opinion which is better.
>> >>
>> >> However, I don't think we should hand over the flags to this function,
>> >> because we'd still only need to provide the parent_hw if we want to
>> >> find the optimal parent rate, so having two parametes for the same
>> >> purpose seems redundant. Unless, there is a rule to not use NULL
>> >> pointers.
>> >
>> > Again, the behaviour must be stable across all calling sites.
>>
>> No argument here.
>>
>> >
>> >> >
>> >> >> + // We must round up the desired parent rate, because the
>> >> >> + // rounding down happens when calculating tmp_rate. If we
>> >> >> + // round down also here, we'd round down twice.
>> >> >> + unsigned long optimal_parent =
>> >> >> + (rate * _m + (_n * _k - 1)) / _n / _k;
>> >> >
>> >> > I assume the addition of n * k - 1 is to round up, but I'm not sure we
>> >> > should hack around like that.
>> >> >
>> >> > You should compute the ideal parent rate for a given set of timings, and
>> >> > then just call round_rate on it. If the parent wants to round it one way
>> >> > or another, that's the parent concern.
>> >>
>> >> I admit that the comment explaining this is not doing the complexity of
>> >> this issue any justice. Let me try to explain:
>> >>
>> >> Let's say for our panel the optimal rate for pll-mipi is 449064000. The
>> >> best closest we can get is 449035712 with a parent rate of 217714285
>> >> (n=11, k=3, m=16).
>> >>
>> >> Eventually, ccu_nkm_find_best is going to be called with 449035712 as
>> >> the rate. If we don't round up, like I proposend, but instead calculate:
>> >> optimal_parent = rate * m / n / k
>> >> (which is, I think, what you you're proposing) leading to an optimal
>> >> parent of 217714284 (!). We can't get 217714284 from the parent (we
>> >> could get 217714285, but we're not asking for that) so the parent rounds
>> >> down.
>> >>
>> >> To make things worse, this story continues for the new "best rate" as
>> >> well.
>> >>
>> >> In the end, ccu_nkm_find_best claims:
>> >> - the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
>> >> k=3, m=16)
>> >> - but ccu_nkm_find_best would claim that the optimal rate for 449035712
>> >> is 449018181 (parent=235200000, n=7, k=3, m=11)
>> >> - and finally, the optimal rate for 449018181 is 449018180
>> >> (parent=213818181, n=7, k=3, m=10)
>> >>
>> >> This doesn't seem right to me.
>> >>
>> >> But you're also right, in that we can't just always round up. In a
>> >> hypothetical example that we request a parent rate of 450000000. With
>> >> rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
>> >> m=16. And let's now further claim that the parent could provide exactly
>> >> that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
>> >> which (currently) is not acceptable.
>> >>
>> >> Hmm... I currently can't think of a clever way to solve this other than
>> >> this:
>> >>
>> >> optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
>> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> >> tmp_rate = tmp_parent * _n * _k / _m;
>> >> if (tmp_rate > rate) {
>> >> optimal_parent = rate * m / n / k
>> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> >> tmp_rate = tmp_parent * _n * _k / _m;
>> >> }
>> >> if (tmp_parent > optimal_parent)
>> >> continue;
>> >>
>> >> This seems ugly, but at least it should work in all cases. Any opinions?
>> >
>> > Again, you shouldn't work around the issue.
>> >
>> > It's very simple really: you already computed the optimal parent rate,
>>
>> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> best rate for rate = 449035712, it should try to include 449035712 in
>> its candidates, agreed?
>>
>> Example 1:
>> ==========
>> rate=449035712, n=11, k=3, m=16:
>> We should as for a parent rate of 217714285, because:
>> 217714285 * 11 * 3 / 16 = 449035712
>>
>> How do we get from 449035712 to 217714285, you ask?
>>
>> DIV_ROUND_UP(rate * m, n * k)
>
> Why are we rounding up? I don't think the hardware will round up there.
>
>> Do you agree that we should ask the parent for 217714285 in case we want
>> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> m=16?
>>
>> We should not ask for a parent rate of 217714284, because:
>> 217714284 * 11 * 3 / 16 = 449035710
>>
>> Example 2:
>> ==========
>> rate=500000000, n=11, k=3, m=16:
>> Here we should not ask the parent for
>> DIV_ROUND_UP(rate * m, n * k)
>> because that would be 242424243.
>>
>> 242424243 * 11 * 3 / 16 = 500000001
>>
>> We (the NKM clock, not the parent!) would overshoot (please see at the
>> end of this mail, why (for now) I don't want to support overshooting in
>> the NKM clock).
>>
>> Instead we should as for a parent rate of 242424242, because:
>> 242424242 * 11 * 3 / 16 = 499999999
>>
>> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> parent for
>> DIV_ROUND_UP(rate * m, n * k)
>> And there are also cases, where we have to ask the parent for
>> rate * m / (n * k)
>
> I mean, I think you're overthinking this.
>
> If you never round up and mimic how the hardware behaves, and test all
> combination, then eventually you'll find the closest rate.
>
> If you don't because the parent doesn't look for the closest rate, then
> the parent should be changed too.
>
> It really is that simple.
>
>> This is what the code is trying to do. Maybe it's easier to look at V2
>> because I extracted the calcultion of the optimal parent rate into a
>> separate function hoping that this makes things clearer.
>>
>> Let me stress this: When calculating the optimal rate for the parent,
>> I'm not making any assumptions here about how the PARENT clock rounds.
>> In fact, I assume that the parent could be perfect and always provides
>> the rate it is asked for. I only take into account how the nkm clock
>> rounds.
>
> At the very least, you assume that the parent rounding can be "wrong"
> and try to work around that.
>
>> > you ask the parent to compute whatever is closest to that optimal parent
>> > rate.
>> >
>> > It's the parent responsibility now. It's the parent decision to figure
>> > out what "the closest" means, if it can change rate, if it has any range
>> > limitation, etc. You can't work around that.
>> >
>> > What you actually want there is the parent to actually provide the
>> > closest rate, even if it means overshooting.
>> >
>>
>> I want to ask the parent for a rate, that would actually result in the
>> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> the parent for a rate that doesn't fit that criterion?
>
> No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> and use whatever value it returned.
>
> If it requires changing the parent clock to improve its round_rate
> behaviour, then do that too.
>
>> > That's fine, we have a flag
>> > for that: CLK_(MUX|DIVIDER)_ROUND_CLOSEST. We just need to set it on the
>> > parent and be done with it.
>>
>> This is a totally different issue.
>
> Why?
>
>> If you carefully look at ccu_mp, you will see that it would ignore
>> cases when its parent had rounded up. ccu_nkm is no different.
>> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> totally different beast. For now, sunxi-ng always expects rounding
>> down.
>
> Then change that?
>
> It doesn't look that bad to be honest, it's basically change the rate
> comparison check by something like mux_is_better_rate() or
> _is_best_div(). As far as I'm concerned, you can even do that only for
> NKM and MP clocks if you don't want to change everything.
>
>> I do not like mixing the two into one patchset. I hope that's a fair
>> request? I tried to mix it and it was a nightmare! If you want, I can
>> try tackling ROUND_CLOSEST afterwards. But I don't think it would help
>> with this patchset, because we'd need to support both the ROUND_CLOSEST
>> and ~ROUND_CLOSEST case. Covering one case seems already hard enough. :)
>
> That's fair, but then remove the rate rounding handling entirely and
> only deal with forwarding the rate to the parent if SET_RATE_PARENT is
> set.
>
> Maxime
>
> [[End of PGP Signed Part]]

2023-06-15 15:17:02

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Tue, Jun 13, 2023 at 12:17:06PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> I'll now only respond to one aspect of your mail, because it's the
> foundation for the whole behaviour.
>
> On 2023-06-13 at 11:10:08 +0200, Maxime Ripard <[email protected]> wrote:
> [...]
> >> >> ccu_nkm_find_best is called in the following two situations:
> >> >> a. from ccu_nkm_set_rate when setting the rate
> >> >> b. from ccu_nkm_round_rate when determining the rate
> >> >>
> >> >> In situation a. we never want ccu_nkm_find_best to try different parent
> >> >> rates because setting the parent rate is a done deal (at least that's my
> >> >> understanding).
> >> >>
> >> >> In situation b. we only want ccu_nkm_find_best to try different parent
> >> >> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.
> >> >
> >> > It doesn't really matter though. The output of that function must be
> >> > stable and must return the same set of factors and parent rate for a
> >> > given target rate.
> >> >
> >>
> >> I'm not sure if we're talking about the same thing here. Of course the
> >> set of factors and parent rate for a given target rate will be different
> >> depending on the fact if we can or cannot adjust the parent rate,
> >> agreed?
> >
> > Yes, but here you also have a different behaviour in clk_round_rate()
> > and in clk_set_rate(), which isn't ok.
> >
> > Basically, clk_set_rate() + clk_get_rate() must be equal to
> > clk_round_rate().
> >
> > If you change if you look for parents depending on whether you're being
> > called in clk_round_rate() and clk_set_rate(), then you're breaking that
> > expectation.
> >
> >> Let me compare my implementation to ccu_mp.
> >>
> >> ccu_mp_round_rate either calls the function ccu_mp_find_best or
> >> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.
> >
> > Yes, and it's fine: the flag is per-clock, and the output is the same
> > depending on whether we're being called by clk_round_rate() and
> > clk_set_rate().
> >
>
> The output is really not the same.
>
> ccu_mp_set_rate() always calls ccu_mp_find_best(). It never (!) considers
> changing the parent, independent of any flags.
>
> ccu_mp_round_rate() is calling ccu_mp_find_best() OR
> ccu_mp_find_best_with_parent_adj() depending on the flag.
>
> If I understand you correctly, you consider that a bug.

No, sorry, you're right.

clk_set_rate will call round_rate first, which will (possibly) pick up a
new parent, and by the time set_rate is called our parent will have been
changed already so we will just call find_best again considering only
that parent.

The set of factors and dividers should remain the same there, but I
don't think that's a concern.

That leaves us with the rounding stuff, and the overall function
arguments. I like the structure of ccu_mp better, is there a reason to
deviate from it?

Maxime


Attachments:
(No filename) (2.88 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-15 16:17:10

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate


15.06.2023 16:47:33 Maxime Ripard <[email protected]>:

> On Tue, Jun 13, 2023 at 12:17:06PM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> I'll now only respond to one aspect of your mail, because it's the
>> foundation for the whole behaviour.
>>
>> On 2023-06-13 at 11:10:08 +0200, Maxime Ripard <[email protected]> wrote:
>> [...]
>>>>>> …
>>>>>
>>>>> It doesn't really matter though. The output of that function must be
>>>>> stable and must return the same set of factors and parent rate for a
>>>>> given target rate.
>>>>>
>>>>
>>>> I'm not sure if we're talking about the same thing here. Of course the
>>>> set of factors and parent rate for a given target rate will be different
>>>> depending on the fact if we can or cannot adjust the parent rate,
>>>> agreed?
>>>
>>> Yes, but here you also have a different behaviour in clk_round_rate()
>>> and in clk_set_rate(), which isn't ok.
>>>
>>> Basically, clk_set_rate() + clk_get_rate() must be equal to
>>> clk_round_rate().
>>>
>>> If you change if you look for parents depending on whether you're being
>>> called in clk_round_rate() and clk_set_rate(), then you're breaking that
>>> expectation.
>>>
>>>> Let me compare my implementation to ccu_mp.
>>>>
>>>> ccu_mp_round_rate either calls the function ccu_mp_find_best or
>>>> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.
>>>
>>> Yes, and it's fine: the flag is per-clock, and the output is the same
>>> depending on whether we're being called by clk_round_rate() and
>>> clk_set_rate().
>>>
>>
>> The output is really not the same.
>>
>> ccu_mp_set_rate() always calls ccu_mp_find_best(). It never (!) considers
>> changing the parent, independent of any flags.
>>
>> ccu_mp_round_rate() is calling ccu_mp_find_best() OR
>> ccu_mp_find_best_with_parent_adj() depending on the flag.
>>
>> If I understand you correctly, you consider that a bug.
>
> No, sorry, you're right.
>
> clk_set_rate will call round_rate first, which will (possibly) pick up a
> new parent, and by the time set_rate is called our parent will have been
> changed already so we will just call find_best again considering only
> that parent.

Ok, no worries. That was my understanding, so your previous statement shattered my worldview. ;) That's why I may have seemed a bit alarmed.

>
> The set of factors and dividers should remain the same there, but I
> don't think that's a concern.

Ack. The output is stable when called with the same rate.

> That leaves us with the rounding stuff, and the overall function
> arguments. I like the structure of ccu_mp better, is there a reason to
> deviate from it?

I'm still pondering the rounding stuff. I'm just not sure why you are so relaxed about the fact that when calling round_rate with 449064000 we get 449035712, but when we call get round_rate with 449035712 we get 449018181, and when we call round_rate with 449018181, we get 449018180.

But ultimately, you have the final word, of course. But I need some time to be sure, that this does not become a problem in some cases.

Regarding the arguments, I can adopt ccu_mp's style. The only reason was to avoid code duplication, no other reason.

Thanks,
  Frank

> > Maxime


2023-06-19 08:34:40

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

the essence of my following ramblings:
- I do think it is reasonable that nkm is asking its parent for the
rate that nkm actually needs from said parent to fulfill the request.
- I don't think nkm should make assumptions about the rounding
behaviour of the parent.

The reason is, that I want to prevent users of ccu_nkm from making
mistakes when defining their clocks (which includes the parent of their
nkm clock).

Please read below the details on why I think that.

On 2023-06-13 at 11:10:08 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> Hi,
>
> On Mon, Jun 12, 2023 at 06:29:36PM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> I'm sorry that the following mail is a bit long. I'm not sure, there is
>> some kind of misunderstanding/miscommunication somewhere, I'm just not
>> sure, where. :)
>>
>> This mail is aiming at finding the exact points where we apparently go
>> down different paths.
>>
>> On 2023-06-12 at 14:19:05 +0200, Maxime Ripard <[email protected]> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > On Wed, Jun 07, 2023 at 09:39:35AM +0200, Frank Oltmanns wrote:
>> >> Hi Maxime,
>> >>
>> >> On 2023-06-07 at 08:38:39 +0200, Maxime Ripard <[email protected]> wrote:
>> >> > [[PGP Signed Part:Undecided]]
>> >> > On Mon, Jun 05, 2023 at 09:07:44PM +0200, Frank Oltmanns wrote:
>> >> >> In case the CLK_SET_RATE_PARENT flag is set, consider using a different
>> >> >> parent rate when determining a new rate.
>> >> >>
>> >> >> To find the best match for the requested rate, perform the following
>> >> >> steps for each NKM combination:
>> >> >> - calculate the optimal parent rate,
>> >> >> - find the best parent rate that the parent clock actually supports
>> >> >> - use that parent rate to calculate the effective rate.
>> >> >>
>> >> >> In case the clk does not support setting the parent rate, use the same
>> >> >> algorithm as before.
>> >> >>
>> >> >> Signed-off-by: Frank Oltmanns <[email protected]>
>> >> >> ---
>> >> >> drivers/clk/sunxi-ng/ccu_nkm.c | 40 ++++++++++++++++++++++++++--------
>> >> >> 1 file changed, 31 insertions(+), 9 deletions(-)
>> >> >>
>> >> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> >> index a0978a50edae..c71e237226f2 100644
>> >> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> >> @@ -16,10 +16,10 @@ struct _ccu_nkm {
>> >> >> unsigned long m, min_m, max_m;
>> >> >> };
>> >> >>
>> >> >> -static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >> >> - struct _ccu_nkm *nkm)
>> >> >> +static unsigned long ccu_nkm_find_best(unsigned long *parent, unsigned long rate,
>> >> >> + struct _ccu_nkm *nkm, struct clk_hw *parent_hw)
>> >> >> {
>> >> >> - unsigned long best_rate = 0;
>> >> >> + unsigned long best_rate = 0, best_parent_rate = 0, tmp_parent = *parent;
>> >> >> unsigned long best_n = 0, best_k = 0, best_m = 0;
>> >> >> unsigned long _n, _k, _m;
>> >> >>
>> >> >> @@ -28,12 +28,29 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >> >> for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
>> >> >> unsigned long tmp_rate;
>> >> >>
>> >> >> - tmp_rate = parent * _n * _k / _m;
>> >> >> + if (parent_hw) {
>> >> >
>> >> > NKM clocks always have a parent
>> >> >
>> >> > You should test if the CLK_SET_RATE_PARENT flag is set.
>> >>
>> >> ccu_nkm_find_best is called in the following two situations:
>> >> a. from ccu_nkm_set_rate when setting the rate
>> >> b. from ccu_nkm_round_rate when determining the rate
>> >>
>> >> In situation a. we never want ccu_nkm_find_best to try different parent
>> >> rates because setting the parent rate is a done deal (at least that's my
>> >> understanding).
>> >>
>> >> In situation b. we only want ccu_nkm_find_best to try different parent
>> >> rates when, as you mentioned, the CLK_SET_RATE_PARENT flag is set.
>> >
>> > It doesn't really matter though. The output of that function must be
>> > stable and must return the same set of factors and parent rate for a
>> > given target rate.
>> >
>>
>> I'm not sure if we're talking about the same thing here. Of course the
>> set of factors and parent rate for a given target rate will be different
>> depending on the fact if we can or cannot adjust the parent rate,
>> agreed?
>
> Yes, but here you also have a different behaviour in clk_round_rate()
> and in clk_set_rate(), which isn't ok.
>
> Basically, clk_set_rate() + clk_get_rate() must be equal to
> clk_round_rate().
>
> If you change if you look for parents depending on whether you're being
> called in clk_round_rate() and clk_set_rate(), then you're breaking that
> expectation.
>
>> Let me compare my implementation to ccu_mp.
>>
>> ccu_mp_round_rate either calls the function ccu_mp_find_best or
>> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.
>
> Yes, and it's fine: the flag is per-clock, and the output is the same
> depending on whether we're being called by clk_round_rate() and
> clk_set_rate().
>
>> I'm basically doing the same thing, but (!) ccu_nkm_find_best and
>> ccu_nkm_find_best_with_parent_adj would be almost identical. Therefore,
>> I opted to extend ccu_nkm_find_best to also support the parent
>> adjustment. If you look at V2 of this patch, you will see that the only
>> diffences are an if statement (if (parent_hw)) with two lines of code in
>> the if's body and the fact that we need to store the best parent rate.
>>
>> If you prefer, I can split this into two separate functions like in
>> ccu_mp. I think all the confusion is coming from the fact that I didn't.
>> So apparently it was not a good idea to keep it as one function.
>>
>> Should I introduce ccu_nkm_find_best_with_parent_adj instead of using
>> ccu_nkm_find_best for both cases?
>>
>> >
>> > So you can call it as many times as you want, it doesn't really matter.
>>
>> Of course! What did I write that made you think, I thought otherwise?
>>
>> >
>> >> So, what this patch does, it provides a NULL pointer as parent_hw when
>> >> we don't want ccu_nkm_find_best to try alternative parent rates.
>> >
>> > At best, the argument is misleading then. You're not passing a pointer
>> > to the parent, you're telling it whether it should look for other
>> > parents or not. And it's not a pointer, it's a boolean.
>>
>> No, I'm using parent_hw and as a pointer a few lines below when calling
>> clk_hw_round_rate. So I'd need a boolean AND a pointer. I always need
>> the pointer if the boolean is true. I never need the pointer when the
>> boolean is false. Therefore, I opted to choose to use the pointer for
>> first being a boolean (in the if) and then being a pointer (when calling
>> clk_hw_round_rate). This is the (hopefully easier to read) code from V2:
>>
>> if (parent_hw) {
>> tmp_parent = optimal_parent_rate(rate, _n, _k, _m);
>> tmp_parent = clk_hw_round_rate(parent_hw, tmp_parent);
>> }
>
> Again, that clock always has a parent. It's not the actual condition:
> what you want to test is whether you want to forward the rate request to
> the parent or not. So that condition is misleading.
>
>> >> Is it ok if I add a comment to ccu_nkm_find_best that explains the
>> >> function and explicitly also the parameters?
>> >
>> > Sure
>>
>> Done in V2.
>>
>> >
>> >> I also thought about using two different functions for the two
>> >> situations. I have no strong opinion which is better.
>> >>
>> >> However, I don't think we should hand over the flags to this function,
>> >> because we'd still only need to provide the parent_hw if we want to
>> >> find the optimal parent rate, so having two parametes for the same
>> >> purpose seems redundant. Unless, there is a rule to not use NULL
>> >> pointers.
>> >
>> > Again, the behaviour must be stable across all calling sites.
>>
>> No argument here.
>>
>> >
>> >> >
>> >> >> + // We must round up the desired parent rate, because the
>> >> >> + // rounding down happens when calculating tmp_rate. If we
>> >> >> + // round down also here, we'd round down twice.
>> >> >> + unsigned long optimal_parent =
>> >> >> + (rate * _m + (_n * _k - 1)) / _n / _k;
>> >> >
>> >> > I assume the addition of n * k - 1 is to round up, but I'm not sure we
>> >> > should hack around like that.
>> >> >
>> >> > You should compute the ideal parent rate for a given set of timings, and
>> >> > then just call round_rate on it. If the parent wants to round it one way
>> >> > or another, that's the parent concern.
>> >>
>> >> I admit that the comment explaining this is not doing the complexity of
>> >> this issue any justice. Let me try to explain:
>> >>
>> >> Let's say for our panel the optimal rate for pll-mipi is 449064000. The
>> >> best closest we can get is 449035712 with a parent rate of 217714285
>> >> (n=11, k=3, m=16).
>> >>
>> >> Eventually, ccu_nkm_find_best is going to be called with 449035712 as
>> >> the rate. If we don't round up, like I proposend, but instead calculate:
>> >> optimal_parent = rate * m / n / k
>> >> (which is, I think, what you you're proposing) leading to an optimal
>> >> parent of 217714284 (!). We can't get 217714284 from the parent (we
>> >> could get 217714285, but we're not asking for that) so the parent rounds
>> >> down.
>> >>
>> >> To make things worse, this story continues for the new "best rate" as
>> >> well.
>> >>
>> >> In the end, ccu_nkm_find_best claims:
>> >> - the optimal rate for 449064000 is 449035712 (parent=217714285, n=11,
>> >> k=3, m=16)
>> >> - but ccu_nkm_find_best would claim that the optimal rate for 449035712
>> >> is 449018181 (parent=235200000, n=7, k=3, m=11)
>> >> - and finally, the optimal rate for 449018181 is 449018180
>> >> (parent=213818181, n=7, k=3, m=10)
>> >>
>> >> This doesn't seem right to me.
>> >>
>> >> But you're also right, in that we can't just always round up. In a
>> >> hypothetical example that we request a parent rate of 450000000. With
>> >> rounding up, we'd get an optimal parent rate of 218181819 for n=11, k=3,
>> >> m=16. And let's now further claim that the parent could provide exactly
>> >> that rate, we'd end up with a rate of 450000001. So, we'd overshoot,
>> >> which (currently) is not acceptable.
>> >>
>> >> Hmm... I currently can't think of a clever way to solve this other than
>> >> this:
>> >>
>> >> optimal_parent = (rate * _m + (_n * _k - 1)) / _n / _k;
>> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> >> tmp_rate = tmp_parent * _n * _k / _m;
>> >> if (tmp_rate > rate) {
>> >> optimal_parent = rate * m / n / k
>> >> tmp_parent = clk_hw_round_rate(parent_hw, optimal_parent);
>> >> tmp_rate = tmp_parent * _n * _k / _m;
>> >> }
>> >> if (tmp_parent > optimal_parent)
>> >> continue;
>> >>
>> >> This seems ugly, but at least it should work in all cases. Any opinions?
>> >
>> > Again, you shouldn't work around the issue.
>> >
>> > It's very simple really: you already computed the optimal parent rate,
>>
>> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> best rate for rate = 449035712, it should try to include 449035712 in
>> its candidates, agreed?
>>
>> Example 1:
>> ==========
>> rate=449035712, n=11, k=3, m=16:
>> We should as for a parent rate of 217714285, because:
>> 217714285 * 11 * 3 / 16 = 449035712
>>
>> How do we get from 449035712 to 217714285, you ask?
>>
>> DIV_ROUND_UP(rate * m, n * k)
>
> Why are we rounding up? I don't think the hardware will round up there.

Being a "software guy" it is also my understanding that the hardware
does not round up here (or round down for that matter).

But anyway, my concern is the rate's representation *in software*. The
clk drivers are using unsigned long to represent the actual rate. This
is not a lossless representation. In other words, the value (i.e. the
software representation) of that rate is, of course, a "lie". The
hardware clock is running at some rate that is hopefully close to what
we represent in software, but still it's an abstraction.

For example, the user (e.g. in my example a panel) asks us for a rate
that is represented in softwares as 449035712. Given the values n=11,
k=3, m=16, we can *only* represent this value in software if the parent
gives us a rate that is represented in software as 217714285. Therefore,
I think it is reasonable to ask the parent for that rate (217714285).

>
>> Do you agree that we should ask the parent for 217714285 in case we want
>> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> m=16?
>>
>> We should not ask for a parent rate of 217714284, because:
>> 217714284 * 11 * 3 / 16 = 449035710
>>
>> Example 2:
>> ==========
>> rate=500000000, n=11, k=3, m=16:
>> Here we should not ask the parent for
>> DIV_ROUND_UP(rate * m, n * k)
>> because that would be 242424243.
>>
>> 242424243 * 11 * 3 / 16 = 500000001
>>
>> We (the NKM clock, not the parent!) would overshoot (please see at the
>> end of this mail, why (for now) I don't want to support overshooting in
>> the NKM clock).
>>
>> Instead we should as for a parent rate of 242424242, because:
>> 242424242 * 11 * 3 / 16 = 499999999
>>
>> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> parent for
>> DIV_ROUND_UP(rate * m, n * k)
>> And there are also cases, where we have to ask the parent for
>> rate * m / (n * k)
>
> I mean, I think you're overthinking this.
>
> If you never round up and mimic how the hardware behaves, and test all
> combination, then eventually you'll find the closest rate.
>
> If you don't because the parent doesn't look for the closest rate, then
> the parent should be changed too.
>
> It really is that simple.
>
>> This is what the code is trying to do. Maybe it's easier to look at V2
>> because I extracted the calcultion of the optimal parent rate into a
>> separate function hoping that this makes things clearer.
>>
>> Let me stress this: When calculating the optimal rate for the parent,
>> I'm not making any assumptions here about how the PARENT clock rounds.
>> In fact, I assume that the parent could be perfect and always provides
>> the rate it is asked for. I only take into account how the nkm clock
>> rounds.
>
> At the very least, you assume that the parent rounding can be "wrong"
> and try to work around that.

No. I'm not assuming anything about the parent. But I *know* that if we
(nkm) want to get a rate that is represented in softwares as 449035712
and given the values n=11, k=3, m=16, we (nkm) must get the rate from
the parent that the parent represents in software as 217714285, because
I know that we (nkm) calculate *our* (nkm) rate using
parent * n * k / m

So if (!) we want to give the user the rate that they ask for, why not
ask the parent for the rate that we need (217714285)?

I admit that I'm making assumptions here. My assumptions are that we
a. want to at least try to give the user what they asked for
b. without making assumptions about the parent's behaviour.

Those assumptions could of course be wrong, but, honestly, I would find
that confusing.

>
>> > you ask the parent to compute whatever is closest to that optimal parent
>> > rate.
>> >
>> > It's the parent responsibility now. It's the parent decision to figure
>> > out what "the closest" means, if it can change rate, if it has any range
>> > limitation, etc. You can't work around that.
>> >
>> > What you actually want there is the parent to actually provide the
>> > closest rate, even if it means overshooting.
>> >
>>
>> I want to ask the parent for a rate, that would actually result in the
>> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> the parent for a rate that doesn't fit that criterion?
>
> No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> and use whatever value it returned.
>
> If it requires changing the parent clock to improve its round_rate
> behaviour, then do that too.
>

Hmmm... Okay. So you *are* saying, that I should make changes to the
parent so that we do not need to request the exact rate we want from the
parent. But I really don't understand why.

As I wrote above, I'm not making any assumptions of how and if the
parent rounds. My code is rounding *prior* to asking the parent. Your
proposal on the other hand *requires* changing the parent to round
closest where mine does not.

My concern is, that we could then end up with the situation that someone
defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
does not set the ROUND_CLOSEST flag on the parent, because it's not
immediately apparent why they should do that.

Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
pll-video0:

When pll-mipi nkm clock is asked via determine_rate() for a rate of
449064000 it would return 449035712 and a parent rate of 217714285
(using n=11, k=3, m=16, but those details aren't returned by
determine_rate()).

Eventually, determine_rate() will be called again, but this time for a
rate of 449035712. The user already knows that we can provide that,
because we told them (see previous paragraph). But since we're
truncating when calculating the rate that we'd like the parent to
provide, we end up asking the parent for 217714284 when we actually need
it to provide 217714285. So we now *require* the parent to find the
closest and additionally we must *hope* that the parent is incapable of
providing the rate that we asked for.

It's probably a fair hope. And maybe it's a fair requirement. Above you
said "It really is that simple." But in my opinion it's simpler to ask
the parent for the rate that we actually want it to provide, because it
reduces the risk for SoC implementers to introduce undesired behaviour.

>
>> > That's fine, we have a flag
>> > for that: CLK_(MUX|DIVIDER)_ROUND_CLOSEST. We just need to set it on the
>> > parent and be done with it.
>>
>> This is a totally different issue.
>
> Why?
>
>> If you carefully look at ccu_mp, you will see that it would ignore
>> cases when its parent had rounded up. ccu_nkm is no different.
>> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> totally different beast. For now, sunxi-ng always expects rounding
>> down.
>
> Then change that?

You told me that both over- and undershooting are fine when
determining the rate, *but also* "it's a bit context specific which one
we should favour. If we were to do anything, it would be to support both
and let the clock driver select which behaviour it wants." (see
https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)

So, I can't just change NKM's parent's default behavior (which is an NM
clock in my case), because, if I understand correctly, I would have to
introduce a "ROUND_CLOSEST" flag for NM clocks.

But then I feel like I would have to document somewhere that when
setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
needs to ROUND_CLOSEST, in order to avoid drifting away from the
requested rate in the successive calls that are made to
ccu_nkm_determine_rate(), which I tried to explain above and in previous
messages.

Instead we could introduce the following function, like I suggested in
V2 of this patchset. The advantage is that it both *documents* the
dilemma for developers of ccu_nkm and also *avoids* it for users of
ccu_nkm.

static unsigned long optimal_parent_rate(unsigned long rate, unsigned long n,
unsigned long k, unsigned long m)
{
unsigned long _rate, parent;

// We must first try to find the desired parent rate that is rounded up, because there are
// cases where truncating makes us miss the requested rate.
// E.g. rate=449035712, n=11, k=3, m=16
// When truncating, we'd get parent=217714284 and calculating the rate from that would give
// us 449035710. When rounding up, we get parent=217714285 which would give us the requested
// rate of 449035712.
parent = DIV_ROUND_UP(rate * m, n * k);

// But there are other cases, where rounding up the parent gives us a too high rate.
// Therefore, we need to check for this case and, if necessary, truncate the parent rate
// instead of rounding up.
_rate = parent * n * k / m;
if (_rate > rate)
parent = rate * m / (n * k);
return parent;
}

And then we ask the parent for that optimal parent rate we just
calculated. I feel like that's an easy thing to do.

Again, I'm sorry for making such a fuss about the whole topic. But I at
least want you to understand my concern.

Which, in essence, is that I do think that asking the parent for the
software represantation of the rate that we actually need is reasonable
and we should not make assumption about the rounding behaviour of the
parent. This is just to avoid future mistakes that users of ccu_nkm
could otherwise run into. That is my only concern.

Please let me know what you think.

Thank you for your patience,
Frank

>
> It doesn't look that bad to be honest, it's basically change the rate
> comparison check by something like mux_is_better_rate() or
> _is_best_div(). As far as I'm concerned, you can even do that only for
> NKM and MP clocks if you don't want to change everything.
>
>> I do not like mixing the two into one patchset. I hope that's a fair
>> request? I tried to mix it and it was a nightmare! If you want, I can
>> try tackling ROUND_CLOSEST afterwards. But I don't think it would help
>> with this patchset, because we'd need to support both the ROUND_CLOSEST
>> and ~ROUND_CLOSEST case. Covering one case seems already hard enough. :)
>
> That's fair, but then remove the rate rounding handling entirely and
> only deal with forwarding the rate to the parent if SET_RATE_PARENT is
> set.
>
> Maxime
>
> [[End of PGP Signed Part]]

2023-06-19 16:56:22

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Thu, Jun 15, 2023 at 06:04:53PM +0200, Frank Oltmanns wrote:
> 15.06.2023 16:47:33 Maxime Ripard <[email protected]>:
> > On Tue, Jun 13, 2023 at 12:17:06PM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> I'll now only respond to one aspect of your mail, because it's the
> >> foundation for the whole behaviour.
> >>
> >> On 2023-06-13 at 11:10:08 +0200, Maxime Ripard <[email protected]> wrote:
> >> [...]
> >>>>>> …
> >>>>>
> >>>>> It doesn't really matter though. The output of that function must be
> >>>>> stable and must return the same set of factors and parent rate for a
> >>>>> given target rate.
> >>>>>
> >>>>
> >>>> I'm not sure if we're talking about the same thing here. Of course the
> >>>> set of factors and parent rate for a given target rate will be different
> >>>> depending on the fact if we can or cannot adjust the parent rate,
> >>>> agreed?
> >>>
> >>> Yes, but here you also have a different behaviour in clk_round_rate()
> >>> and in clk_set_rate(), which isn't ok.
> >>>
> >>> Basically, clk_set_rate() + clk_get_rate() must be equal to
> >>> clk_round_rate().
> >>>
> >>> If you change if you look for parents depending on whether you're being
> >>> called in clk_round_rate() and clk_set_rate(), then you're breaking that
> >>> expectation.
> >>>
> >>>> Let me compare my implementation to ccu_mp.
> >>>>
> >>>> ccu_mp_round_rate either calls the function ccu_mp_find_best or
> >>>> ccu_mp_find_best_with_parent_adj, depending on CLK_SET_RATE_PARENT.
> >>>
> >>> Yes, and it's fine: the flag is per-clock, and the output is the same
> >>> depending on whether we're being called by clk_round_rate() and
> >>> clk_set_rate().
> >>>
> >>
> >> The output is really not the same.
> >>
> >> ccu_mp_set_rate() always calls ccu_mp_find_best(). It never (!) considers
> >> changing the parent, independent of any flags.
> >>
> >> ccu_mp_round_rate() is calling ccu_mp_find_best() OR
> >> ccu_mp_find_best_with_parent_adj() depending on the flag.
> >>
> >> If I understand you correctly, you consider that a bug.
> >
> > No, sorry, you're right.
> >
> > clk_set_rate will call round_rate first, which will (possibly) pick up a
> > new parent, and by the time set_rate is called our parent will have been
> > changed already so we will just call find_best again considering only
> > that parent.
>
> Ok, no worries. That was my understanding, so your previous statement shattered my worldview. ;) That's why I may have seemed a bit alarmed.
>
> >
> > The set of factors and dividers should remain the same there, but I
> > don't think that's a concern.
>
> Ack. The output is stable when called with the same rate.
>
> > That leaves us with the rounding stuff, and the overall function
> > arguments. I like the structure of ccu_mp better, is there a reason to
> > deviate from it?
>
> I'm still pondering the rounding stuff. I'm just not sure why you are
> so relaxed about the fact that when calling round_rate with 449064000
> we get 449035712, but when we call get round_rate with 449035712 we
> get 449018181, and when we call round_rate with 449018181, we get
> 449018180.

I guess there's a couple of reasons:

- You mentioned that you were going to fix the rate issue later :)

- At the end of the day, it's not a huge offset and shouldn't cause
any big trouble

> But ultimately, you have the final word, of course. But I need some
> time to be sure, that this does not become a problem in some cases.

Don't get me wrong, it should be fixed (and ideally, we should get some
unit tests to make sure that it doesn't behave that way). I don't think
it's urgent though, or that we introduce workarounds in one particular
clock type.

So we can definitely focus on the parent stuff first, and then get the
rate stuff figured out.

Maxime


Attachments:
(No filename) (3.80 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-19 18:09:06

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> the essence of my following ramblings:
> - I do think it is reasonable that nkm is asking its parent for the
> rate that nkm actually needs from said parent to fulfill the request.
> - I don't think nkm should make assumptions about the rounding
> behaviour of the parent.

I guess we agree :)

And I would go even further and say that we shouldn't make *any*
assumption about the behaviour of the parent.

> The reason is, that I want to prevent users of ccu_nkm from making
> mistakes when defining their clocks (which includes the parent of their
> nkm clock).
>
> Please read below the details on why I think that.
>
> [...]
>
> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
> >> best rate for rate = 449035712, it should try to include 449035712 in
> >> its candidates, agreed?
> >>
> >> Example 1:
> >> ==========
> >> rate=449035712, n=11, k=3, m=16:
> >> We should as for a parent rate of 217714285, because:
> >> 217714285 * 11 * 3 / 16 = 449035712
> >>
> >> How do we get from 449035712 to 217714285, you ask?
> >>
> >> DIV_ROUND_UP(rate * m, n * k)
> >
> > Why are we rounding up? I don't think the hardware will round up there.
>
> Being a "software guy" it is also my understanding that the hardware
> does not round up here (or round down for that matter).

That's my understanding as well.

> But anyway, my concern is the rate's representation *in software*. The
> clk drivers are using unsigned long to represent the actual rate. This
> is not a lossless representation. In other words, the value (i.e. the
> software representation) of that rate is, of course, a "lie". The
> hardware clock is running at some rate that is hopefully close to what
> we represent in software, but still it's an abstraction.
>
> For example, the user (e.g. in my example a panel) asks us for a rate
> that is represented in softwares as 449035712. Given the values n=11,
> k=3, m=16, we can *only* represent this value in software if the parent
> gives us a rate that is represented in software as 217714285. Therefore,
> I think it is reasonable to ask the parent for that rate (217714285).

I somewhat agree, but I still don't think it's worth rounding up.

If we don't round up (and assuming the parent itself won't round the
clock), we end up with a rate of 449035710 using the dividers you
mentioned. It's a .0000005% deviation (I hope I didn't screw up the
number of 0s). It's negligible for all practical purposes, and it's not
worth making the code inconsistent and eyebrow raising.

> >> Do you agree that we should ask the parent for 217714285 in case we want
> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
> >> m=16?
> >>
> >> We should not ask for a parent rate of 217714284, because:
> >> 217714284 * 11 * 3 / 16 = 449035710
> >>
> >> Example 2:
> >> ==========
> >> rate=500000000, n=11, k=3, m=16:
> >> Here we should not ask the parent for
> >> DIV_ROUND_UP(rate * m, n * k)
> >> because that would be 242424243.
> >>
> >> 242424243 * 11 * 3 / 16 = 500000001
> >>
> >> We (the NKM clock, not the parent!) would overshoot (please see at the
> >> end of this mail, why (for now) I don't want to support overshooting in
> >> the NKM clock).
> >>
> >> Instead we should as for a parent rate of 242424242, because:
> >> 242424242 * 11 * 3 / 16 = 499999999
> >>
> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
> >> parent for
> >> DIV_ROUND_UP(rate * m, n * k)
> >> And there are also cases, where we have to ask the parent for
> >> rate * m / (n * k)
> >
> > I mean, I think you're overthinking this.
> >
> > If you never round up and mimic how the hardware behaves, and test all
> > combination, then eventually you'll find the closest rate.
> >
> > If you don't because the parent doesn't look for the closest rate, then
> > the parent should be changed too.
> >
> > It really is that simple.
> >
> >> This is what the code is trying to do. Maybe it's easier to look at V2
> >> because I extracted the calcultion of the optimal parent rate into a
> >> separate function hoping that this makes things clearer.
> >>
> >> Let me stress this: When calculating the optimal rate for the parent,
> >> I'm not making any assumptions here about how the PARENT clock rounds.
> >> In fact, I assume that the parent could be perfect and always provides
> >> the rate it is asked for. I only take into account how the nkm clock
> >> rounds.
> >
> > At the very least, you assume that the parent rounding can be "wrong"
> > and try to work around that.
>
> No. I'm not assuming anything about the parent. But I *know* that if we
> (nkm) want to get a rate that is represented in softwares as 449035712
> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
> the parent that the parent represents in software as 217714285, because
> I know that we (nkm) calculate *our* (nkm) rate using
> parent * n * k / m
>
> So if (!) we want to give the user the rate that they ask for, why not
> ask the parent for the rate that we need (217714285)?
>
> I admit that I'm making assumptions here. My assumptions are that we
> a. want to at least try to give the user what they asked for
> b. without making assumptions about the parent's behaviour.
>
> Those assumptions could of course be wrong, but, honestly, I would find
> that confusing.

I guess my point leans more towards the "social" side than the
mathematical one. If I followed you so far, the precision you expect to
gain is in the <1Hz range (and I've been in sick leave for a while, so
sorry if I didn't before). The rate is in the 100MHz range.

So the precision gain is pretty much nothing. Sure, it's closer from a
mathematical standpoint. But there's zero benefit from it.

However, it comes at the cost of a code that is definitely more
complicated (or less naive, depending on how you look at it I guess :))
and will be harder to figure out for someone that jumps into the driver.

So the trade-off doesn't really make fixing it worth it to me.

> >> > you ask the parent to compute whatever is closest to that optimal parent
> >> > rate.
> >> >
> >> > It's the parent responsibility now. It's the parent decision to figure
> >> > out what "the closest" means, if it can change rate, if it has any range
> >> > limitation, etc. You can't work around that.
> >> >
> >> > What you actually want there is the parent to actually provide the
> >> > closest rate, even if it means overshooting.
> >> >
> >>
> >> I want to ask the parent for a rate, that would actually result in the
> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
> >> the parent for a rate that doesn't fit that criterion?
> >
> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> > and use whatever value it returned.
> >
> > If it requires changing the parent clock to improve its round_rate
> > behaviour, then do that too.
> >
>
> Hmmm... Okay. So you *are* saying, that I should make changes to the
> parent so that we do not need to request the exact rate we want from the
> parent. But I really don't understand why.

No, sorry. I initially thought that you were working around "divider"
rounding issue (as opposed to integer like you mentionned above) with
the parent not providing its optimal rate, and you adjusting based on
that offset.

> As I wrote above, I'm not making any assumptions of how and if the
> parent rounds. My code is rounding *prior* to asking the parent. Your
> proposal on the other hand *requires* changing the parent to round
> closest where mine does not.
>
> My concern is, that we could then end up with the situation that someone
> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
> does not set the ROUND_CLOSEST flag on the parent, because it's not
> immediately apparent why they should do that.

It's going to happen, and probably happens at the moment already,
because not only the NKM clocks are affected, but virtually all of them,
and most don't use ROUND_CLOSEST.

And to some extent, it's fine. We would handle it like any other bug: if
we ever encounter one, we'll write a fix, backport it to stable and all
will be fine.

You can't figure out all the use-cases we'll require in the future
anyway.

> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
> pll-video0:
>
> When pll-mipi nkm clock is asked via determine_rate() for a rate of
> 449064000 it would return 449035712 and a parent rate of 217714285
> (using n=11, k=3, m=16, but those details aren't returned by
> determine_rate()).
>
> Eventually, determine_rate() will be called again, but this time for a
> rate of 449035712. The user already knows that we can provide that,
> because we told them (see previous paragraph). But since we're
> truncating when calculating the rate that we'd like the parent to
> provide, we end up asking the parent for 217714284 when we actually need
> it to provide 217714285. So we now *require* the parent to find the
> closest and additionally we must *hope* that the parent is incapable of
> providing the rate that we asked for.

I mean... yeah. It's what abstraction is all about. For all we know, the
parent to pll-mipi could be a crystal that can't change its frequency
and we should deal with that. Or it could be an ideal clock that always
returns the rate you ask for. Or a firmware clock that behaves like an
ideal clock but lies about it :)

It's that clock responsibility to do its best to provide the rate we ask
for.

And if we need to make it behave better, then it's fine too. So your
example is indeed true, but it's more of a case of "let's send another
patch" rather than trying to figure out all possible cases and try to
figure things out accordingly. Because you won't be able to figure out
all possible cases for the current SoCs and the next ones, and the
workloads that people are going to run on those SoCs anyway.

> >> If you carefully look at ccu_mp, you will see that it would ignore
> >> cases when its parent had rounded up. ccu_nkm is no different.
> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
> >> totally different beast. For now, sunxi-ng always expects rounding
> >> down.
> >
> > Then change that?
>
> You told me that both over- and undershooting are fine when
> determining the rate, *but also* "it's a bit context specific which one
> we should favour. If we were to do anything, it would be to support both
> and let the clock driver select which behaviour it wants." (see
> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
>
> So, I can't just change NKM's parent's default behavior (which is an NM
> clock in my case), because, if I understand correctly, I would have to
> introduce a "ROUND_CLOSEST" flag for NM clocks.

Sure

> But then I feel like I would have to document somewhere that when
> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
> needs to ROUND_CLOSEST, in order to avoid drifting away from the
> requested rate in the successive calls that are made to
> ccu_nkm_determine_rate(), which I tried to explain above and in previous
> messages.

That's kind of what I meant too. Whether "drifting away" is an issue is
context specific too. for some clocks it just doesn't matter. Nobody
ever complained that the register clock of the MMC controller was
drifting away, because it doesn't affect the system in the slightest.

The video clock tree (and possibly others) will be affected though, and
we'll indeed need to add that flag. But we're doing it all the time (and
sometimes get it wrong) for things like which clocks should be left
enabled for example.

> Instead we could introduce the following function, like I suggested in
> V2 of this patchset. The advantage is that it both *documents* the
> dilemma for developers of ccu_nkm and also *avoids* it for users of
> ccu_nkm.
>
> static unsigned long optimal_parent_rate(unsigned long rate, unsigned long n,
> unsigned long k, unsigned long m)
> {
> unsigned long _rate, parent;
>
> // We must first try to find the desired parent rate that is rounded up, because there are
> // cases where truncating makes us miss the requested rate.
> // E.g. rate=449035712, n=11, k=3, m=16
> // When truncating, we'd get parent=217714284 and calculating the rate from that would give
> // us 449035710. When rounding up, we get parent=217714285 which would give us the requested
> // rate of 449035712.
> parent = DIV_ROUND_UP(rate * m, n * k);
>
> // But there are other cases, where rounding up the parent gives us a too high rate.
> // Therefore, we need to check for this case and, if necessary, truncate the parent rate
> // instead of rounding up.
> _rate = parent * n * k / m;
> if (_rate > rate)
> parent = rate * m / (n * k);
> return parent;
> }
>
> And then we ask the parent for that optimal parent rate we just
> calculated. I feel like that's an easy thing to do.

We're back to the trade-off I was mentioning earlier. I'm not against it
on principle. However, if it's not absolutely required, then I don't
think it's a good idea to merge it.

Especially if it's to workaround a parent flag missing. A clock flag
patch is easy to read, write, understand, review, merge and maintain.
It's basically a nop to merge provided the commit log is decent enough.

That function is none of those things.

Maxime


Attachments:
(No filename) (13.59 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-20 19:32:53

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

thank you for your detailed response. I really appreciate it!

On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> the essence of my following ramblings:
>> - I do think it is reasonable that nkm is asking its parent for the
>> rate that nkm actually needs from said parent to fulfill the request.
>> - I don't think nkm should make assumptions about the rounding
>> behaviour of the parent.
>
> I guess we agree :)
>
> And I would go even further and say that we shouldn't make *any*
> assumption about the behaviour of the parent.
>
>> The reason is, that I want to prevent users of ccu_nkm from making
>> mistakes when defining their clocks (which includes the parent of their
>> nkm clock).
>>
>> Please read below the details on why I think that.
>>
>> [...]
>>
>> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> >> best rate for rate = 449035712, it should try to include 449035712 in
>> >> its candidates, agreed?
>> >>
>> >> Example 1:
>> >> ==========
>> >> rate=449035712, n=11, k=3, m=16:
>> >> We should as for a parent rate of 217714285, because:
>> >> 217714285 * 11 * 3 / 16 = 449035712
>> >>
>> >> How do we get from 449035712 to 217714285, you ask?
>> >>
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >
>> > Why are we rounding up? I don't think the hardware will round up there.
>>
>> Being a "software guy" it is also my understanding that the hardware
>> does not round up here (or round down for that matter).
>
> That's my understanding as well.
>
>> But anyway, my concern is the rate's representation *in software*. The
>> clk drivers are using unsigned long to represent the actual rate. This
>> is not a lossless representation. In other words, the value (i.e. the
>> software representation) of that rate is, of course, a "lie". The
>> hardware clock is running at some rate that is hopefully close to what
>> we represent in software, but still it's an abstraction.
>>
>> For example, the user (e.g. in my example a panel) asks us for a rate
>> that is represented in softwares as 449035712. Given the values n=11,
>> k=3, m=16, we can *only* represent this value in software if the parent
>> gives us a rate that is represented in software as 217714285. Therefore,
>> I think it is reasonable to ask the parent for that rate (217714285).
>
> I somewhat agree, but I still don't think it's worth rounding up.
>
> If we don't round up (and assuming the parent itself won't round the
> clock), we end up with a rate of 449035710 using the dividers you
> mentioned. It's a .0000005% deviation (I hope I didn't screw up the
> number of 0s). It's negligible for all practical purposes, and it's not
> worth making the code inconsistent and eyebrow raising.
>
>> >> Do you agree that we should ask the parent for 217714285 in case we want
>> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> >> m=16?
>> >>
>> >> We should not ask for a parent rate of 217714284, because:
>> >> 217714284 * 11 * 3 / 16 = 449035710
>> >>
>> >> Example 2:
>> >> ==========
>> >> rate=500000000, n=11, k=3, m=16:
>> >> Here we should not ask the parent for
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >> because that would be 242424243.
>> >>
>> >> 242424243 * 11 * 3 / 16 = 500000001
>> >>
>> >> We (the NKM clock, not the parent!) would overshoot (please see at the
>> >> end of this mail, why (for now) I don't want to support overshooting in
>> >> the NKM clock).
>> >>
>> >> Instead we should as for a parent rate of 242424242, because:
>> >> 242424242 * 11 * 3 / 16 = 499999999
>> >>
>> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> >> parent for
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >> And there are also cases, where we have to ask the parent for
>> >> rate * m / (n * k)
>> >
>> > I mean, I think you're overthinking this.
>> >
>> > If you never round up and mimic how the hardware behaves, and test all
>> > combination, then eventually you'll find the closest rate.
>> >
>> > If you don't because the parent doesn't look for the closest rate, then
>> > the parent should be changed too.
>> >
>> > It really is that simple.
>> >
>> >> This is what the code is trying to do. Maybe it's easier to look at V2
>> >> because I extracted the calcultion of the optimal parent rate into a
>> >> separate function hoping that this makes things clearer.
>> >>
>> >> Let me stress this: When calculating the optimal rate for the parent,
>> >> I'm not making any assumptions here about how the PARENT clock rounds.
>> >> In fact, I assume that the parent could be perfect and always provides
>> >> the rate it is asked for. I only take into account how the nkm clock
>> >> rounds.
>> >
>> > At the very least, you assume that the parent rounding can be "wrong"
>> > and try to work around that.
>>
>> No. I'm not assuming anything about the parent. But I *know* that if we
>> (nkm) want to get a rate that is represented in softwares as 449035712
>> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
>> the parent that the parent represents in software as 217714285, because
>> I know that we (nkm) calculate *our* (nkm) rate using
>> parent * n * k / m
>>
>> So if (!) we want to give the user the rate that they ask for, why not
>> ask the parent for the rate that we need (217714285)?
>>
>> I admit that I'm making assumptions here. My assumptions are that we
>> a. want to at least try to give the user what they asked for
>> b. without making assumptions about the parent's behaviour.
>>
>> Those assumptions could of course be wrong, but, honestly, I would find
>> that confusing.
>
> I guess my point leans more towards the "social" side than the
> mathematical one. If I followed you so far, the precision you expect to
> gain is in the <1Hz range

Ok, just for the sake of correctness: Actually, I'm concerned with the
fact what happens, if someone "forgets" or doesn't know they have to set
the flag and what happens when calling round rate. Because then in the
end we get 449018180 instead of 449035712. That's more than 1 Hz. But
anyway, I understand your point that a gain of precision of the
approximately 0,004% is not worth the additional code. (Actually, I
don't know how precise the clocks are, but I'd guess it could easily be
lower than 99,996%.)

> (and I've been in sick leave for a while, so
> sorry if I didn't before).

I hope you feel better.

> The rate is in the 100MHz range.
>
> So the precision gain is pretty much nothing. Sure, it's closer from a
> mathematical standpoint. But there's zero benefit from it.
>
> However, it comes at the cost of a code that is definitely more
> complicated (or less naive, depending on how you look at it I guess :))
> and will be harder to figure out for someone that jumps into the driver.
>
> So the trade-off doesn't really make fixing it worth it to me.
>

Ok, understood.

>
>> >> > you ask the parent to compute whatever is closest to that optimal parent
>> >> > rate.
>> >> >
>> >> > It's the parent responsibility now. It's the parent decision to figure
>> >> > out what "the closest" means, if it can change rate, if it has any range
>> >> > limitation, etc. You can't work around that.
>> >> >
>> >> > What you actually want there is the parent to actually provide the
>> >> > closest rate, even if it means overshooting.
>> >> >
>> >>
>> >> I want to ask the parent for a rate, that would actually result in the
>> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> >> the parent for a rate that doesn't fit that criterion?
>> >
>> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
>> > and use whatever value it returned.
>> >
>> > If it requires changing the parent clock to improve its round_rate
>> > behaviour, then do that too.
>> >
>>
>> Hmmm... Okay. So you *are* saying, that I should make changes to the
>> parent so that we do not need to request the exact rate we want from the
>> parent. But I really don't understand why.
>
> No, sorry. I initially thought that you were working around "divider"
> rounding issue (as opposed to integer like you mentionned above) with
> the parent not providing its optimal rate, and you adjusting based on
> that offset.
>
>> As I wrote above, I'm not making any assumptions of how and if the
>> parent rounds. My code is rounding *prior* to asking the parent. Your
>> proposal on the other hand *requires* changing the parent to round
>> closest where mine does not.
>>
>> My concern is, that we could then end up with the situation that someone
>> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
>> does not set the ROUND_CLOSEST flag on the parent, because it's not
>> immediately apparent why they should do that.
>
> It's going to happen, and probably happens at the moment already,
> because not only the NKM clocks are affected, but virtually all of them,
> and most don't use ROUND_CLOSEST.
>
> And to some extent, it's fine. We would handle it like any other bug: if
> we ever encounter one, we'll write a fix, backport it to stable and all
> will be fine.

Ok, understood.

>
> You can't figure out all the use-cases we'll require in the future
> anyway.

Well, that much is certain. :)

>
>> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
>> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
>> pll-video0:
>>
>> When pll-mipi nkm clock is asked via determine_rate() for a rate of
>> 449064000 it would return 449035712 and a parent rate of 217714285
>> (using n=11, k=3, m=16, but those details aren't returned by
>> determine_rate()).
>>
>> Eventually, determine_rate() will be called again, but this time for a
>> rate of 449035712. The user already knows that we can provide that,
>> because we told them (see previous paragraph). But since we're
>> truncating when calculating the rate that we'd like the parent to
>> provide, we end up asking the parent for 217714284 when we actually need
>> it to provide 217714285. So we now *require* the parent to find the
>> closest and additionally we must *hope* that the parent is incapable of
>> providing the rate that we asked for.
>
> I mean... yeah. It's what abstraction is all about. For all we know, the
> parent to pll-mipi could be a crystal that can't change its frequency
> and we should deal with that. Or it could be an ideal clock that always
> returns the rate you ask for. Or a firmware clock that behaves like an
> ideal clock but lies about it :)
>
> It's that clock responsibility to do its best to provide the rate we ask
> for.
>
> And if we need to make it behave better, then it's fine too. So your
> example is indeed true, but it's more of a case of "let's send another
> patch" rather than trying to figure out all possible cases and try to
> figure things out accordingly. Because you won't be able to figure out
> all possible cases for the current SoCs and the next ones, and the
> workloads that people are going to run on those SoCs anyway.
>
>> >> If you carefully look at ccu_mp, you will see that it would ignore
>> >> cases when its parent had rounded up. ccu_nkm is no different.
>> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> >> totally different beast. For now, sunxi-ng always expects rounding
>> >> down.
>> >
>> > Then change that?
>>
>> You told me that both over- and undershooting are fine when
>> determining the rate, *but also* "it's a bit context specific which one
>> we should favour. If we were to do anything, it would be to support both
>> and let the clock driver select which behaviour it wants." (see
>> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
>>
>> So, I can't just change NKM's parent's default behavior (which is an NM
>> clock in my case), because, if I understand correctly, I would have to
>> introduce a "ROUND_CLOSEST" flag for NM clocks.
>
> Sure
>
>> But then I feel like I would have to document somewhere that when
>> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
>> needs to ROUND_CLOSEST, in order to avoid drifting away from the
>> requested rate in the successive calls that are made to
>> ccu_nkm_determine_rate(), which I tried to explain above and in previous
>> messages.
>
> That's kind of what I meant too. Whether "drifting away" is an issue is
> context specific too. for some clocks it just doesn't matter. Nobody
> ever complained that the register clock of the MMC controller was
> drifting away, because it doesn't affect the system in the slightest.
>
> The video clock tree (and possibly others) will be affected though, and
> we'll indeed need to add that flag. But we're doing it all the time (and
> sometimes get it wrong) for things like which clocks should be left
> enabled for example.
>
>> Instead we could introduce the following function, like I suggested in
>> V2 of this patchset. The advantage is that it both *documents* the
>> dilemma for developers of ccu_nkm and also *avoids* it for users of
>> ccu_nkm.
>>
>> static unsigned long optimal_parent_rate(unsigned long rate, unsigned long n,
>> unsigned long k, unsigned long m)
>> {
>> unsigned long _rate, parent;
>>
>> // We must first try to find the desired parent rate that is rounded up, because there are
>> // cases where truncating makes us miss the requested rate.
>> // E.g. rate=449035712, n=11, k=3, m=16
>> // When truncating, we'd get parent=217714284 and calculating the rate from that would give
>> // us 449035710. When rounding up, we get parent=217714285 which would give us the requested
>> // rate of 449035712.
>> parent = DIV_ROUND_UP(rate * m, n * k);
>>
>> // But there are other cases, where rounding up the parent gives us a too high rate.
>> // Therefore, we need to check for this case and, if necessary, truncate the parent rate
>> // instead of rounding up.
>> _rate = parent * n * k / m;
>> if (_rate > rate)
>> parent = rate * m / (n * k);
>> return parent;
>> }
>>
>> And then we ask the parent for that optimal parent rate we just
>> calculated. I feel like that's an easy thing to do.
>
> We're back to the trade-off I was mentioning earlier. I'm not against it
> on principle. However, if it's not absolutely required, then I don't
> think it's a good idea to merge it.
>
> Especially if it's to workaround a parent flag missing. A clock flag
> patch is easy to read, write, understand, review, merge and maintain.
> It's basically a nop to merge provided the commit log is decent enough.
>
> That function is none of those things.
>

Ok, thank you for all the explanations! I now feel, we have a common
understanding and that's great.

I will proceed with preparing a V3 of this patchset, that will also
introduce CCU_FEATURE_CLOSEST_RATE and an implementation for ccu_nm,
ccu_nkm and the necessary updates on the (combined) mux clocks and
divisor clocks. The flag will be used in pll-video0 of the A64 and all
of its descendents.

That version will also have a separate function
ccu_nkm_find_best_with_parent_adj, so that it's more similar to the
implementation of e.g. ccu_mp.

Thanks,
Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

2023-06-26 17:24:21

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Tue, Jun 20, 2023 at 08:51:27PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> thank you for your detailed response. I really appreciate it!
>
> On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> the essence of my following ramblings:
> >> - I do think it is reasonable that nkm is asking its parent for the
> >> rate that nkm actually needs from said parent to fulfill the request.
> >> - I don't think nkm should make assumptions about the rounding
> >> behaviour of the parent.
> >
> > I guess we agree :)
> >
> > And I would go even further and say that we shouldn't make *any*
> > assumption about the behaviour of the parent.
> >
> >> The reason is, that I want to prevent users of ccu_nkm from making
> >> mistakes when defining their clocks (which includes the parent of their
> >> nkm clock).
> >>
> >> Please read below the details on why I think that.
> >>
> >> [...]
> >>
> >> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
> >> >> best rate for rate = 449035712, it should try to include 449035712 in
> >> >> its candidates, agreed?
> >> >>
> >> >> Example 1:
> >> >> ==========
> >> >> rate=449035712, n=11, k=3, m=16:
> >> >> We should as for a parent rate of 217714285, because:
> >> >> 217714285 * 11 * 3 / 16 = 449035712
> >> >>
> >> >> How do we get from 449035712 to 217714285, you ask?
> >> >>
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >
> >> > Why are we rounding up? I don't think the hardware will round up there.
> >>
> >> Being a "software guy" it is also my understanding that the hardware
> >> does not round up here (or round down for that matter).
> >
> > That's my understanding as well.
> >
> >> But anyway, my concern is the rate's representation *in software*. The
> >> clk drivers are using unsigned long to represent the actual rate. This
> >> is not a lossless representation. In other words, the value (i.e. the
> >> software representation) of that rate is, of course, a "lie". The
> >> hardware clock is running at some rate that is hopefully close to what
> >> we represent in software, but still it's an abstraction.
> >>
> >> For example, the user (e.g. in my example a panel) asks us for a rate
> >> that is represented in softwares as 449035712. Given the values n=11,
> >> k=3, m=16, we can *only* represent this value in software if the parent
> >> gives us a rate that is represented in software as 217714285. Therefore,
> >> I think it is reasonable to ask the parent for that rate (217714285).
> >
> > I somewhat agree, but I still don't think it's worth rounding up.
> >
> > If we don't round up (and assuming the parent itself won't round the
> > clock), we end up with a rate of 449035710 using the dividers you
> > mentioned. It's a .0000005% deviation (I hope I didn't screw up the
> > number of 0s). It's negligible for all practical purposes, and it's not
> > worth making the code inconsistent and eyebrow raising.
> >
> >> >> Do you agree that we should ask the parent for 217714285 in case we want
> >> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
> >> >> m=16?
> >> >>
> >> >> We should not ask for a parent rate of 217714284, because:
> >> >> 217714284 * 11 * 3 / 16 = 449035710
> >> >>
> >> >> Example 2:
> >> >> ==========
> >> >> rate=500000000, n=11, k=3, m=16:
> >> >> Here we should not ask the parent for
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> because that would be 242424243.
> >> >>
> >> >> 242424243 * 11 * 3 / 16 = 500000001
> >> >>
> >> >> We (the NKM clock, not the parent!) would overshoot (please see at the
> >> >> end of this mail, why (for now) I don't want to support overshooting in
> >> >> the NKM clock).
> >> >>
> >> >> Instead we should as for a parent rate of 242424242, because:
> >> >> 242424242 * 11 * 3 / 16 = 499999999
> >> >>
> >> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
> >> >> parent for
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> And there are also cases, where we have to ask the parent for
> >> >> rate * m / (n * k)
> >> >
> >> > I mean, I think you're overthinking this.
> >> >
> >> > If you never round up and mimic how the hardware behaves, and test all
> >> > combination, then eventually you'll find the closest rate.
> >> >
> >> > If you don't because the parent doesn't look for the closest rate, then
> >> > the parent should be changed too.
> >> >
> >> > It really is that simple.
> >> >
> >> >> This is what the code is trying to do. Maybe it's easier to look at V2
> >> >> because I extracted the calcultion of the optimal parent rate into a
> >> >> separate function hoping that this makes things clearer.
> >> >>
> >> >> Let me stress this: When calculating the optimal rate for the parent,
> >> >> I'm not making any assumptions here about how the PARENT clock rounds.
> >> >> In fact, I assume that the parent could be perfect and always provides
> >> >> the rate it is asked for. I only take into account how the nkm clock
> >> >> rounds.
> >> >
> >> > At the very least, you assume that the parent rounding can be "wrong"
> >> > and try to work around that.
> >>
> >> No. I'm not assuming anything about the parent. But I *know* that if we
> >> (nkm) want to get a rate that is represented in softwares as 449035712
> >> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
> >> the parent that the parent represents in software as 217714285, because
> >> I know that we (nkm) calculate *our* (nkm) rate using
> >> parent * n * k / m
> >>
> >> So if (!) we want to give the user the rate that they ask for, why not
> >> ask the parent for the rate that we need (217714285)?
> >>
> >> I admit that I'm making assumptions here. My assumptions are that we
> >> a. want to at least try to give the user what they asked for
> >> b. without making assumptions about the parent's behaviour.
> >>
> >> Those assumptions could of course be wrong, but, honestly, I would find
> >> that confusing.
> >
> > I guess my point leans more towards the "social" side than the
> > mathematical one. If I followed you so far, the precision you expect to
> > gain is in the <1Hz range
>
> Ok, just for the sake of correctness: Actually, I'm concerned with the
> fact what happens, if someone "forgets" or doesn't know they have to set
> the flag and what happens when calling round rate.

I understand your concern, but unfortunately, there's no real way to
defend against that. You could make the same point about the parent
clock being wrong, the critical clocks not being flagged properly, etc.

At the end of the day, the clock drivers have to be properly defined to
behave properly.

We have to trust it.

> Because then in the end we get 449018180 instead of 449035712. That's
> more than 1 Hz. But anyway, I understand your point that a gain of
> precision of the approximately 0,004% is not worth the additional
> code. (Actually, I don't know how precise the clocks are, but I'd
> guess it could easily be lower than 99,996%.)
>
> > (and I've been in sick leave for a while, so sorry if I didn't
> > before).
>
> I hope you feel better.

I do, thanks for asking :)

> > The rate is in the 100MHz range.
> >
> > So the precision gain is pretty much nothing. Sure, it's closer from a
> > mathematical standpoint. But there's zero benefit from it.
> >
> > However, it comes at the cost of a code that is definitely more
> > complicated (or less naive, depending on how you look at it I guess :))
> > and will be harder to figure out for someone that jumps into the driver.
> >
> > So the trade-off doesn't really make fixing it worth it to me.
> >
>
> Ok, understood.
>
> >
> >> >> > you ask the parent to compute whatever is closest to that optimal parent
> >> >> > rate.
> >> >> >
> >> >> > It's the parent responsibility now. It's the parent decision to figure
> >> >> > out what "the closest" means, if it can change rate, if it has any range
> >> >> > limitation, etc. You can't work around that.
> >> >> >
> >> >> > What you actually want there is the parent to actually provide the
> >> >> > closest rate, even if it means overshooting.
> >> >> >
> >> >>
> >> >> I want to ask the parent for a rate, that would actually result in the
> >> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
> >> >> the parent for a rate that doesn't fit that criterion?
> >> >
> >> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> >> > and use whatever value it returned.
> >> >
> >> > If it requires changing the parent clock to improve its round_rate
> >> > behaviour, then do that too.
> >> >
> >>
> >> Hmmm... Okay. So you *are* saying, that I should make changes to the
> >> parent so that we do not need to request the exact rate we want from the
> >> parent. But I really don't understand why.
> >
> > No, sorry. I initially thought that you were working around "divider"
> > rounding issue (as opposed to integer like you mentionned above) with
> > the parent not providing its optimal rate, and you adjusting based on
> > that offset.
> >
> >> As I wrote above, I'm not making any assumptions of how and if the
> >> parent rounds. My code is rounding *prior* to asking the parent. Your
> >> proposal on the other hand *requires* changing the parent to round
> >> closest where mine does not.
> >>
> >> My concern is, that we could then end up with the situation that someone
> >> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
> >> does not set the ROUND_CLOSEST flag on the parent, because it's not
> >> immediately apparent why they should do that.
> >
> > It's going to happen, and probably happens at the moment already,
> > because not only the NKM clocks are affected, but virtually all of them,
> > and most don't use ROUND_CLOSEST.
> >
> > And to some extent, it's fine. We would handle it like any other bug: if
> > we ever encounter one, we'll write a fix, backport it to stable and all
> > will be fine.
>
> Ok, understood.
>
> >
> > You can't figure out all the use-cases we'll require in the future
> > anyway.
>
> Well, that much is certain. :)
>
> >
> >> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
> >> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
> >> pll-video0:
> >>
> >> When pll-mipi nkm clock is asked via determine_rate() for a rate of
> >> 449064000 it would return 449035712 and a parent rate of 217714285
> >> (using n=11, k=3, m=16, but those details aren't returned by
> >> determine_rate()).
> >>
> >> Eventually, determine_rate() will be called again, but this time for a
> >> rate of 449035712. The user already knows that we can provide that,
> >> because we told them (see previous paragraph). But since we're
> >> truncating when calculating the rate that we'd like the parent to
> >> provide, we end up asking the parent for 217714284 when we actually need
> >> it to provide 217714285. So we now *require* the parent to find the
> >> closest and additionally we must *hope* that the parent is incapable of
> >> providing the rate that we asked for.
> >
> > I mean... yeah. It's what abstraction is all about. For all we know, the
> > parent to pll-mipi could be a crystal that can't change its frequency
> > and we should deal with that. Or it could be an ideal clock that always
> > returns the rate you ask for. Or a firmware clock that behaves like an
> > ideal clock but lies about it :)
> >
> > It's that clock responsibility to do its best to provide the rate we ask
> > for.
> >
> > And if we need to make it behave better, then it's fine too. So your
> > example is indeed true, but it's more of a case of "let's send another
> > patch" rather than trying to figure out all possible cases and try to
> > figure things out accordingly. Because you won't be able to figure out
> > all possible cases for the current SoCs and the next ones, and the
> > workloads that people are going to run on those SoCs anyway.
> >
> >> >> If you carefully look at ccu_mp, you will see that it would ignore
> >> >> cases when its parent had rounded up. ccu_nkm is no different.
> >> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
> >> >> totally different beast. For now, sunxi-ng always expects rounding
> >> >> down.
> >> >
> >> > Then change that?
> >>
> >> You told me that both over- and undershooting are fine when
> >> determining the rate, *but also* "it's a bit context specific which one
> >> we should favour. If we were to do anything, it would be to support both
> >> and let the clock driver select which behaviour it wants." (see
> >> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
> >>
> >> So, I can't just change NKM's parent's default behavior (which is an NM
> >> clock in my case), because, if I understand correctly, I would have to
> >> introduce a "ROUND_CLOSEST" flag for NM clocks.
> >
> > Sure
> >
> >> But then I feel like I would have to document somewhere that when
> >> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
> >> needs to ROUND_CLOSEST, in order to avoid drifting away from the
> >> requested rate in the successive calls that are made to
> >> ccu_nkm_determine_rate(), which I tried to explain above and in previous
> >> messages.
> >
> > That's kind of what I meant too. Whether "drifting away" is an issue is
> > context specific too. for some clocks it just doesn't matter. Nobody
> > ever complained that the register clock of the MMC controller was
> > drifting away, because it doesn't affect the system in the slightest.
> >
> > The video clock tree (and possibly others) will be affected though, and
> > we'll indeed need to add that flag. But we're doing it all the time (and
> > sometimes get it wrong) for things like which clocks should be left
> > enabled for example.
> >
> >> Instead we could introduce the following function, like I suggested in
> >> V2 of this patchset. The advantage is that it both *documents* the
> >> dilemma for developers of ccu_nkm and also *avoids* it for users of
> >> ccu_nkm.
> >>
> >> static unsigned long optimal_parent_rate(unsigned long rate, unsigned long n,
> >> unsigned long k, unsigned long m)
> >> {
> >> unsigned long _rate, parent;
> >>
> >> // We must first try to find the desired parent rate that is rounded up, because there are
> >> // cases where truncating makes us miss the requested rate.
> >> // E.g. rate=449035712, n=11, k=3, m=16
> >> // When truncating, we'd get parent=217714284 and calculating the rate from that would give
> >> // us 449035710. When rounding up, we get parent=217714285 which would give us the requested
> >> // rate of 449035712.
> >> parent = DIV_ROUND_UP(rate * m, n * k);
> >>
> >> // But there are other cases, where rounding up the parent gives us a too high rate.
> >> // Therefore, we need to check for this case and, if necessary, truncate the parent rate
> >> // instead of rounding up.
> >> _rate = parent * n * k / m;
> >> if (_rate > rate)
> >> parent = rate * m / (n * k);
> >> return parent;
> >> }
> >>
> >> And then we ask the parent for that optimal parent rate we just
> >> calculated. I feel like that's an easy thing to do.
> >
> > We're back to the trade-off I was mentioning earlier. I'm not against it
> > on principle. However, if it's not absolutely required, then I don't
> > think it's a good idea to merge it.
> >
> > Especially if it's to workaround a parent flag missing. A clock flag
> > patch is easy to read, write, understand, review, merge and maintain.
> > It's basically a nop to merge provided the commit log is decent enough.
> >
> > That function is none of those things.
> >
>
> Ok, thank you for all the explanations! I now feel, we have a common
> understanding and that's great.
>
> I will proceed with preparing a V3 of this patchset, that will also
> introduce CCU_FEATURE_CLOSEST_RATE and an implementation for ccu_nm,
> ccu_nkm and the necessary updates on the (combined) mux clocks and
> divisor clocks. The flag will be used in pll-video0 of the A64 and all
> of its descendents.
>
> That version will also have a separate function
> ccu_nkm_find_best_with_parent_adj, so that it's more similar to the
> implementation of e.g. ccu_mp.

Awesome, thanks

Maxime


Attachments:
(No filename) (16.42 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-12 05:16:04

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> the essence of my following ramblings:
>> - I do think it is reasonable that nkm is asking its parent for the
>> rate that nkm actually needs from said parent to fulfill the request.
>> - I don't think nkm should make assumptions about the rounding
>> behaviour of the parent.
>
> I guess we agree :)
>
> And I would go even further and say that we shouldn't make *any*
> assumption about the behaviour of the parent.
>
>> The reason is, that I want to prevent users of ccu_nkm from making
>> mistakes when defining their clocks (which includes the parent of their
>> nkm clock).
>>
>> Please read below the details on why I think that.
>>
>> [...]
>>
>> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> >> best rate for rate = 449035712, it should try to include 449035712 in
>> >> its candidates, agreed?
>> >>
>> >> Example 1:
>> >> ==========
>> >> rate=449035712, n=11, k=3, m=16:
>> >> We should as for a parent rate of 217714285, because:
>> >> 217714285 * 11 * 3 / 16 = 449035712
>> >>
>> >> How do we get from 449035712 to 217714285, you ask?
>> >>
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >
>> > Why are we rounding up? I don't think the hardware will round up there.
>>
>> Being a "software guy" it is also my understanding that the hardware
>> does not round up here (or round down for that matter).
>
> That's my understanding as well.
>
>> But anyway, my concern is the rate's representation *in software*. The
>> clk drivers are using unsigned long to represent the actual rate. This
>> is not a lossless representation. In other words, the value (i.e. the
>> software representation) of that rate is, of course, a "lie". The
>> hardware clock is running at some rate that is hopefully close to what
>> we represent in software, but still it's an abstraction.
>>
>> For example, the user (e.g. in my example a panel) asks us for a rate
>> that is represented in softwares as 449035712. Given the values n=11,
>> k=3, m=16, we can *only* represent this value in software if the parent
>> gives us a rate that is represented in software as 217714285. Therefore,
>> I think it is reasonable to ask the parent for that rate (217714285).
>
> I somewhat agree, but I still don't think it's worth rounding up.
>
> If we don't round up (and assuming the parent itself won't round the
> clock), we end up with a rate of 449035710 using the dividers you
> mentioned. It's a .0000005% deviation (I hope I didn't screw up the
> number of 0s). It's negligible for all practical purposes, and it's not
> worth making the code inconsistent and eyebrow raising.
>
>> >> Do you agree that we should ask the parent for 217714285 in case we want
>> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> >> m=16?
>> >>
>> >> We should not ask for a parent rate of 217714284, because:
>> >> 217714284 * 11 * 3 / 16 = 449035710
>> >>
>> >> Example 2:
>> >> ==========
>> >> rate=500000000, n=11, k=3, m=16:
>> >> Here we should not ask the parent for
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >> because that would be 242424243.
>> >>
>> >> 242424243 * 11 * 3 / 16 = 500000001
>> >>
>> >> We (the NKM clock, not the parent!) would overshoot (please see at the
>> >> end of this mail, why (for now) I don't want to support overshooting in
>> >> the NKM clock).
>> >>
>> >> Instead we should as for a parent rate of 242424242, because:
>> >> 242424242 * 11 * 3 / 16 = 499999999
>> >>
>> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> >> parent for
>> >> DIV_ROUND_UP(rate * m, n * k)
>> >> And there are also cases, where we have to ask the parent for
>> >> rate * m / (n * k)
>> >
>> > I mean, I think you're overthinking this.
>> >
>> > If you never round up and mimic how the hardware behaves, and test all
>> > combination, then eventually you'll find the closest rate.
>> >
>> > If you don't because the parent doesn't look for the closest rate, then
>> > the parent should be changed too.
>> >
>> > It really is that simple.
>> >
>> >> This is what the code is trying to do. Maybe it's easier to look at V2
>> >> because I extracted the calcultion of the optimal parent rate into a
>> >> separate function hoping that this makes things clearer.
>> >>
>> >> Let me stress this: When calculating the optimal rate for the parent,
>> >> I'm not making any assumptions here about how the PARENT clock rounds.
>> >> In fact, I assume that the parent could be perfect and always provides
>> >> the rate it is asked for. I only take into account how the nkm clock
>> >> rounds.
>> >
>> > At the very least, you assume that the parent rounding can be "wrong"
>> > and try to work around that.
>>
>> No. I'm not assuming anything about the parent. But I *know* that if we
>> (nkm) want to get a rate that is represented in softwares as 449035712
>> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
>> the parent that the parent represents in software as 217714285, because
>> I know that we (nkm) calculate *our* (nkm) rate using
>> parent * n * k / m
>>
>> So if (!) we want to give the user the rate that they ask for, why not
>> ask the parent for the rate that we need (217714285)?
>>
>> I admit that I'm making assumptions here. My assumptions are that we
>> a. want to at least try to give the user what they asked for
>> b. without making assumptions about the parent's behaviour.
>>
>> Those assumptions could of course be wrong, but, honestly, I would find
>> that confusing.
>
> I guess my point leans more towards the "social" side than the
> mathematical one. If I followed you so far, the precision you expect to
> gain is in the <1Hz range (and I've been in sick leave for a while, so
> sorry if I didn't before). The rate is in the 100MHz range.
>
> So the precision gain is pretty much nothing. Sure, it's closer from a
> mathematical standpoint. But there's zero benefit from it.
>
> However, it comes at the cost of a code that is definitely more
> complicated (or less naive, depending on how you look at it I guess :))
> and will be harder to figure out for someone that jumps into the driver.
>
> So the trade-off doesn't really make fixing it worth it to me.
>
>> >> > you ask the parent to compute whatever is closest to that optimal parent
>> >> > rate.
>> >> >
>> >> > It's the parent responsibility now. It's the parent decision to figure
>> >> > out what "the closest" means, if it can change rate, if it has any range
>> >> > limitation, etc. You can't work around that.
>> >> >
>> >> > What you actually want there is the parent to actually provide the
>> >> > closest rate, even if it means overshooting.
>> >> >
>> >>
>> >> I want to ask the parent for a rate, that would actually result in the
>> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> >> the parent for a rate that doesn't fit that criterion?
>> >
>> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
>> > and use whatever value it returned.
>> >
>> > If it requires changing the parent clock to improve its round_rate
>> > behaviour, then do that too.
>> >
>>
>> Hmmm... Okay. So you *are* saying, that I should make changes to the
>> parent so that we do not need to request the exact rate we want from the
>> parent. But I really don't understand why.
>
> No, sorry. I initially thought that you were working around "divider"
> rounding issue (as opposed to integer like you mentionned above) with
> the parent not providing its optimal rate, and you adjusting based on
> that offset.
>
>> As I wrote above, I'm not making any assumptions of how and if the
>> parent rounds. My code is rounding *prior* to asking the parent. Your
>> proposal on the other hand *requires* changing the parent to round
>> closest where mine does not.
>>
>> My concern is, that we could then end up with the situation that someone
>> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
>> does not set the ROUND_CLOSEST flag on the parent, because it's not
>> immediately apparent why they should do that.
>
> It's going to happen, and probably happens at the moment already,
> because not only the NKM clocks are affected, but virtually all of them,
> and most don't use ROUND_CLOSEST.
>
> And to some extent, it's fine. We would handle it like any other bug: if
> we ever encounter one, we'll write a fix, backport it to stable and all
> will be fine.
>
> You can't figure out all the use-cases we'll require in the future
> anyway.
>
>> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
>> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
>> pll-video0:
>>
>> When pll-mipi nkm clock is asked via determine_rate() for a rate of
>> 449064000 it would return 449035712 and a parent rate of 217714285
>> (using n=11, k=3, m=16, but those details aren't returned by
>> determine_rate()).
>>
>> Eventually, determine_rate() will be called again, but this time for a
>> rate of 449035712. The user already knows that we can provide that,
>> because we told them (see previous paragraph). But since we're
>> truncating when calculating the rate that we'd like the parent to
>> provide, we end up asking the parent for 217714284 when we actually need
>> it to provide 217714285. So we now *require* the parent to find the
>> closest and additionally we must *hope* that the parent is incapable of
>> providing the rate that we asked for.
>
> I mean... yeah. It's what abstraction is all about. For all we know, the
> parent to pll-mipi could be a crystal that can't change its frequency
> and we should deal with that. Or it could be an ideal clock that always
> returns the rate you ask for. Or a firmware clock that behaves like an
> ideal clock but lies about it :)
>
> It's that clock responsibility to do its best to provide the rate we ask
> for.
>
> And if we need to make it behave better, then it's fine too. So your
> example is indeed true, but it's more of a case of "let's send another
> patch" rather than trying to figure out all possible cases and try to
> figure things out accordingly. Because you won't be able to figure out
> all possible cases for the current SoCs and the next ones, and the
> workloads that people are going to run on those SoCs anyway.
>
>> >> If you carefully look at ccu_mp, you will see that it would ignore
>> >> cases when its parent had rounded up. ccu_nkm is no different.
>> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> >> totally different beast. For now, sunxi-ng always expects rounding
>> >> down.
>> >
>> > Then change that?
>>
>> You told me that both over- and undershooting are fine when
>> determining the rate, *but also* "it's a bit context specific which one
>> we should favour. If we were to do anything, it would be to support both
>> and let the clock driver select which behaviour it wants." (see
>> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
>>
>> So, I can't just change NKM's parent's default behavior (which is an NM
>> clock in my case), because, if I understand correctly, I would have to
>> introduce a "ROUND_CLOSEST" flag for NM clocks.
>
> Sure
>
>> But then I feel like I would have to document somewhere that when
>> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
>> needs to ROUND_CLOSEST, in order to avoid drifting away from the
>> requested rate in the successive calls that are made to
>> ccu_nkm_determine_rate(), which I tried to explain above and in previous
>> messages.
>
> That's kind of what I meant too. Whether "drifting away" is an issue is
> context specific too. for some clocks it just doesn't matter. Nobody
> ever complained that the register clock of the MMC controller was
> drifting away, because it doesn't affect the system in the slightest.
>
> The video clock tree (and possibly others) will be affected though, and
> we'll indeed need to add that flag. But we're doing it all the time (and
> sometimes get it wrong) for things like which clocks should be left
> enabled for example.

I'm afraid we have to re-visit this decision. I found a case, where the
drifting causes a problem.

Setting pll-mipi's SET_PARENT_RATE flag, but not setting the tree's
CCU_FEATURE_CLOSEST_RATE flag results in the following tree:

clock rate
-----------------------------------
pll-video0 201000000
hdmi-phy-clk 50250000
hdmi 201000000
tcon1 201000000
pll-mipi 414562500
tcon0 414562500
tcon-data-clock 138187500

Note, that tcon-data-clock's rate is garbage. It should be tcon0/4, but
it is tcon0/3.

I added some logging to ccu_find_best*() to understand, as to why that
is:

ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16

We can see that the rate is drifting over the successive calls. We've
seen it before and deemed it no big deal.

To highlight the issue a bit more, I added some logging at the end of
sun4i_dclk_round_rate() and sun4i_dclk_set_rate.

ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4

Here we can see that sun4i_dclk now has determined that 103644444 is its
best rate, based on the parent rate of 414577776.

But now, the nkm clock pll-mipi changes its mind and thinks that it
cannot provide 414577776 any more, instead it wants to provide
414562500. Unfortunately, the dclk is never informed about this
new rate...

ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16
sun4i_dclk_set_rate: rate=103644444, parent=414562500, div=3

... so, when setting the rate, the dclk still thinks it should set to
103644444, but with a parent of 414562500 instead of 414577776 this
results in a divisor of 3 instead of 4.

The worst part is: Even though you and me discussed in length about how
the nkm's parent must support rounding to the closest rate, I still
didn't realize that this was the reason that tcon-data-clock was so far
from it's target rate. Luckily, I knew where to add my debug statements,
because I knew what I changed. I would expect someone else, who wasn't
involved in this, to waste hours on finding "their" mistake (I really
think it's *our* mistake).

I guess, what I'm trying to say here is: I *really*, *really* think that
setting an nkm clock's SET_PARENT_RATE flag should not depend on the
parent to have CCU_FEATURE_CLOSEST_RATE set. Instead we should use the
function optimal_parent_rate() that I proposed below - or something with
the same effect.

Best regards,
Frank

>
>> Instead we could introduce the following function, like I suggested in
>> V2 of this patchset. The advantage is that it both *documents* the
>> dilemma for developers of ccu_nkm and also *avoids* it for users of
>> ccu_nkm.
>>
>> static unsigned long optimal_parent_rate(unsigned long rate, unsigned long n,
>> unsigned long k, unsigned long m)
>> {
>> unsigned long _rate, parent;
>>
>> // We must first try to find the desired parent rate that is rounded up, because there are
>> // cases where truncating makes us miss the requested rate.
>> // E.g. rate=449035712, n=11, k=3, m=16
>> // When truncating, we'd get parent=217714284 and calculating the rate from that would give
>> // us 449035710. When rounding up, we get parent=217714285 which would give us the requested
>> // rate of 449035712.
>> parent = DIV_ROUND_UP(rate * m, n * k);
>>
>> // But there are other cases, where rounding up the parent gives us a too high rate.
>> // Therefore, we need to check for this case and, if necessary, truncate the parent rate
>> // instead of rounding up.
>> _rate = parent * n * k / m;
>> if (_rate > rate)
>> parent = rate * m / (n * k);
>> return parent;
>> }
>>
>> And then we ask the parent for that optimal parent rate we just
>> calculated. I feel like that's an easy thing to do.
>
> We're back to the trade-off I was mentioning earlier. I'm not against it
> on principle. However, if it's not absolutely required, then I don't
> think it's a good idea to merge it.
>
> Especially if it's to workaround a parent flag missing. A clock flag
> patch is easy to read, write, understand, review, merge and maintain.
> It's basically a nop to merge provided the commit log is decent enough.
>
> That function is none of those things.
>
> Maxime
>
> [[End of PGP Signed Part]]

2023-07-17 14:59:44

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Wed, Jul 12, 2023 at 06:39:56AM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> the essence of my following ramblings:
> >> - I do think it is reasonable that nkm is asking its parent for the
> >> rate that nkm actually needs from said parent to fulfill the request.
> >> - I don't think nkm should make assumptions about the rounding
> >> behaviour of the parent.
> >
> > I guess we agree :)
> >
> > And I would go even further and say that we shouldn't make *any*
> > assumption about the behaviour of the parent.
> >
> >> The reason is, that I want to prevent users of ccu_nkm from making
> >> mistakes when defining their clocks (which includes the parent of their
> >> nkm clock).
> >>
> >> Please read below the details on why I think that.
> >>
> >> [...]
> >>
> >> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
> >> >> best rate for rate = 449035712, it should try to include 449035712 in
> >> >> its candidates, agreed?
> >> >>
> >> >> Example 1:
> >> >> ==========
> >> >> rate=449035712, n=11, k=3, m=16:
> >> >> We should as for a parent rate of 217714285, because:
> >> >> 217714285 * 11 * 3 / 16 = 449035712
> >> >>
> >> >> How do we get from 449035712 to 217714285, you ask?
> >> >>
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >
> >> > Why are we rounding up? I don't think the hardware will round up there.
> >>
> >> Being a "software guy" it is also my understanding that the hardware
> >> does not round up here (or round down for that matter).
> >
> > That's my understanding as well.
> >
> >> But anyway, my concern is the rate's representation *in software*. The
> >> clk drivers are using unsigned long to represent the actual rate. This
> >> is not a lossless representation. In other words, the value (i.e. the
> >> software representation) of that rate is, of course, a "lie". The
> >> hardware clock is running at some rate that is hopefully close to what
> >> we represent in software, but still it's an abstraction.
> >>
> >> For example, the user (e.g. in my example a panel) asks us for a rate
> >> that is represented in softwares as 449035712. Given the values n=11,
> >> k=3, m=16, we can *only* represent this value in software if the parent
> >> gives us a rate that is represented in software as 217714285. Therefore,
> >> I think it is reasonable to ask the parent for that rate (217714285).
> >
> > I somewhat agree, but I still don't think it's worth rounding up.
> >
> > If we don't round up (and assuming the parent itself won't round the
> > clock), we end up with a rate of 449035710 using the dividers you
> > mentioned. It's a .0000005% deviation (I hope I didn't screw up the
> > number of 0s). It's negligible for all practical purposes, and it's not
> > worth making the code inconsistent and eyebrow raising.
> >
> >> >> Do you agree that we should ask the parent for 217714285 in case we want
> >> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
> >> >> m=16?
> >> >>
> >> >> We should not ask for a parent rate of 217714284, because:
> >> >> 217714284 * 11 * 3 / 16 = 449035710
> >> >>
> >> >> Example 2:
> >> >> ==========
> >> >> rate=500000000, n=11, k=3, m=16:
> >> >> Here we should not ask the parent for
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> because that would be 242424243.
> >> >>
> >> >> 242424243 * 11 * 3 / 16 = 500000001
> >> >>
> >> >> We (the NKM clock, not the parent!) would overshoot (please see at the
> >> >> end of this mail, why (for now) I don't want to support overshooting in
> >> >> the NKM clock).
> >> >>
> >> >> Instead we should as for a parent rate of 242424242, because:
> >> >> 242424242 * 11 * 3 / 16 = 499999999
> >> >>
> >> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
> >> >> parent for
> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> And there are also cases, where we have to ask the parent for
> >> >> rate * m / (n * k)
> >> >
> >> > I mean, I think you're overthinking this.
> >> >
> >> > If you never round up and mimic how the hardware behaves, and test all
> >> > combination, then eventually you'll find the closest rate.
> >> >
> >> > If you don't because the parent doesn't look for the closest rate, then
> >> > the parent should be changed too.
> >> >
> >> > It really is that simple.
> >> >
> >> >> This is what the code is trying to do. Maybe it's easier to look at V2
> >> >> because I extracted the calcultion of the optimal parent rate into a
> >> >> separate function hoping that this makes things clearer.
> >> >>
> >> >> Let me stress this: When calculating the optimal rate for the parent,
> >> >> I'm not making any assumptions here about how the PARENT clock rounds.
> >> >> In fact, I assume that the parent could be perfect and always provides
> >> >> the rate it is asked for. I only take into account how the nkm clock
> >> >> rounds.
> >> >
> >> > At the very least, you assume that the parent rounding can be "wrong"
> >> > and try to work around that.
> >>
> >> No. I'm not assuming anything about the parent. But I *know* that if we
> >> (nkm) want to get a rate that is represented in softwares as 449035712
> >> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
> >> the parent that the parent represents in software as 217714285, because
> >> I know that we (nkm) calculate *our* (nkm) rate using
> >> parent * n * k / m
> >>
> >> So if (!) we want to give the user the rate that they ask for, why not
> >> ask the parent for the rate that we need (217714285)?
> >>
> >> I admit that I'm making assumptions here. My assumptions are that we
> >> a. want to at least try to give the user what they asked for
> >> b. without making assumptions about the parent's behaviour.
> >>
> >> Those assumptions could of course be wrong, but, honestly, I would find
> >> that confusing.
> >
> > I guess my point leans more towards the "social" side than the
> > mathematical one. If I followed you so far, the precision you expect to
> > gain is in the <1Hz range (and I've been in sick leave for a while, so
> > sorry if I didn't before). The rate is in the 100MHz range.
> >
> > So the precision gain is pretty much nothing. Sure, it's closer from a
> > mathematical standpoint. But there's zero benefit from it.
> >
> > However, it comes at the cost of a code that is definitely more
> > complicated (or less naive, depending on how you look at it I guess :))
> > and will be harder to figure out for someone that jumps into the driver.
> >
> > So the trade-off doesn't really make fixing it worth it to me.
> >
> >> >> > you ask the parent to compute whatever is closest to that optimal parent
> >> >> > rate.
> >> >> >
> >> >> > It's the parent responsibility now. It's the parent decision to figure
> >> >> > out what "the closest" means, if it can change rate, if it has any range
> >> >> > limitation, etc. You can't work around that.
> >> >> >
> >> >> > What you actually want there is the parent to actually provide the
> >> >> > closest rate, even if it means overshooting.
> >> >> >
> >> >>
> >> >> I want to ask the parent for a rate, that would actually result in the
> >> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
> >> >> the parent for a rate that doesn't fit that criterion?
> >> >
> >> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> >> > and use whatever value it returned.
> >> >
> >> > If it requires changing the parent clock to improve its round_rate
> >> > behaviour, then do that too.
> >> >
> >>
> >> Hmmm... Okay. So you *are* saying, that I should make changes to the
> >> parent so that we do not need to request the exact rate we want from the
> >> parent. But I really don't understand why.
> >
> > No, sorry. I initially thought that you were working around "divider"
> > rounding issue (as opposed to integer like you mentionned above) with
> > the parent not providing its optimal rate, and you adjusting based on
> > that offset.
> >
> >> As I wrote above, I'm not making any assumptions of how and if the
> >> parent rounds. My code is rounding *prior* to asking the parent. Your
> >> proposal on the other hand *requires* changing the parent to round
> >> closest where mine does not.
> >>
> >> My concern is, that we could then end up with the situation that someone
> >> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
> >> does not set the ROUND_CLOSEST flag on the parent, because it's not
> >> immediately apparent why they should do that.
> >
> > It's going to happen, and probably happens at the moment already,
> > because not only the NKM clocks are affected, but virtually all of them,
> > and most don't use ROUND_CLOSEST.
> >
> > And to some extent, it's fine. We would handle it like any other bug: if
> > we ever encounter one, we'll write a fix, backport it to stable and all
> > will be fine.
> >
> > You can't figure out all the use-cases we'll require in the future
> > anyway.
> >
> >> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
> >> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
> >> pll-video0:
> >>
> >> When pll-mipi nkm clock is asked via determine_rate() for a rate of
> >> 449064000 it would return 449035712 and a parent rate of 217714285
> >> (using n=11, k=3, m=16, but those details aren't returned by
> >> determine_rate()).
> >>
> >> Eventually, determine_rate() will be called again, but this time for a
> >> rate of 449035712. The user already knows that we can provide that,
> >> because we told them (see previous paragraph). But since we're
> >> truncating when calculating the rate that we'd like the parent to
> >> provide, we end up asking the parent for 217714284 when we actually need
> >> it to provide 217714285. So we now *require* the parent to find the
> >> closest and additionally we must *hope* that the parent is incapable of
> >> providing the rate that we asked for.
> >
> > I mean... yeah. It's what abstraction is all about. For all we know, the
> > parent to pll-mipi could be a crystal that can't change its frequency
> > and we should deal with that. Or it could be an ideal clock that always
> > returns the rate you ask for. Or a firmware clock that behaves like an
> > ideal clock but lies about it :)
> >
> > It's that clock responsibility to do its best to provide the rate we ask
> > for.
> >
> > And if we need to make it behave better, then it's fine too. So your
> > example is indeed true, but it's more of a case of "let's send another
> > patch" rather than trying to figure out all possible cases and try to
> > figure things out accordingly. Because you won't be able to figure out
> > all possible cases for the current SoCs and the next ones, and the
> > workloads that people are going to run on those SoCs anyway.
> >
> >> >> If you carefully look at ccu_mp, you will see that it would ignore
> >> >> cases when its parent had rounded up. ccu_nkm is no different.
> >> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
> >> >> totally different beast. For now, sunxi-ng always expects rounding
> >> >> down.
> >> >
> >> > Then change that?
> >>
> >> You told me that both over- and undershooting are fine when
> >> determining the rate, *but also* "it's a bit context specific which one
> >> we should favour. If we were to do anything, it would be to support both
> >> and let the clock driver select which behaviour it wants." (see
> >> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
> >>
> >> So, I can't just change NKM's parent's default behavior (which is an NM
> >> clock in my case), because, if I understand correctly, I would have to
> >> introduce a "ROUND_CLOSEST" flag for NM clocks.
> >
> > Sure
> >
> >> But then I feel like I would have to document somewhere that when
> >> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
> >> needs to ROUND_CLOSEST, in order to avoid drifting away from the
> >> requested rate in the successive calls that are made to
> >> ccu_nkm_determine_rate(), which I tried to explain above and in previous
> >> messages.
> >
> > That's kind of what I meant too. Whether "drifting away" is an issue is
> > context specific too. for some clocks it just doesn't matter. Nobody
> > ever complained that the register clock of the MMC controller was
> > drifting away, because it doesn't affect the system in the slightest.
> >
> > The video clock tree (and possibly others) will be affected though, and
> > we'll indeed need to add that flag. But we're doing it all the time (and
> > sometimes get it wrong) for things like which clocks should be left
> > enabled for example.
>
> I'm afraid we have to re-visit this decision. I found a case, where the
> drifting causes a problem.

I'm sure it can cause a lot of issues everywhere. My point was that the
solution is to add the flag so the issue goes away, and not to try to
workaround a driver that might or not have the flag. We should assume
it's properly set, and properly set it.

> Setting pll-mipi's SET_PARENT_RATE flag, but not setting the tree's
> CCU_FEATURE_CLOSEST_RATE flag results in the following tree:
>
> clock rate
> -----------------------------------
> pll-video0 201000000
> hdmi-phy-clk 50250000
> hdmi 201000000
> tcon1 201000000
> pll-mipi 414562500
> tcon0 414562500
> tcon-data-clock 138187500
>
> Note, that tcon-data-clock's rate is garbage. It should be tcon0/4, but
> it is tcon0/3.
>
> I added some logging to ccu_find_best*() to understand, as to why that
> is:
>
> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
> ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
> ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16
>
> We can see that the rate is drifting over the successive calls. We've
> seen it before and deemed it no big deal.
>
> To highlight the issue a bit more, I added some logging at the end of
> sun4i_dclk_round_rate() and sun4i_dclk_set_rate.
>
> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
>
> Here we can see that sun4i_dclk now has determined that 103644444 is its
> best rate, based on the parent rate of 414577776.
>
> But now, the nkm clock pll-mipi changes its mind and thinks that it
> cannot provide 414577776 any more, instead it wants to provide
> 414562500.

That's a bit surprising, but not entirely. For example, one of the
parent clock of our parent might have changed rate between our
round_rate and set_rate calls.

Why does it change its mind?

Maxime


Attachments:
(No filename) (15.68 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-23 09:23:44

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

On 2023-07-17 at 16:06:02 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Wed, Jul 12, 2023 at 06:39:56AM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
>> >> Hi Maxime,
>> >>
>> >> the essence of my following ramblings:
>> >> - I do think it is reasonable that nkm is asking its parent for the
>> >> rate that nkm actually needs from said parent to fulfill the request.
>> >> - I don't think nkm should make assumptions about the rounding
>> >> behaviour of the parent.
>> >
>> > I guess we agree :)
>> >
>> > And I would go even further and say that we shouldn't make *any*
>> > assumption about the behaviour of the parent.
>> >
>> >> The reason is, that I want to prevent users of ccu_nkm from making
>> >> mistakes when defining their clocks (which includes the parent of their
>> >> nkm clock).
>> >>
>> >> Please read below the details on why I think that.
>> >>
>> >> [...]
>> >>
>> >> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> >> >> best rate for rate = 449035712, it should try to include 449035712 in
>> >> >> its candidates, agreed?
>> >> >>
>> >> >> Example 1:
>> >> >> ==========
>> >> >> rate=449035712, n=11, k=3, m=16:
>> >> >> We should as for a parent rate of 217714285, because:
>> >> >> 217714285 * 11 * 3 / 16 = 449035712
>> >> >>
>> >> >> How do we get from 449035712 to 217714285, you ask?
>> >> >>
>> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >
>> >> > Why are we rounding up? I don't think the hardware will round up there.
>> >>
>> >> Being a "software guy" it is also my understanding that the hardware
>> >> does not round up here (or round down for that matter).
>> >
>> > That's my understanding as well.
>> >
>> >> But anyway, my concern is the rate's representation *in software*. The
>> >> clk drivers are using unsigned long to represent the actual rate. This
>> >> is not a lossless representation. In other words, the value (i.e. the
>> >> software representation) of that rate is, of course, a "lie". The
>> >> hardware clock is running at some rate that is hopefully close to what
>> >> we represent in software, but still it's an abstraction.
>> >>
>> >> For example, the user (e.g. in my example a panel) asks us for a rate
>> >> that is represented in softwares as 449035712. Given the values n=11,
>> >> k=3, m=16, we can *only* represent this value in software if the parent
>> >> gives us a rate that is represented in software as 217714285. Therefore,
>> >> I think it is reasonable to ask the parent for that rate (217714285).
>> >
>> > I somewhat agree, but I still don't think it's worth rounding up.
>> >
>> > If we don't round up (and assuming the parent itself won't round the
>> > clock), we end up with a rate of 449035710 using the dividers you
>> > mentioned. It's a .0000005% deviation (I hope I didn't screw up the
>> > number of 0s). It's negligible for all practical purposes, and it's not
>> > worth making the code inconsistent and eyebrow raising.
>> >
>> >> >> Do you agree that we should ask the parent for 217714285 in case we want
>> >> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> >> >> m=16?
>> >> >>
>> >> >> We should not ask for a parent rate of 217714284, because:
>> >> >> 217714284 * 11 * 3 / 16 = 449035710
>> >> >>
>> >> >> Example 2:
>> >> >> ==========
>> >> >> rate=500000000, n=11, k=3, m=16:
>> >> >> Here we should not ask the parent for
>> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >> because that would be 242424243.
>> >> >>
>> >> >> 242424243 * 11 * 3 / 16 = 500000001
>> >> >>
>> >> >> We (the NKM clock, not the parent!) would overshoot (please see at the
>> >> >> end of this mail, why (for now) I don't want to support overshooting in
>> >> >> the NKM clock).
>> >> >>
>> >> >> Instead we should as for a parent rate of 242424242, because:
>> >> >> 242424242 * 11 * 3 / 16 = 499999999
>> >> >>
>> >> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> >> >> parent for
>> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >> And there are also cases, where we have to ask the parent for
>> >> >> rate * m / (n * k)
>> >> >
>> >> > I mean, I think you're overthinking this.
>> >> >
>> >> > If you never round up and mimic how the hardware behaves, and test all
>> >> > combination, then eventually you'll find the closest rate.
>> >> >
>> >> > If you don't because the parent doesn't look for the closest rate, then
>> >> > the parent should be changed too.
>> >> >
>> >> > It really is that simple.
>> >> >
>> >> >> This is what the code is trying to do. Maybe it's easier to look at V2
>> >> >> because I extracted the calcultion of the optimal parent rate into a
>> >> >> separate function hoping that this makes things clearer.
>> >> >>
>> >> >> Let me stress this: When calculating the optimal rate for the parent,
>> >> >> I'm not making any assumptions here about how the PARENT clock rounds.
>> >> >> In fact, I assume that the parent could be perfect and always provides
>> >> >> the rate it is asked for. I only take into account how the nkm clock
>> >> >> rounds.
>> >> >
>> >> > At the very least, you assume that the parent rounding can be "wrong"
>> >> > and try to work around that.
>> >>
>> >> No. I'm not assuming anything about the parent. But I *know* that if we
>> >> (nkm) want to get a rate that is represented in softwares as 449035712
>> >> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
>> >> the parent that the parent represents in software as 217714285, because
>> >> I know that we (nkm) calculate *our* (nkm) rate using
>> >> parent * n * k / m
>> >>
>> >> So if (!) we want to give the user the rate that they ask for, why not
>> >> ask the parent for the rate that we need (217714285)?
>> >>
>> >> I admit that I'm making assumptions here. My assumptions are that we
>> >> a. want to at least try to give the user what they asked for
>> >> b. without making assumptions about the parent's behaviour.
>> >>
>> >> Those assumptions could of course be wrong, but, honestly, I would find
>> >> that confusing.
>> >
>> > I guess my point leans more towards the "social" side than the
>> > mathematical one. If I followed you so far, the precision you expect to
>> > gain is in the <1Hz range (and I've been in sick leave for a while, so
>> > sorry if I didn't before). The rate is in the 100MHz range.
>> >
>> > So the precision gain is pretty much nothing. Sure, it's closer from a
>> > mathematical standpoint. But there's zero benefit from it.
>> >
>> > However, it comes at the cost of a code that is definitely more
>> > complicated (or less naive, depending on how you look at it I guess :))
>> > and will be harder to figure out for someone that jumps into the driver.
>> >
>> > So the trade-off doesn't really make fixing it worth it to me.
>> >
>> >> >> > you ask the parent to compute whatever is closest to that optimal parent
>> >> >> > rate.
>> >> >> >
>> >> >> > It's the parent responsibility now. It's the parent decision to figure
>> >> >> > out what "the closest" means, if it can change rate, if it has any range
>> >> >> > limitation, etc. You can't work around that.
>> >> >> >
>> >> >> > What you actually want there is the parent to actually provide the
>> >> >> > closest rate, even if it means overshooting.
>> >> >> >
>> >> >>
>> >> >> I want to ask the parent for a rate, that would actually result in the
>> >> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> >> >> the parent for a rate that doesn't fit that criterion?
>> >> >
>> >> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
>> >> > and use whatever value it returned.
>> >> >
>> >> > If it requires changing the parent clock to improve its round_rate
>> >> > behaviour, then do that too.
>> >> >
>> >>
>> >> Hmmm... Okay. So you *are* saying, that I should make changes to the
>> >> parent so that we do not need to request the exact rate we want from the
>> >> parent. But I really don't understand why.
>> >
>> > No, sorry. I initially thought that you were working around "divider"
>> > rounding issue (as opposed to integer like you mentionned above) with
>> > the parent not providing its optimal rate, and you adjusting based on
>> > that offset.
>> >
>> >> As I wrote above, I'm not making any assumptions of how and if the
>> >> parent rounds. My code is rounding *prior* to asking the parent. Your
>> >> proposal on the other hand *requires* changing the parent to round
>> >> closest where mine does not.
>> >>
>> >> My concern is, that we could then end up with the situation that someone
>> >> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
>> >> does not set the ROUND_CLOSEST flag on the parent, because it's not
>> >> immediately apparent why they should do that.
>> >
>> > It's going to happen, and probably happens at the moment already,
>> > because not only the NKM clocks are affected, but virtually all of them,
>> > and most don't use ROUND_CLOSEST.
>> >
>> > And to some extent, it's fine. We would handle it like any other bug: if
>> > we ever encounter one, we'll write a fix, backport it to stable and all
>> > will be fine.
>> >
>> > You can't figure out all the use-cases we'll require in the future
>> > anyway.
>> >
>> >> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
>> >> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
>> >> pll-video0:
>> >>
>> >> When pll-mipi nkm clock is asked via determine_rate() for a rate of
>> >> 449064000 it would return 449035712 and a parent rate of 217714285
>> >> (using n=11, k=3, m=16, but those details aren't returned by
>> >> determine_rate()).
>> >>
>> >> Eventually, determine_rate() will be called again, but this time for a
>> >> rate of 449035712. The user already knows that we can provide that,
>> >> because we told them (see previous paragraph). But since we're
>> >> truncating when calculating the rate that we'd like the parent to
>> >> provide, we end up asking the parent for 217714284 when we actually need
>> >> it to provide 217714285. So we now *require* the parent to find the
>> >> closest and additionally we must *hope* that the parent is incapable of
>> >> providing the rate that we asked for.
>> >
>> > I mean... yeah. It's what abstraction is all about. For all we know, the
>> > parent to pll-mipi could be a crystal that can't change its frequency
>> > and we should deal with that. Or it could be an ideal clock that always
>> > returns the rate you ask for. Or a firmware clock that behaves like an
>> > ideal clock but lies about it :)
>> >
>> > It's that clock responsibility to do its best to provide the rate we ask
>> > for.
>> >
>> > And if we need to make it behave better, then it's fine too. So your
>> > example is indeed true, but it's more of a case of "let's send another
>> > patch" rather than trying to figure out all possible cases and try to
>> > figure things out accordingly. Because you won't be able to figure out
>> > all possible cases for the current SoCs and the next ones, and the
>> > workloads that people are going to run on those SoCs anyway.
>> >
>> >> >> If you carefully look at ccu_mp, you will see that it would ignore
>> >> >> cases when its parent had rounded up. ccu_nkm is no different.
>> >> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> >> >> totally different beast. For now, sunxi-ng always expects rounding
>> >> >> down.
>> >> >
>> >> > Then change that?
>> >>
>> >> You told me that both over- and undershooting are fine when
>> >> determining the rate, *but also* "it's a bit context specific which one
>> >> we should favour. If we were to do anything, it would be to support both
>> >> and let the clock driver select which behaviour it wants." (see
>> >> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
>> >>
>> >> So, I can't just change NKM's parent's default behavior (which is an NM
>> >> clock in my case), because, if I understand correctly, I would have to
>> >> introduce a "ROUND_CLOSEST" flag for NM clocks.
>> >
>> > Sure
>> >
>> >> But then I feel like I would have to document somewhere that when
>> >> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
>> >> needs to ROUND_CLOSEST, in order to avoid drifting away from the
>> >> requested rate in the successive calls that are made to
>> >> ccu_nkm_determine_rate(), which I tried to explain above and in previous
>> >> messages.
>> >
>> > That's kind of what I meant too. Whether "drifting away" is an issue is
>> > context specific too. for some clocks it just doesn't matter. Nobody
>> > ever complained that the register clock of the MMC controller was
>> > drifting away, because it doesn't affect the system in the slightest.
>> >
>> > The video clock tree (and possibly others) will be affected though, and
>> > we'll indeed need to add that flag. But we're doing it all the time (and
>> > sometimes get it wrong) for things like which clocks should be left
>> > enabled for example.
>>
>> I'm afraid we have to re-visit this decision. I found a case, where the
>> drifting causes a problem.
>
> I'm sure it can cause a lot of issues everywhere. My point was that the
> solution is to add the flag so the issue goes away, and not to try to
> workaround a driver that might or not have the flag. We should assume
> it's properly set, and properly set it.

I want you to be aware that this might result in a situation that will
waste many hours of development time for people trying to use the
SET_PARENT_RATE flag on NKM clocks in their SoCs. Because it has
surprising side effects that I lay out below.

How do we tell developers that they *must* use CCU_ROUND closest on the
whole clk branch starting at whe nkm clock's parent if they want to use
SET_PARENT_RATE on a NKM clock. They must do it from the beginning or
they will need to start chasing errors. This could easily prevented by
applying this patch [1].

Because, if they don't the following will happen:

>> Setting pll-mipi's SET_PARENT_RATE flag, but not setting the tree's
>> CCU_FEATURE_CLOSEST_RATE flag results in the following tree:
>>
>> clock rate
>> -----------------------------------
>> pll-video0 201000000
>> hdmi-phy-clk 50250000
>> hdmi 201000000
>> tcon1 201000000
>> pll-mipi 414562500
>> tcon0 414562500
>> tcon-data-clock 138187500
>>
>> Note, that tcon-data-clock's rate is garbage. It should be tcon0/4, but
>> it is tcon0/3.

^
|
Now, *that* is surprising, isn't it? Well at least for me it was. I set
the SET_PARENT_RATE on pll-mipi and suddenly tcon-data-clock is garbage.
How can that be?

Ok, let's start debugging:

>>
>> I added some logging to ccu_find_best*() to understand, as to why that
>> is:
>>
>> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
>> ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
>> ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16
>>
>> We can see that the rate is drifting over the successive calls. We've
>> seen it before and deemed it no big deal.
>>
>> To highlight the issue a bit more, I added some logging at the end of
>> sun4i_dclk_round_rate() and sun4i_dclk_set_rate.
>>
>> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
>> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
>>
>> Here we can see that sun4i_dclk now has determined that 103644444 is its
>> best rate, based on the parent rate of 414577776.
>>
>> But now, the nkm clock pll-mipi changes its mind and thinks that it
>> cannot provide 414577776 any more, instead it wants to provide
>> 414562500.
>
> That's a bit surprising, but not entirely.

The drifting itself is not suprising at all to me. It is the topic of
this whole mail thread. :-)

> For example, one of the
> parent clock of our parent might have changed rate between our
> round_rate and set_rate calls.

That's not the reason here. The reason is that I added the
SET_PARENT_RATE flag, but did not add CCU_FEATURE_ROUND_CLOSEST to
pll-mipi's parent and all its descendants.

*I* know, that I shouldn't do that. But will Jane and Joe SoC-Developer?
How do we tell them? Will the necessary documentation really be easier
to maintain than the 24 line patch [1] I submitted?

Again, without the patch NKM clocks are making assumptions about the
parent clock. They now depend on:
a. The parent clock rounding to the closest requested rate.
b. The parent clock not supporting the rate the NKM requests.

If either of the two is not true, it will break tcon-data-clock.

> Why does it change its mind?

Because the parent does not round to the closest rate.

This is what happens, if the parent does not fulfill the two
requirements:

1. tcon0 (or someone else, this is just an example) asks the NKM clock
(pll-mipi) for rate A (e.g. 414600000).
2. The NKM clock tries out all combinations of N, K, and M and each
time asks the parent for a rate, that is close to the "optimal"
parent rate.
3. At some point the NKM clock asks its parent for rate B (282681818).
4. The parent will respond that it doesn't support rate B, but rate B'
(282666666) could be used.
5. Using that information, NKM tells the clk framework that it can't
provide rate A (414600000), but it could provide A' (414577776).
6. The clk framework tells tcon0 that. tcon0 agrees and says: "Fine,
clk framework, let's go with rate A'."
7. clk framework asks the NKM clock, for rate A' (414577776).
8. Like in step 2 the NKM goes through all combinations and asks the
parent for a rate that is optimal to the "optimal" parent rate.
9. Without the patch [1] at some point the NKM clock will ask the
parent for B'' (282666665) (!) due to integer rounding.
11. The parent will respond that it doesn't support rate B'', but rate
B''' could be used. (I don't know what rate that is, but it's now
so bad that the previous combination of N, K, and M is no longer
the best combination.)
12. Using that information, NKM tells the clk framework that it can't
provide rate A' (414577776), but it could provide A'' (414562500).
That's what I meant, that the nkm clock "changed its mind".

Actually, steps 7-12 are performed a few times and each time the rate
could get a little bit worse. Not by much, so you said it's not worth
adding patch [1], because "there's zero benefit from it". [2]

The new information I'm trying to convey here, is that tcon0 has based
its round_rate result on A' because the clk framework told it so.
Apparently, the clk framework expects that when a clock claims it can
provide rate A' when asked for rate A, that it will also respond with A'
when asked for A'. I don't think that's an unreasonable requirement for
clocks [footnote A]. But the NKM clock does not always fulfill it, if it
can set its parent rate but doesn't have patch [1].

This is not some kind of race condition as you described above. It
happens 100% of the time when a clock (tcon0) asks an nkm clock
(pll-mipi) for an "unfortunate" rate and the nkm's parent (pll-video0)
does not support rounding up.

"Unfortunate" is a rate where the best parent rate can not be calculated
by the simple formula:
rate * m / (n * k)

You wrote in [2]:
> We're back to the trade-off I was mentioning earlier. I'm not against
> it on principle. However, if it's not absolutely required, then I
> don't think it's a good idea to merge it.
>
> Especially if it's to workaround a parent flag missing. A clock flag
> patch is easy to read, write, understand, review, merge and maintain.
> It's basically a nop to merge provided the commit log is decent
> enough.

I'm at a point where I have to disagree that a clock flag patch is easy
to write and understand. You asked me, why the clock is changing its
mind and we had already discussed it a lot. I don't see why this
discussion would be different in 6 months or 2 years. IMO the discussion
we're having is not a "nop". And for a developer to find out that they
need to set the parent's ROUND_CLOSEST flag is not a "nop" either.

The patch [1] makes the life of SoC developers easier while it makes the
nkm clock maintainer's life harder. So the question is, how much easier
is the developer's life (a lot, IMO) and how much harder is the
maintainer's life (a little, IMO).

In conclusion, for me applying patch [1] is the best option we have. It
not only prevents but also documents the issue in a clean way.

Thanks again for your time. And again I'm sorry for being so annoyingly
persistent and verbose.

BR,
Frank

[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/yj6ss64s7p2uaslodj5zklrwhegz54bgh4l4wmldv6cccggepz@yombds4hij3c/

[footnote A]: I assume, this is a design choice and not a bug. If it's a
bug, we should of course fix it.

>
> Maxime
>
> [[End of PGP Signed Part]]

2023-07-25 14:22:10

by Maxime Ripard

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

On Sun, Jul 23, 2023 at 10:59:03AM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> On 2023-07-17 at 16:06:02 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Wed, Jul 12, 2023 at 06:39:56AM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
> >> > [[PGP Signed Part:Undecided]]
> >> > On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
> >> >> Hi Maxime,
> >> >>
> >> >> the essence of my following ramblings:
> >> >> - I do think it is reasonable that nkm is asking its parent for the
> >> >> rate that nkm actually needs from said parent to fulfill the request.
> >> >> - I don't think nkm should make assumptions about the rounding
> >> >> behaviour of the parent.
> >> >
> >> > I guess we agree :)
> >> >
> >> > And I would go even further and say that we shouldn't make *any*
> >> > assumption about the behaviour of the parent.
> >> >
> >> >> The reason is, that I want to prevent users of ccu_nkm from making
> >> >> mistakes when defining their clocks (which includes the parent of their
> >> >> nkm clock).
> >> >>
> >> >> Please read below the details on why I think that.
> >> >>
> >> >> [...]
> >> >>
> >> >> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
> >> >> >> best rate for rate = 449035712, it should try to include 449035712 in
> >> >> >> its candidates, agreed?
> >> >> >>
> >> >> >> Example 1:
> >> >> >> ==========
> >> >> >> rate=449035712, n=11, k=3, m=16:
> >> >> >> We should as for a parent rate of 217714285, because:
> >> >> >> 217714285 * 11 * 3 / 16 = 449035712
> >> >> >>
> >> >> >> How do we get from 449035712 to 217714285, you ask?
> >> >> >>
> >> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> >
> >> >> > Why are we rounding up? I don't think the hardware will round up there.
> >> >>
> >> >> Being a "software guy" it is also my understanding that the hardware
> >> >> does not round up here (or round down for that matter).
> >> >
> >> > That's my understanding as well.
> >> >
> >> >> But anyway, my concern is the rate's representation *in software*. The
> >> >> clk drivers are using unsigned long to represent the actual rate. This
> >> >> is not a lossless representation. In other words, the value (i.e. the
> >> >> software representation) of that rate is, of course, a "lie". The
> >> >> hardware clock is running at some rate that is hopefully close to what
> >> >> we represent in software, but still it's an abstraction.
> >> >>
> >> >> For example, the user (e.g. in my example a panel) asks us for a rate
> >> >> that is represented in softwares as 449035712. Given the values n=11,
> >> >> k=3, m=16, we can *only* represent this value in software if the parent
> >> >> gives us a rate that is represented in software as 217714285. Therefore,
> >> >> I think it is reasonable to ask the parent for that rate (217714285).
> >> >
> >> > I somewhat agree, but I still don't think it's worth rounding up.
> >> >
> >> > If we don't round up (and assuming the parent itself won't round the
> >> > clock), we end up with a rate of 449035710 using the dividers you
> >> > mentioned. It's a .0000005% deviation (I hope I didn't screw up the
> >> > number of 0s). It's negligible for all practical purposes, and it's not
> >> > worth making the code inconsistent and eyebrow raising.
> >> >
> >> >> >> Do you agree that we should ask the parent for 217714285 in case we want
> >> >> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
> >> >> >> m=16?
> >> >> >>
> >> >> >> We should not ask for a parent rate of 217714284, because:
> >> >> >> 217714284 * 11 * 3 / 16 = 449035710
> >> >> >>
> >> >> >> Example 2:
> >> >> >> ==========
> >> >> >> rate=500000000, n=11, k=3, m=16:
> >> >> >> Here we should not ask the parent for
> >> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> >> because that would be 242424243.
> >> >> >>
> >> >> >> 242424243 * 11 * 3 / 16 = 500000001
> >> >> >>
> >> >> >> We (the NKM clock, not the parent!) would overshoot (please see at the
> >> >> >> end of this mail, why (for now) I don't want to support overshooting in
> >> >> >> the NKM clock).
> >> >> >>
> >> >> >> Instead we should as for a parent rate of 242424242, because:
> >> >> >> 242424242 * 11 * 3 / 16 = 499999999
> >> >> >>
> >> >> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
> >> >> >> parent for
> >> >> >> DIV_ROUND_UP(rate * m, n * k)
> >> >> >> And there are also cases, where we have to ask the parent for
> >> >> >> rate * m / (n * k)
> >> >> >
> >> >> > I mean, I think you're overthinking this.
> >> >> >
> >> >> > If you never round up and mimic how the hardware behaves, and test all
> >> >> > combination, then eventually you'll find the closest rate.
> >> >> >
> >> >> > If you don't because the parent doesn't look for the closest rate, then
> >> >> > the parent should be changed too.
> >> >> >
> >> >> > It really is that simple.
> >> >> >
> >> >> >> This is what the code is trying to do. Maybe it's easier to look at V2
> >> >> >> because I extracted the calcultion of the optimal parent rate into a
> >> >> >> separate function hoping that this makes things clearer.
> >> >> >>
> >> >> >> Let me stress this: When calculating the optimal rate for the parent,
> >> >> >> I'm not making any assumptions here about how the PARENT clock rounds.
> >> >> >> In fact, I assume that the parent could be perfect and always provides
> >> >> >> the rate it is asked for. I only take into account how the nkm clock
> >> >> >> rounds.
> >> >> >
> >> >> > At the very least, you assume that the parent rounding can be "wrong"
> >> >> > and try to work around that.
> >> >>
> >> >> No. I'm not assuming anything about the parent. But I *know* that if we
> >> >> (nkm) want to get a rate that is represented in softwares as 449035712
> >> >> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
> >> >> the parent that the parent represents in software as 217714285, because
> >> >> I know that we (nkm) calculate *our* (nkm) rate using
> >> >> parent * n * k / m
> >> >>
> >> >> So if (!) we want to give the user the rate that they ask for, why not
> >> >> ask the parent for the rate that we need (217714285)?
> >> >>
> >> >> I admit that I'm making assumptions here. My assumptions are that we
> >> >> a. want to at least try to give the user what they asked for
> >> >> b. without making assumptions about the parent's behaviour.
> >> >>
> >> >> Those assumptions could of course be wrong, but, honestly, I would find
> >> >> that confusing.
> >> >
> >> > I guess my point leans more towards the "social" side than the
> >> > mathematical one. If I followed you so far, the precision you expect to
> >> > gain is in the <1Hz range (and I've been in sick leave for a while, so
> >> > sorry if I didn't before). The rate is in the 100MHz range.
> >> >
> >> > So the precision gain is pretty much nothing. Sure, it's closer from a
> >> > mathematical standpoint. But there's zero benefit from it.
> >> >
> >> > However, it comes at the cost of a code that is definitely more
> >> > complicated (or less naive, depending on how you look at it I guess :))
> >> > and will be harder to figure out for someone that jumps into the driver.
> >> >
> >> > So the trade-off doesn't really make fixing it worth it to me.
> >> >
> >> >> >> > you ask the parent to compute whatever is closest to that optimal parent
> >> >> >> > rate.
> >> >> >> >
> >> >> >> > It's the parent responsibility now. It's the parent decision to figure
> >> >> >> > out what "the closest" means, if it can change rate, if it has any range
> >> >> >> > limitation, etc. You can't work around that.
> >> >> >> >
> >> >> >> > What you actually want there is the parent to actually provide the
> >> >> >> > closest rate, even if it means overshooting.
> >> >> >> >
> >> >> >>
> >> >> >> I want to ask the parent for a rate, that would actually result in the
> >> >> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
> >> >> >> the parent for a rate that doesn't fit that criterion?
> >> >> >
> >> >> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
> >> >> > and use whatever value it returned.
> >> >> >
> >> >> > If it requires changing the parent clock to improve its round_rate
> >> >> > behaviour, then do that too.
> >> >> >
> >> >>
> >> >> Hmmm... Okay. So you *are* saying, that I should make changes to the
> >> >> parent so that we do not need to request the exact rate we want from the
> >> >> parent. But I really don't understand why.
> >> >
> >> > No, sorry. I initially thought that you were working around "divider"
> >> > rounding issue (as opposed to integer like you mentionned above) with
> >> > the parent not providing its optimal rate, and you adjusting based on
> >> > that offset.
> >> >
> >> >> As I wrote above, I'm not making any assumptions of how and if the
> >> >> parent rounds. My code is rounding *prior* to asking the parent. Your
> >> >> proposal on the other hand *requires* changing the parent to round
> >> >> closest where mine does not.
> >> >>
> >> >> My concern is, that we could then end up with the situation that someone
> >> >> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
> >> >> does not set the ROUND_CLOSEST flag on the parent, because it's not
> >> >> immediately apparent why they should do that.
> >> >
> >> > It's going to happen, and probably happens at the moment already,
> >> > because not only the NKM clocks are affected, but virtually all of them,
> >> > and most don't use ROUND_CLOSEST.
> >> >
> >> > And to some extent, it's fine. We would handle it like any other bug: if
> >> > we ever encounter one, we'll write a fix, backport it to stable and all
> >> > will be fine.
> >> >
> >> > You can't figure out all the use-cases we'll require in the future
> >> > anyway.
> >> >
> >> >> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
> >> >> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
> >> >> pll-video0:
> >> >>
> >> >> When pll-mipi nkm clock is asked via determine_rate() for a rate of
> >> >> 449064000 it would return 449035712 and a parent rate of 217714285
> >> >> (using n=11, k=3, m=16, but those details aren't returned by
> >> >> determine_rate()).
> >> >>
> >> >> Eventually, determine_rate() will be called again, but this time for a
> >> >> rate of 449035712. The user already knows that we can provide that,
> >> >> because we told them (see previous paragraph). But since we're
> >> >> truncating when calculating the rate that we'd like the parent to
> >> >> provide, we end up asking the parent for 217714284 when we actually need
> >> >> it to provide 217714285. So we now *require* the parent to find the
> >> >> closest and additionally we must *hope* that the parent is incapable of
> >> >> providing the rate that we asked for.
> >> >
> >> > I mean... yeah. It's what abstraction is all about. For all we know, the
> >> > parent to pll-mipi could be a crystal that can't change its frequency
> >> > and we should deal with that. Or it could be an ideal clock that always
> >> > returns the rate you ask for. Or a firmware clock that behaves like an
> >> > ideal clock but lies about it :)
> >> >
> >> > It's that clock responsibility to do its best to provide the rate we ask
> >> > for.
> >> >
> >> > And if we need to make it behave better, then it's fine too. So your
> >> > example is indeed true, but it's more of a case of "let's send another
> >> > patch" rather than trying to figure out all possible cases and try to
> >> > figure things out accordingly. Because you won't be able to figure out
> >> > all possible cases for the current SoCs and the next ones, and the
> >> > workloads that people are going to run on those SoCs anyway.
> >> >
> >> >> >> If you carefully look at ccu_mp, you will see that it would ignore
> >> >> >> cases when its parent had rounded up. ccu_nkm is no different.
> >> >> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
> >> >> >> totally different beast. For now, sunxi-ng always expects rounding
> >> >> >> down.
> >> >> >
> >> >> > Then change that?
> >> >>
> >> >> You told me that both over- and undershooting are fine when
> >> >> determining the rate, *but also* "it's a bit context specific which one
> >> >> we should favour. If we were to do anything, it would be to support both
> >> >> and let the clock driver select which behaviour it wants." (see
> >> >> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
> >> >>
> >> >> So, I can't just change NKM's parent's default behavior (which is an NM
> >> >> clock in my case), because, if I understand correctly, I would have to
> >> >> introduce a "ROUND_CLOSEST" flag for NM clocks.
> >> >
> >> > Sure
> >> >
> >> >> But then I feel like I would have to document somewhere that when
> >> >> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
> >> >> needs to ROUND_CLOSEST, in order to avoid drifting away from the
> >> >> requested rate in the successive calls that are made to
> >> >> ccu_nkm_determine_rate(), which I tried to explain above and in previous
> >> >> messages.
> >> >
> >> > That's kind of what I meant too. Whether "drifting away" is an issue is
> >> > context specific too. for some clocks it just doesn't matter. Nobody
> >> > ever complained that the register clock of the MMC controller was
> >> > drifting away, because it doesn't affect the system in the slightest.
> >> >
> >> > The video clock tree (and possibly others) will be affected though, and
> >> > we'll indeed need to add that flag. But we're doing it all the time (and
> >> > sometimes get it wrong) for things like which clocks should be left
> >> > enabled for example.
> >>
> >> I'm afraid we have to re-visit this decision. I found a case, where the
> >> drifting causes a problem.
> >
> > I'm sure it can cause a lot of issues everywhere. My point was that the
> > solution is to add the flag so the issue goes away, and not to try to
> > workaround a driver that might or not have the flag. We should assume
> > it's properly set, and properly set it.
>
> I want you to be aware that this might result in a situation that will
> waste many hours of development time for people trying to use the
> SET_PARENT_RATE flag on NKM clocks in their SoCs. Because it has
> surprising side effects that I lay out below.
>
> How do we tell developers that they *must* use CCU_ROUND closest on the
> whole clk branch starting at whe nkm clock's parent if they want to use
> SET_PARENT_RATE on a NKM clock.

That's exagerating the issue a bit. They will get from their parent the
lower, closest, rate than they ask for that it can meet. This is what
the framework has always guaranteed.

You want to change that, fine. But now of course the parent might have
to overshoot its target to meet that requirement. So the parent driver
needs to be modified as such. That's very much the expectation.

> They must do it from the beginning or they will need to start chasing
> errors. This could easily prevented by applying this patch [1].
>
> Because, if they don't the following will happen:
>
> >> Setting pll-mipi's SET_PARENT_RATE flag, but not setting the tree's
> >> CCU_FEATURE_CLOSEST_RATE flag results in the following tree:
> >>
> >> clock rate
> >> -----------------------------------
> >> pll-video0 201000000
> >> hdmi-phy-clk 50250000
> >> hdmi 201000000
> >> tcon1 201000000
> >> pll-mipi 414562500
> >> tcon0 414562500
> >> tcon-data-clock 138187500
> >>
> >> Note, that tcon-data-clock's rate is garbage. It should be tcon0/4, but
> >> it is tcon0/3.
>
> ^
> |
> Now, *that* is surprising, isn't it? Well at least for me it was. I set
> the SET_PARENT_RATE on pll-mipi and suddenly tcon-data-clock is garbage.
> How can that be?
>
> Ok, let's start debugging:
>
> >>
> >> I added some logging to ccu_find_best*() to understand, as to why that
> >> is:
> >>
> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> >> ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
> >> ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
> >> ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16
> >>
> >> We can see that the rate is drifting over the successive calls. We've
> >> seen it before and deemed it no big deal.
> >>
> >> To highlight the issue a bit more, I added some logging at the end of
> >> sun4i_dclk_round_rate() and sun4i_dclk_set_rate.
> >>
> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> >> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
> >> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
> >>
> >> Here we can see that sun4i_dclk now has determined that 103644444 is its
> >> best rate, based on the parent rate of 414577776.
> >>
> >> But now, the nkm clock pll-mipi changes its mind and thinks that it
> >> cannot provide 414577776 any more, instead it wants to provide
> >> 414562500.
> >
> > That's a bit surprising, but not entirely.
>
> The drifting itself is not suprising at all to me. It is the topic of
> this whole mail thread. :-)
>
> > For example, one of the
> > parent clock of our parent might have changed rate between our
> > round_rate and set_rate calls.
>
> That's not the reason here. The reason is that I added the
> SET_PARENT_RATE flag, but did not add CCU_FEATURE_ROUND_CLOSEST to
> pll-mipi's parent and all its descendants.
>
> *I* know, that I shouldn't do that. But will Jane and Joe SoC-Developer?

How many Jane and Joe SoC-developer are writing a clock driver without
any knowledge of how the clock framework operates?

> How do we tell them? Will the necessary documentation really be easier
> to maintain than the 24 line patch [1] I submitted?

We have different perspectives on the issue. Yours is that it should be
absolutely correct even if it isn't. I get where you're coming from.

The thing is, from mine, yes, it's going to be much easier to maintain.
Those 24 lines of code are fairly intricate, I had to exchange literally
dozen of mails to get what it was about. Do you really expect us + Jane
and Joe kernel developer to do as much when they read and try to
understand that code? Let alone review another patch that would affect
that code.

And for all I know, you might be gone as soon as that patch is merged so
we wouldn't get your help either.

FWIW, I'm open to adding that flag everywhere it's needed, not just on
the A64.

> Again, without the patch NKM clocks are making assumptions about the
> parent clock. They now depend on:
> a. The parent clock rounding to the closest requested rate.

It doesn't depend on anything, really. It depends on the parent to
provide a rate as close as possible to what was requested, but that was
always the case.

> b. The parent clock not supporting the rate the NKM requests.

I'm not sure what you mean by that.

> If either of the two is not true, it will break tcon-data-clock.
>
> > Why does it change its mind?
>
> Because the parent does not round to the closest rate.

And it doesn't round to the closest rate because it's not expected to,
or asked to.

> This is what happens, if the parent does not fulfill the two
> requirements:
>
> 1. tcon0 (or someone else, this is just an example) asks the NKM clock
> (pll-mipi) for rate A (e.g. 414600000).
> 2. The NKM clock tries out all combinations of N, K, and M and each
> time asks the parent for a rate, that is close to the "optimal"
> parent rate.
> 3. At some point the NKM clock asks its parent for rate B (282681818).
> 4. The parent will respond that it doesn't support rate B, but rate B'
> (282666666) could be used.
> 5. Using that information, NKM tells the clk framework that it can't
> provide rate A (414600000), but it could provide A' (414577776).
> 6. The clk framework tells tcon0 that. tcon0 agrees and says: "Fine,
> clk framework, let's go with rate A'."
> 7. clk framework asks the NKM clock, for rate A' (414577776).
> 8. Like in step 2 the NKM goes through all combinations and asks the
> parent for a rate that is optimal to the "optimal" parent rate.
> 9. Without the patch [1] at some point the NKM clock will ask the
> parent for B'' (282666665) (!) due to integer rounding.
> 11. The parent will respond that it doesn't support rate B'', but rate
> B''' could be used. (I don't know what rate that is, but it's now
> so bad that the previous combination of N, K, and M is no longer
> the best combination.)
> 12. Using that information, NKM tells the clk framework that it can't
> provide rate A' (414577776), but it could provide A'' (414562500).
> That's what I meant, that the nkm clock "changed its mind".
>
> Actually, steps 7-12 are performed a few times and each time the rate
> could get a little bit worse. Not by much, so you said it's not worth
> adding patch [1], because "there's zero benefit from it". [2]
>
> The new information I'm trying to convey here, is that tcon0 has based
> its round_rate result on A' because the clk framework told it so.
> Apparently, the clk framework expects that when a clock claims it can
> provide rate A' when asked for rate A, that it will also respond with A'
> when asked for A'.

Yeah, that sounds reasonable. I'm afraid tackling this for all our
clocks would be a major overhaul so I would consider this to be out of
scope if we ever want to get this series merged.

> I don't think that's an unreasonable requirement for clocks [footnote
> A]. But the NKM clock does not always fulfill it, if it can set its
> parent rate but doesn't have patch [1].
>
> This is not some kind of race condition as you described above. It
> happens 100% of the time when a clock (tcon0) asks an nkm clock
> (pll-mipi) for an "unfortunate" rate and the nkm's parent (pll-video0)
> does not support rounding up.

Which, again, has a super simple fix now.

> "Unfortunate" is a rate where the best parent rate can not be calculated
> by the simple formula:
> rate * m / (n * k)
>
> You wrote in [2]:
> > We're back to the trade-off I was mentioning earlier. I'm not against
> > it on principle. However, if it's not absolutely required, then I
> > don't think it's a good idea to merge it.
> >
> > Especially if it's to workaround a parent flag missing. A clock flag
> > patch is easy to read, write, understand, review, merge and maintain.
> > It's basically a nop to merge provided the commit log is decent
> > enough.
>
> I'm at a point where I have to disagree that a clock flag patch is easy
> to write and understand. You asked me, why the clock is changing its
> mind and we had already discussed it a lot. I don't see why this
> discussion would be different in 6 months or 2 years. IMO the discussion
> we're having is not a "nop". And for a developer to find out that they
> need to set the parent's ROUND_CLOSEST flag is not a "nop" either.

You're talking about identifying a given problem. I was talking about
finding, writing, merging and maintaining a solution for it. I know all
too well how difficult identifying clock issues can be.

> The patch [1] makes the life of SoC developers easier

Unless they have to debug or modify it.

> while it makes the nkm clock maintainer's life harder.

Let's not create any undue opposition, there's no need to. Developers
end up doing some part of the maintenance when they identify and fix
bugs and maintainers are usually the most active developers. It's not
like either are in some kind of ivory tower.

> So the question is, how much easier is the developer's life (a lot,
> IMO) and how much harder is the maintainer's life (a little, IMO).

We strongly disagree on that.

> In conclusion, for me applying patch [1] is the best option we have. It
> not only prevents but also documents the issue in a clean way.

And I disagree with that too. It might be for you, but all that is subjective.

Maxime


Attachments:
(No filename) (24.84 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-30 17:09:03

by Frank Oltmanns

[permalink] [raw]
Subject: Re: [PATCH 1/2] clk: sunxi-ng: nkm: consider alternative parent rates when finding rate

Hi Maxime,

First of all, I appreciate that you take time out of your schedule to
get to the bottom of this.

I understand that you want to go without the patch and that's okay. I
just don't want to trick you into accepting the patchset (minus the
patch under discussion [1]) without fully grasping the consequences.

On 2023-07-25 at 15:55:39 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Sun, Jul 23, 2023 at 10:59:03AM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> On 2023-07-17 at 16:06:02 +0200, Maxime Ripard <[email protected]> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > On Wed, Jul 12, 2023 at 06:39:56AM +0200, Frank Oltmanns wrote:
>> >> Hi Maxime,
>> >>
>> >> On 2023-06-19 at 20:05:44 +0200, Maxime Ripard <[email protected]> wrote:
>> >> > [[PGP Signed Part:Undecided]]
>> >> > On Mon, Jun 19, 2023 at 10:16:26AM +0200, Frank Oltmanns wrote:
>> >> >> Hi Maxime,
>> >> >>
>> >> >> the essence of my following ramblings:
>> >> >> - I do think it is reasonable that nkm is asking its parent for the
>> >> >> rate that nkm actually needs from said parent to fulfill the request.
>> >> >> - I don't think nkm should make assumptions about the rounding
>> >> >> behaviour of the parent.
>> >> >
>> >> > I guess we agree :)
>> >> >
>> >> > And I would go even further and say that we shouldn't make *any*
>> >> > assumption about the behaviour of the parent.

Just to make sure that we are on the same page: This is no longer a
requirement, correct?

My understanding is that you accept that pll-mipi (an nkm clock) relies
on the fact that the parent clock rounds up if pll-mipi requests a rate
that is 1 Hz lower than what pll-mipi actually needs. The patch we're
discussing [1] makes sure, that pll-mipi (or any other nkm clock) does
not have to "make any assumption about the behaviour of the parent."

As this apparently is no longer a requirement, I can (having protested
enough) remove the patch [1] from my series. I'd prefer not to, but I
will do it, if you ask me to.

>> >> >> The reason is, that I want to prevent users of ccu_nkm from making
>> >> >> mistakes when defining their clocks (which includes the parent of their
>> >> >> nkm clock).
>> >> >>
>> >> >> Please read below the details on why I think that.
>> >> >>
>> >> >> [...]
>> >> >>
>> >> >> >> No. I didn't. My assumption is: If ccu_nkm_find_best is asked for the
>> >> >> >> best rate for rate = 449035712, it should try to include 449035712 in
>> >> >> >> its candidates, agreed?
>> >> >> >>
>> >> >> >> Example 1:
>> >> >> >> ==========
>> >> >> >> rate=449035712, n=11, k=3, m=16:
>> >> >> >> We should as for a parent rate of 217714285, because:
>> >> >> >> 217714285 * 11 * 3 / 16 = 449035712
>> >> >> >>
>> >> >> >> How do we get from 449035712 to 217714285, you ask?
>> >> >> >>
>> >> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >> >
>> >> >> > Why are we rounding up? I don't think the hardware will round up there.
>> >> >>
>> >> >> Being a "software guy" it is also my understanding that the hardware
>> >> >> does not round up here (or round down for that matter).
>> >> >
>> >> > That's my understanding as well.
>> >> >
>> >> >> But anyway, my concern is the rate's representation *in software*. The
>> >> >> clk drivers are using unsigned long to represent the actual rate. This
>> >> >> is not a lossless representation. In other words, the value (i.e. the
>> >> >> software representation) of that rate is, of course, a "lie". The
>> >> >> hardware clock is running at some rate that is hopefully close to what
>> >> >> we represent in software, but still it's an abstraction.
>> >> >>
>> >> >> For example, the user (e.g. in my example a panel) asks us for a rate
>> >> >> that is represented in softwares as 449035712. Given the values n=11,
>> >> >> k=3, m=16, we can *only* represent this value in software if the parent
>> >> >> gives us a rate that is represented in software as 217714285. Therefore,
>> >> >> I think it is reasonable to ask the parent for that rate (217714285).
>> >> >
>> >> > I somewhat agree, but I still don't think it's worth rounding up.
>> >> >
>> >> > If we don't round up (and assuming the parent itself won't round the
>> >> > clock), we end up with a rate of 449035710 using the dividers you
>> >> > mentioned. It's a .0000005% deviation (I hope I didn't screw up the
>> >> > number of 0s). It's negligible for all practical purposes, and it's not
>> >> > worth making the code inconsistent and eyebrow raising.
>> >> >
>> >> >> >> Do you agree that we should ask the parent for 217714285 in case we want
>> >> >> >> a rate of 449035712 and we're currently evaluating the case n=11, k=3,
>> >> >> >> m=16?
>> >> >> >>
>> >> >> >> We should not ask for a parent rate of 217714284, because:
>> >> >> >> 217714284 * 11 * 3 / 16 = 449035710
>> >> >> >>
>> >> >> >> Example 2:
>> >> >> >> ==========
>> >> >> >> rate=500000000, n=11, k=3, m=16:
>> >> >> >> Here we should not ask the parent for
>> >> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >> >> because that would be 242424243.
>> >> >> >>
>> >> >> >> 242424243 * 11 * 3 / 16 = 500000001
>> >> >> >>
>> >> >> >> We (the NKM clock, not the parent!) would overshoot (please see at the
>> >> >> >> end of this mail, why (for now) I don't want to support overshooting in
>> >> >> >> the NKM clock).
>> >> >> >>
>> >> >> >> Instead we should as for a parent rate of 242424242, because:
>> >> >> >> 242424242 * 11 * 3 / 16 = 499999999
>> >> >> >>
>> >> >> >> In conclusion, there are cases, where we (the NKM clock) have to ask the
>> >> >> >> parent for
>> >> >> >> DIV_ROUND_UP(rate * m, n * k)
>> >> >> >> And there are also cases, where we have to ask the parent for
>> >> >> >> rate * m / (n * k)
>> >> >> >
>> >> >> > I mean, I think you're overthinking this.
>> >> >> >
>> >> >> > If you never round up and mimic how the hardware behaves, and test all
>> >> >> > combination, then eventually you'll find the closest rate.
>> >> >> >
>> >> >> > If you don't because the parent doesn't look for the closest rate, then
>> >> >> > the parent should be changed too.
>> >> >> >
>> >> >> > It really is that simple.
>> >> >> >
>> >> >> >> This is what the code is trying to do. Maybe it's easier to look at V2
>> >> >> >> because I extracted the calcultion of the optimal parent rate into a
>> >> >> >> separate function hoping that this makes things clearer.
>> >> >> >>
>> >> >> >> Let me stress this: When calculating the optimal rate for the parent,
>> >> >> >> I'm not making any assumptions here about how the PARENT clock rounds.
>> >> >> >> In fact, I assume that the parent could be perfect and always provides
>> >> >> >> the rate it is asked for. I only take into account how the nkm clock
>> >> >> >> rounds.
>> >> >> >
>> >> >> > At the very least, you assume that the parent rounding can be "wrong"
>> >> >> > and try to work around that.
>> >> >>
>> >> >> No. I'm not assuming anything about the parent. But I *know* that if we
>> >> >> (nkm) want to get a rate that is represented in softwares as 449035712
>> >> >> and given the values n=11, k=3, m=16, we (nkm) must get the rate from
>> >> >> the parent that the parent represents in software as 217714285, because
>> >> >> I know that we (nkm) calculate *our* (nkm) rate using
>> >> >> parent * n * k / m
>> >> >>
>> >> >> So if (!) we want to give the user the rate that they ask for, why not
>> >> >> ask the parent for the rate that we need (217714285)?
>> >> >>
>> >> >> I admit that I'm making assumptions here. My assumptions are that we
>> >> >> a. want to at least try to give the user what they asked for
>> >> >> b. without making assumptions about the parent's behaviour.
>> >> >>
>> >> >> Those assumptions could of course be wrong, but, honestly, I would find
>> >> >> that confusing.
>> >> >
>> >> > I guess my point leans more towards the "social" side than the
>> >> > mathematical one. If I followed you so far, the precision you expect to
>> >> > gain is in the <1Hz range (and I've been in sick leave for a while, so
>> >> > sorry if I didn't before). The rate is in the 100MHz range.
>> >> >
>> >> > So the precision gain is pretty much nothing. Sure, it's closer from a
>> >> > mathematical standpoint. But there's zero benefit from it.
>> >> >
>> >> > However, it comes at the cost of a code that is definitely more
>> >> > complicated (or less naive, depending on how you look at it I guess :))
>> >> > and will be harder to figure out for someone that jumps into the driver.
>> >> >
>> >> > So the trade-off doesn't really make fixing it worth it to me.
>> >> >
>> >> >> >> > you ask the parent to compute whatever is closest to that optimal parent
>> >> >> >> > rate.
>> >> >> >> >
>> >> >> >> > It's the parent responsibility now. It's the parent decision to figure
>> >> >> >> > out what "the closest" means, if it can change rate, if it has any range
>> >> >> >> > limitation, etc. You can't work around that.
>> >> >> >> >
>> >> >> >> > What you actually want there is the parent to actually provide the
>> >> >> >> > closest rate, even if it means overshooting.
>> >> >> >> >
>> >> >> >>
>> >> >> >> I want to ask the parent for a rate, that would actually result in the
>> >> >> >> rate that nkm_find_best was asked for. Are you asking me to instead ask
>> >> >> >> the parent for a rate that doesn't fit that criterion?
>> >> >> >
>> >> >> > No. I'm asking to call clk_hw_round_rate(parent_hw, rate * m / (n * k))
>> >> >> > and use whatever value it returned.
>> >> >> >
>> >> >> > If it requires changing the parent clock to improve its round_rate
>> >> >> > behaviour, then do that too.
>> >> >> >
>> >> >>
>> >> >> Hmmm... Okay. So you *are* saying, that I should make changes to the
>> >> >> parent so that we do not need to request the exact rate we want from the
>> >> >> parent. But I really don't understand why.
>> >> >
>> >> > No, sorry. I initially thought that you were working around "divider"
>> >> > rounding issue (as opposed to integer like you mentionned above) with
>> >> > the parent not providing its optimal rate, and you adjusting based on
>> >> > that offset.
>> >> >
>> >> >> As I wrote above, I'm not making any assumptions of how and if the
>> >> >> parent rounds. My code is rounding *prior* to asking the parent. Your
>> >> >> proposal on the other hand *requires* changing the parent to round
>> >> >> closest where mine does not.
>> >> >>
>> >> >> My concern is, that we could then end up with the situation that someone
>> >> >> defines an nkm clock in their SoC which has CLK_SET_RATE_PARENT set, but
>> >> >> does not set the ROUND_CLOSEST flag on the parent, because it's not
>> >> >> immediately apparent why they should do that.
>> >> >
>> >> > It's going to happen, and probably happens at the moment already,
>> >> > because not only the NKM clocks are affected, but virtually all of them,
>> >> > and most don't use ROUND_CLOSEST.
>> >> >
>> >> > And to some extent, it's fine. We would handle it like any other bug: if
>> >> > we ever encounter one, we'll write a fix, backport it to stable and all
>> >> > will be fine.
>> >> >
>> >> > You can't figure out all the use-cases we'll require in the future
>> >> > anyway.
>> >> >
>> >> >> Let's assume that hypothetical board were the A64, the nkm clock were pll-mipi,
>> >> >> and the parent were pll-video0 and we "forget" to set ROUND_CLOSEST on
>> >> >> pll-video0:
>> >> >>
>> >> >> When pll-mipi nkm clock is asked via determine_rate() for a rate of
>> >> >> 449064000 it would return 449035712 and a parent rate of 217714285
>> >> >> (using n=11, k=3, m=16, but those details aren't returned by
>> >> >> determine_rate()).
>> >> >>
>> >> >> Eventually, determine_rate() will be called again, but this time for a
>> >> >> rate of 449035712. The user already knows that we can provide that,
>> >> >> because we told them (see previous paragraph). But since we're
>> >> >> truncating when calculating the rate that we'd like the parent to
>> >> >> provide, we end up asking the parent for 217714284 when we actually need
>> >> >> it to provide 217714285. So we now *require* the parent to find the
>> >> >> closest and additionally we must *hope* that the parent is incapable of
>> >> >> providing the rate that we asked for.
>> >> >
>> >> > I mean... yeah. It's what abstraction is all about. For all we know, the
>> >> > parent to pll-mipi could be a crystal that can't change its frequency
>> >> > and we should deal with that. Or it could be an ideal clock that always
>> >> > returns the rate you ask for. Or a firmware clock that behaves like an
>> >> > ideal clock but lies about it :)
>> >> >
>> >> > It's that clock responsibility to do its best to provide the rate we ask
>> >> > for.
>> >> >
>> >> > And if we need to make it behave better, then it's fine too. So your
>> >> > example is indeed true, but it's more of a case of "let's send another
>> >> > patch" rather than trying to figure out all possible cases and try to
>> >> > figure things out accordingly. Because you won't be able to figure out
>> >> > all possible cases for the current SoCs and the next ones, and the
>> >> > workloads that people are going to run on those SoCs anyway.
>> >> >
>> >> >> >> If you carefully look at ccu_mp, you will see that it would ignore
>> >> >> >> cases when its parent had rounded up. ccu_nkm is no different.
>> >> >> >> Teaching all of sunxi-ng's clocks to respect ROUND_CLOSEST is a
>> >> >> >> totally different beast. For now, sunxi-ng always expects rounding
>> >> >> >> down.
>> >> >> >
>> >> >> > Then change that?
>> >> >>
>> >> >> You told me that both over- and undershooting are fine when
>> >> >> determining the rate, *but also* "it's a bit context specific which one
>> >> >> we should favour. If we were to do anything, it would be to support both
>> >> >> and let the clock driver select which behaviour it wants." (see
>> >> >> https://lore.kernel.org/all/flngzi4henkzcpzwdexencdkw77h52g3nduup7pwctpwfiuznk@eewnnut5mvsq/)
>> >> >>
>> >> >> So, I can't just change NKM's parent's default behavior (which is an NM
>> >> >> clock in my case), because, if I understand correctly, I would have to
>> >> >> introduce a "ROUND_CLOSEST" flag for NM clocks.
>> >> >
>> >> > Sure
>> >> >
>> >> >> But then I feel like I would have to document somewhere that when
>> >> >> setting CLK_SET_RATE_PARENT for an NKM clock, that the parent clock
>> >> >> needs to ROUND_CLOSEST, in order to avoid drifting away from the
>> >> >> requested rate in the successive calls that are made to
>> >> >> ccu_nkm_determine_rate(), which I tried to explain above and in previous
>> >> >> messages.
>> >> >
>> >> > That's kind of what I meant too. Whether "drifting away" is an issue is
>> >> > context specific too. for some clocks it just doesn't matter. Nobody
>> >> > ever complained that the register clock of the MMC controller was
>> >> > drifting away, because it doesn't affect the system in the slightest.
>> >> >
>> >> > The video clock tree (and possibly others) will be affected though, and
>> >> > we'll indeed need to add that flag. But we're doing it all the time (and
>> >> > sometimes get it wrong) for things like which clocks should be left
>> >> > enabled for example.
>> >>
>> >> I'm afraid we have to re-visit this decision. I found a case, where the
>> >> drifting causes a problem.
>> >
>> > I'm sure it can cause a lot of issues everywhere. My point was that the
>> > solution is to add the flag so the issue goes away, and not to try to
>> > workaround a driver that might or not have the flag. We should assume
>> > it's properly set, and properly set it.
>>
>> I want you to be aware that this might result in a situation that will
>> waste many hours of development time for people trying to use the
>> SET_PARENT_RATE flag on NKM clocks in their SoCs. Because it has
>> surprising side effects that I lay out below.
>>
>> How do we tell developers that they *must* use CCU_ROUND closest on the
>> whole clk branch starting at whe nkm clock's parent if they want to use
>> SET_PARENT_RATE on a NKM clock.
>
> That's exagerating the issue a bit.

Where exactly do I exagerate? I don't see it. Don't you think it could
take hours?

> They will get from their parent the
> lower, closest, rate than they ask for that it can meet. This is what
> the framework has always guaranteed.

Yes, of course. But the consequences that come from the lower rate are
new. And really, *I* wouldn't expect a clock to ask its parent for a
rate that in reality doesn't want. That's what I find surprising. But
you already said you are fine with that. So I let it go, because it
meant that the rate of the nkm clock (and in consequence its children)
is just a little bit lower than expected. It's likely that no developer
(and certainly no end-user) would notice, so they might not chase a bug,
because they didn't get the exact rate they were expecting.

But then I discovered that the rate of tcon-data-clock is off by 33%. I
just wanted to let you know that it's no longer the tiny percentage that
we discussed previously. So people will notice and start (IMO
unnecessarily) chasing bugs. I'm still convinced that the 24 lines will
make those people's lifes easier.

But I accept that you don't see it that way.

> You want to change that, fine.

I want a good refresh rate for my pinephone's panel without relying on
third party patches. :) Therefore, I asked if it was okay to set the
rate of pll-video0 to 297MHz.

This was refused by Jernej who said it would be better if the rate
selection was immune to initial values [5]. My conclusion was that the
nkm clock should set its parent's rate. So, here we are.

> But now of course the parent might have
> to overshoot its target to meet that requirement. So the parent driver
> needs to be modified as such. That's very much the expectation.

Ok, that is of course your decision, not mine. I may not like it, but I
can certainly accept it.

>> They must do it from the beginning or they will need to start chasing
>> errors. This could easily prevented by applying this patch [1].
>>
>> Because, if they don't the following will happen:
>>
>> >> Setting pll-mipi's SET_PARENT_RATE flag, but not setting the tree's
>> >> CCU_FEATURE_CLOSEST_RATE flag results in the following tree:
>> >>
>> >> clock rate
>> >> -----------------------------------
>> >> pll-video0 201000000
>> >> hdmi-phy-clk 50250000
>> >> hdmi 201000000
>> >> tcon1 201000000
>> >> pll-mipi 414562500
>> >> tcon0 414562500
>> >> tcon-data-clock 138187500
>> >>
>> >> Note, that tcon-data-clock's rate is garbage. It should be tcon0/4, but
>> >> it is tcon0/3.
>>
>> ^
>> |
>> Now, *that* is surprising, isn't it? Well at least for me it was. I set
>> the SET_PARENT_RATE on pll-mipi and suddenly tcon-data-clock is garbage.
>> How can that be?
>>
>> Ok, let's start debugging:
>>
>> >>
>> >> I added some logging to ccu_find_best*() to understand, as to why that
>> >> is:
>> >>
>> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> >> ccu_nkm_find_best_with_parent_adj: rate=414577776, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
>> >> ccu_nkm_find_best_with_parent_adj: rate=414562500, best_rate=414562500, best_parent_rate=201000000, n=11, k=3, m=16
>> >> ccu_nkm_find_best: rate=414562500, best_rate=414562500, parent_rate=201000000, n=11, k=3, m=16
>> >>
>> >> We can see that the rate is drifting over the successive calls. We've
>> >> seen it before and deemed it no big deal.
>> >>
>> >> To highlight the issue a bit more, I added some logging at the end of
>> >> sun4i_dclk_round_rate() and sun4i_dclk_set_rate.
>> >>
>> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> >> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
>> >> ccu_nkm_find_best_with_parent_adj: rate=414600000, best_rate=414577776, best_parent_rate=282666666, n=11, k=2, m=15
>> >> sun4i_dclk_round_rate: rate=103650000, best_rate=103644444, best_parent=414577776, best_div=4
>> >>
>> >> Here we can see that sun4i_dclk now has determined that 103644444 is its
>> >> best rate, based on the parent rate of 414577776.
>> >>
>> >> But now, the nkm clock pll-mipi changes its mind and thinks that it
>> >> cannot provide 414577776 any more, instead it wants to provide
>> >> 414562500.
>> >
>> > That's a bit surprising, but not entirely.
>>
>> The drifting itself is not suprising at all to me. It is the topic of
>> this whole mail thread. :-)
>>
>> > For example, one of the
>> > parent clock of our parent might have changed rate between our
>> > round_rate and set_rate calls.
>>
>> That's not the reason here. The reason is that I added the
>> SET_PARENT_RATE flag, but did not add CCU_FEATURE_ROUND_CLOSEST to
>> pll-mipi's parent and all its descendants.
>>
>> *I* know, that I shouldn't do that. But will Jane and Joe SoC-Developer?
>
> How many Jane and Joe SoC-developer are writing a clock driver without
> any knowledge of how the clock framework operates?
>
>> How do we tell them? Will the necessary documentation really be easier
>> to maintain than the 24 line patch [1] I submitted?
>
> We have different perspectives on the issue. Yours is that it should be
> absolutely correct even if it isn't. I get where you're coming from.
>
> The thing is, from mine, yes, it's going to be much easier to maintain.
> Those 24 lines of code are fairly intricate, I had to exchange literally
> dozen of mails to get what it was about.

Yes, that's on me and I'm really sorry. But I do think, it's properly
documented now and has a good commit message as well. But ok, let's drop
it and instead document the fact that an nkm clock needs a parent that
is able to round up.

> Do you really expect us + Jane
> and Joe kernel developer to do as much when they read and try to
> understand that code? Let alone review another patch that would affect
> that code.
>
> And for all I know, you might be gone as soon as that patch is merged so
> we wouldn't get your help either.

Fair point. I hereby offer that you may include me in discussions about
all code in my patch series.

> FWIW, I'm open to adding that flag everywhere it's needed, not just on
> the A64.

Cool! I don't know where it's needed, as I don't have any other sunxi
boards, but still cool. :)

>> Again, without the patch NKM clocks are making assumptions about the
>> parent clock. They now depend on:
>> a. The parent clock rounding to the closest requested rate.
>
> It doesn't depend on anything, really. It depends on the parent to
> provide a rate as close as possible to what was requested, but that was
> always the case.

No. As described below: If the panel requests 103650000 from tcon0, it
will get 138187500 if pll-video0 does not have the
CCU_FEATURE_CLOSEST_RATE flag set. Before my patch series that was not
the case. Therefore it now depends on the parent rounding to the closest
rate.

>> b. The parent clock not supporting the rate the NKM requests.
>
> I'm not sure what you mean by that.

In the example below in step 9 the nkm will ask the parent for 282666665
Hz even though the nkm clock needs 282666666 Hz from its parent. So,
without the patch the nkm clock depends on the fact the parent will
round to 282666666 and not provide 282666665 Hz. That's what I mean when
I wrote that the nkm clock depends on "the parent clock not supporting
the rate the NKM requests". If the parent were an ideal clock that
always provides the rate that is requested, it breaks the nkm clock's
children (tcon-data-clock in case of the A64).

>> If either of the two is not true, it will break tcon-data-clock.
>>
>> > Why does it change its mind?
>>
>> Because the parent does not round to the closest rate.
>
> And it doesn't round to the closest rate because it's not expected to,
> or asked to.

No, the nkm clock does in fact *expect* the parent to round to the
closest rate. But if the parent is not *asked* to, it will break
tcon-data-clock.

>> This is what happens, if the parent does not fulfill the two
>> requirements:
>>
>> 1. tcon0 (or someone else, this is just an example) asks the NKM clock
>> (pll-mipi) for rate A (e.g. 414600000).
>> 2. The NKM clock tries out all combinations of N, K, and M and each
>> time asks the parent for a rate, that is close to the "optimal"
>> parent rate.
>> 3. At some point the NKM clock asks its parent for rate B (282681818).
>> 4. The parent will respond that it doesn't support rate B, but rate B'
>> (282666666) could be used.
>> 5. Using that information, NKM tells the clk framework that it can't
>> provide rate A (414600000), but it could provide A' (414577776).
>> 6. The clk framework tells tcon0 that. tcon0 agrees and says: "Fine,
>> clk framework, let's go with rate A'."
>> 7. clk framework asks the NKM clock, for rate A' (414577776).
>> 8. Like in step 2 the NKM goes through all combinations and asks the
>> parent for a rate that is optimal to the "optimal" parent rate.
>> 9. Without the patch [1] at some point the NKM clock will ask the
>> parent for B'' (282666665) (!) due to integer rounding.
>> 11. The parent will respond that it doesn't support rate B'', but rate
>> B''' could be used. (I don't know what rate that is, but it's now
>> so bad that the previous combination of N, K, and M is no longer
>> the best combination.)
>> 12. Using that information, NKM tells the clk framework that it can't
>> provide rate A' (414577776), but it could provide A'' (414562500).
>> That's what I meant, that the nkm clock "changed its mind".
>>
>> Actually, steps 7-12 are performed a few times and each time the rate
>> could get a little bit worse. Not by much, so you said it's not worth
>> adding patch [1], because "there's zero benefit from it". [2]
>>
>> The new information I'm trying to convey here, is that tcon0 has based
>> its round_rate result on A' because the clk framework told it so.
>> Apparently, the clk framework expects that when a clock claims it can
>> provide rate A' when asked for rate A, that it will also respond with A'
>> when asked for A'.
>
> Yeah, that sounds reasonable. I'm afraid tackling this for all our
> clocks would be a major overhaul so I would consider this to be out of
> scope if we ever want to get this series merged.

Okay, I wasn't aware of that. I'm fairly certain that at least the nkm
clock provides rate A' when asked for A' when the SET_PARENT_RATE flag
is not set. So, this behavior is something that is introduced by my
patch series (if you don't take PATCH 4 [1]).

I've submitted a patch [4] with integration tests that tests all
possible exact rates of an nkm clock with a parent that runs at 192MHz
(just as an example). This is a patch on top of sunxi/for-next, i.e. it
does not contain the patch series we're currently discussing [3]. The
nkm clock consistently provides the requested rate if the requested rate
is something that the parent can provide. That means, up until today's
sunxi/for-next the nkm clock fulfills the reasonable requirenment.

In conclusion, the nkm clock becomes a bad player within the clk
framework if the series [3] is applied and [4] is dropped in cases where
the nkm clock's parent does not have the CCU_FEATURE_CLOSEST_RATE set.
This should be documented somewhere.

>> I don't think that's an unreasonable requirement for clocks [footnote
>> A]. But the NKM clock does not always fulfill it, if it can set its
>> parent rate but doesn't have patch [1].
>>
>> This is not some kind of race condition as you described above. It
>> happens 100% of the time when a clock (tcon0) asks an nkm clock
>> (pll-mipi) for an "unfortunate" rate and the nkm's parent (pll-video0)
>> does not support rounding up.
>
> Which, again, has a super simple fix now.

Sure! The fix is to set CCU_FEATURE_CLOSEST_RATE on the parent, right?
But we can't say that the SET_PARENT_FLAG for nkm clocks works without
the parent having CCU_FEATURE_CLOSEST_RATE.

>> "Unfortunate" is a rate where the best parent rate can not be calculated
>> by the simple formula:
>> rate * m / (n * k)
>>
>> You wrote in [2]:
>> > We're back to the trade-off I was mentioning earlier. I'm not against
>> > it on principle. However, if it's not absolutely required, then I
>> > don't think it's a good idea to merge it.
>> >
>> > Especially if it's to workaround a parent flag missing. A clock flag
>> > patch is easy to read, write, understand, review, merge and maintain.
>> > It's basically a nop to merge provided the commit log is decent
>> > enough.
>>
>> I'm at a point where I have to disagree that a clock flag patch is easy
>> to write and understand. You asked me, why the clock is changing its
>> mind and we had already discussed it a lot. I don't see why this
>> discussion would be different in 6 months or 2 years. IMO the discussion
>> we're having is not a "nop". And for a developer to find out that they
>> need to set the parent's ROUND_CLOSEST flag is not a "nop" either.
>
> You're talking about identifying a given problem. I was talking about
> finding, writing, merging and maintaining a solution for it. I know all
> too well how difficult identifying clock issues can be.
>
>> The patch [1] makes the life of SoC developers easier
>
> Unless they have to debug or modify it.
>
>> while it makes the nkm clock maintainer's life harder.
>
> Let's not create any undue opposition, there's no need to. Developers
> end up doing some part of the maintenance when they identify and fix
> bugs and maintainers are usually the most active developers. It's not
> like either are in some kind of ivory tower.
>
>> So the question is, how much easier is the developer's life (a lot,
>> IMO) and how much harder is the maintainer's life (a little, IMO).
>
> We strongly disagree on that.

Yes, we do.

>> In conclusion, for me applying patch [1] is the best option we have. It
>> not only prevents but also documents the issue in a clean way.
>
> And I disagree with that too. It might be for you, but all that is subjective.

Allright. I understand that you are fine that the nkm's SET_PARENT_RATE
depends on the parent having CCU_FEATURE_CLOSEST_RATE.

Do you have a suggestion, where I could document that
expectation? I'm thinking about adding a few sentences to the
existing description of struct ccu_nkm in ccu_nkm.h.

Thanks,
Frank

[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/yj6ss64s7p2uaslodj5zklrwhegz54bgh4l4wmldv6cccggepz@yombds4hij3c/
[3]: https://lore.kernel.org/all/[email protected]/
[4]: https://lore.kernel.org/all/[email protected]/
[5]: https://lore.kernel.org/all/4831731.31r3eYUQgx@jernej-laptop/

> Maxime
>
> [[End of PGP Signed Part]]