I would like to share a patchset that enables the NKM clock in pll-video0 to
consider alternative parent rates. I have found this feature particularly useful
to adjust the pll-video0's clock on Allwinner A64, as it allows me to achieve an
optimal rate for driving the board's panel (in my case, the Pinephone).
To provide some context, the clock structure involved in this process is as follows:
clock clock type
--------------------------------------
pll-video0 ccu_nm
pll-mipi ccu_nkm
tcon0 ccu_mux
tcon-data-clock sun4i_dclk
The divider between tcon0 and tcon-data-clock is fixed at 4. Therefore, in order
to achieve a rate that closely matches the desired rate of the panel, I need
pll-mipi to operate at a specific rate.
However, I must emphasize that setting the parent's rate for NKM clocks results
in a significant increase in the time required to find the optimal rate. For
instance, setting DCLK on the pinephone has seen a 60-fold increase in the time
taken, from approximately 0.5 ms to around 30 ms. These figures were obtained
through informal measurements on my pinephone, involving kernel logging and a
few reboots. The worst-case scenario observed was approximately 37 ms, while the
majority of cases were just under 30 ms.
The reason for this considerable increase in time is that the code now iterates
over all combinations of NKM for pll-mipi. For each combination, it subsequently
iterates over all combinations of NM for pll-video0.
I greatly appreciate your feedback and suggestions for further improving this
patchset.
Thanks,
Frank
Frank Oltmanns (2):
clk: sunxi-ng: nkm: consider alternative parent rates when finding
rate
clk: sunxi-ng: a64: allow pll-mipi to set parent's rate
drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 3 +-
drivers/clk/sunxi-ng/ccu_nkm.c | 40 +++++++++++++++++++++------
2 files changed, 33 insertions(+), 10 deletions(-)
--
2.40.1
The nkm clock now supports setting the parent's rate. Utilize this
option to find the optimal rate for pll-mipi.
Signed-off-by: Frank Oltmanns <[email protected]>
---
drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-a64.c b/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
index eb36f8f77d55..125ae097d96c 100644
--- a/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
@@ -179,7 +179,8 @@ static struct ccu_nkm pll_mipi_clk = {
.common = {
.reg = 0x040,
.hw.init = CLK_HW_INIT("pll-mipi", "pll-video0",
- &ccu_nkm_ops, CLK_SET_RATE_UNGATE),
+ &ccu_nkm_ops,
+ CLK_SET_RATE_UNGATE | CLK_SET_RATE_PARENT),
},
};
--
2.40.1
On 2023-06-05 at 21:07:43 +0200, Frank Oltmanns <[email protected]> wrote:
> I would like to share a patchset that enables the NKM clock in pll-video0 to
> consider alternative parent rates. I have found this feature particularly useful
> to adjust the pll-video0's clock on Allwinner A64, as it allows me to achieve an
> optimal rate for driving the board's panel (in my case, the Pinephone).
>
> To provide some context, the clock structure involved in this process is as follows:
> clock clock type
> --------------------------------------
> pll-video0 ccu_nm
> pll-mipi ccu_nkm
> tcon0 ccu_mux
> tcon-data-clock sun4i_dclk
>
> The divider between tcon0 and tcon-data-clock is fixed at 4. Therefore, in order
> to achieve a rate that closely matches the desired rate of the panel, I need
> pll-mipi to operate at a specific rate.
>
> However, I must emphasize that setting the parent's rate for NKM clocks results
> in a significant increase in the time required to find the optimal rate. For
> instance, setting DCLK on the pinephone has seen a 60-fold increase in the time
> taken, from approximately 0.5 ms to around 30 ms. These figures were obtained
> through informal measurements on my pinephone, involving kernel logging and a
> few reboots. The worst-case scenario observed was approximately 37 ms, while the
> majority of cases were just under 30 ms.
>
> The reason for this considerable increase in time is that the code now iterates
> over all combinations of NKM for pll-mipi. For each combination, it subsequently
> iterates over all combinations of NM for pll-video0.
I don't know if this increase is problematic, but if it is, there are
options to mitigate it:
a. Binary search for finding parent rate:
Precalculate an _ordered_ table of meaningful combinations of NM for
pll-video0. That means that combinations that result in the same
clock rate shall only be listed once. For example, since both
{ .n = 8, .m = 1} and { .n = 16, .m = 2 }
are valid values for n and m, only one of them would be allowed
because both result in a factor of 8.
Furthermore, the table should only contain combinations that can
result in a valid clock rate. I.e., since the parent is a fixed rate
clock at 24 MHz and the ccu_nm's min_rate is 192 MHz, the
combination
{ .n = 1, .m = 1 }
should _not_ appear in the table as the resulting rate would be 96
MHz.
Utilizing this table, we can then do a binary search in
ccu_nm_find_best instead of iterating over all combinations of NM.
b. rational_best_approximation for finding parent rate:
Using the rational best approximation from linux/rational.h for the
parent's rate also significantly reduces the time. Instead of
iterating over all combinations of NM, the algorithm uses continued
fractions to "calculate best rational approximation for a given
fraction taking into account restricted register size" (quote from
the function's description).
This reduces the ccu_nm_find_best function to the following two
lines:
rational_best_approximation(rate, parent, nm->max_n, nm->max_m, &nm->n, &nm->m);
return ccu_nm_calc_rate(parent, nm->n, nm->m);
I did a rough implementation of both and found that both approaches
reduce the time spent to set DCLK to less than 2 ms, i.e., not quite the
original 0.5 ms, but not as bad as 30 ms.
Option a. either requires addional ROM space if we generate the table
off-line or some code to generate the table during initialization (which
of course would also require time to execute, but only once).
Option b. finds the closest approximation, whereas the current
implementation of ccu_nm_find_rate finds the closest rate that is less
than the requested rate. This means that option b. requires that all of
pll-video0's children (and grand-children a.s.o.) support rates higher
than the requested rate. Currently, the sunxi-ng driver in a lot of
places only expects rates that are less than the requested rate, so all
of these would need to be changed.
So, my question: Is spending the 30 ms fine or do I need to optimize for
speed in order for this patchset to be accepted? Or is 2 ms also too
much of an increase, in which case I'm out of ideas. :-)
I'm looking forward to receiving your feedback.
Thanks,
Frank
> I greatly appreciate your feedback and suggestions for further improving this
> patchset.
>
> Thanks,
> Frank
>
> Frank Oltmanns (2):
> clk: sunxi-ng: nkm: consider alternative parent rates when finding
> rate
> clk: sunxi-ng: a64: allow pll-mipi to set parent's rate
>
> drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 3 +-
> drivers/clk/sunxi-ng/ccu_nkm.c | 40 +++++++++++++++++++++------
> 2 files changed, 33 insertions(+), 10 deletions(-)
On Wed, Jun 07, 2023 at 09:35:20AM +0200, Frank Oltmanns wrote:
> So, my question: Is spending the 30 ms fine or do I need to optimize for
> speed in order for this patchset to be accepted? Or is 2 ms also too
> much of an increase, in which case I'm out of ideas. :-)
You keep mentioning it, but it's really not clear to me why you think
that both are intertwined, or depend on one another?
What's wrong with just merging (some later version of) this series?
Maxime
On 2023-06-07 at 14:27:46 +0200, Maxime Ripard <[email protected]> wrote:
> [[PGP Signed Part:Undecided]]
> On Wed, Jun 07, 2023 at 09:35:20AM +0200, Frank Oltmanns wrote:
>> So, my question: Is spending the 30 ms fine or do I need to optimize for
>> speed in order for this patchset to be accepted? Or is 2 ms also too
>> much of an increase, in which case I'm out of ideas. :-)
>
> You keep mentioning it, but it's really not clear to me why you think
> that both are intertwined, or depend on one another?
I'm sorry about that. I guess, I got carried away. And furthermore I
took your mentioning of all bets being of how often setting rates
happens when HDMI comes into play as encouragement to optimize for
speed, which it clearly wasn't.
I saw the increase in time as a regression, because it might break
boards that I don't have access to. But since you say it's fine, I'll
speak no more of it.
> What's wrong with just merging (some later version of) this series?
Nothing. :-)
Thanks,
Frank
>
> Maxime
>
> [[End of PGP Signed Part]]
On Thu, Jun 08, 2023 at 11:29:05AM +0200, Frank Oltmanns wrote:
> On 2023-06-07 at 14:27:46 +0200, Maxime Ripard <[email protected]> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Wed, Jun 07, 2023 at 09:35:20AM +0200, Frank Oltmanns wrote:
> >> So, my question: Is spending the 30 ms fine or do I need to optimize for
> >> speed in order for this patchset to be accepted? Or is 2 ms also too
> >> much of an increase, in which case I'm out of ideas. :-)
> >
> > You keep mentioning it, but it's really not clear to me why you think
> > that both are intertwined, or depend on one another?
>
> I'm sorry about that. I guess, I got carried away. And furthermore I
> took your mentioning of all bets being of how often setting rates
> happens when HDMI comes into play as encouragement to optimize for
> speed, which it clearly wasn't.
>
> I saw the increase in time as a regression, because it might break
> boards that I don't have access to. But since you say it's fine, I'll
> speak no more of it.
I'm not really saying it's fine, I guess I'm saying it's fine until
proven otherwise :)
Maxime