2022-03-07 09:17:06

by Vincent Whitchurch

Subject: [PATCH] clocksource/drivers/exynos_mct: Support using only local timer

The ARTPEC-8 SoC has a quad-core Cortex-A53 and a single-core Cortex-A5
which share one MCT with one global and eight local timers.

The Cortex-A53 boots first and starts the global FRC and also registers
a clock events device using the global timer. (This global timer clock
events device is usually replaced by arch timer clock events devices for
each of the cores.)

When the A5 boots, we should not use the global timer interrupts or
write to the global timer registers. This is because even if there are
four global comparators, the control bits for all four are in the same
registers, and we would need to synchronize between the cpus. Instead,
the global timer FRC (already started by the A53) should be used as the
clock source, and one of the local timers which are not used by the A53
can be used for clock events on the A5.

To support this, add a module param to set the local timer starting
index. If this parameter is non-zero, the global timer clock events
device is not registered and we don't write to the global FRC if it is
already started.

Signed-off-by: Vincent Whitchurch <[email protected]>
---
drivers/clocksource/exynos_mct.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index f29c812b70c9..7ea2919b1808 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -33,7 +33,7 @@
#define EXYNOS4_MCT_G_INT_ENB EXYNOS4_MCTREG(0x248)
#define EXYNOS4_MCT_G_WSTAT EXYNOS4_MCTREG(0x24C)
#define _EXYNOS4_MCT_L_BASE EXYNOS4_MCTREG(0x300)
-#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * x))
+#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * (x)))
#define EXYNOS4_MCT_L_MASK (0xffffff00)

#define MCT_L_TCNTB_OFFSET (0x00)
@@ -77,6 +77,13 @@ static unsigned long clk_rate;
static unsigned int mct_int_type;
static int mct_irqs[MCT_NR_IRQS];

+/*
+ * First local timer index to use. If non-zero, global
+ * timer is not written to.
+ */
+static unsigned int mct_local_idx;
+module_param_named(local_idx, mct_local_idx, int, 0);
+
struct mct_clock_event_device {
struct clock_event_device evt;
unsigned long base;
@@ -157,6 +164,17 @@ static void exynos4_mct_frc_start(void)
u32 reg;

reg = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
+
+ /*
+ * If the FRC is already running, we don't need to start it again. We
+ * could probably just do this on all systems, but, to avoid any risk
+ * for regressions, we only do it on systems where it's absolutely
+ * necessary (i.e., on systems where writes to the global registers
+ * need to be avoided).
+ */
+ if (mct_local_idx && (reg & MCT_G_TCON_START))
+ return;
+
reg |= MCT_G_TCON_START;
exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
}
@@ -449,7 +467,7 @@ static int exynos4_mct_starting_cpu(unsigned int cpu)
per_cpu_ptr(&percpu_mct_tick, cpu);
struct clock_event_device *evt = &mevt->evt;

- mevt->base = EXYNOS4_MCT_L_BASE(cpu);
+ mevt->base = EXYNOS4_MCT_L_BASE(mct_local_idx + cpu);
snprintf(mevt->name, sizeof(mevt->name), "mct_tick%d", cpu);

evt->name = mevt->name;
@@ -554,13 +572,14 @@ static int __init exynos4_timer_interrupts(struct device_node *np,
} else {
for_each_possible_cpu(cpu) {
int mct_irq;
+ unsigned int irqidx = MCT_L0_IRQ + mct_local_idx + cpu;
struct mct_clock_event_device *pcpu_mevt =
per_cpu_ptr(&percpu_mct_tick, cpu);

pcpu_mevt->evt.irq = -1;
- if (MCT_L0_IRQ + cpu >= ARRAY_SIZE(mct_irqs))
+ if (irqidx >= ARRAY_SIZE(mct_irqs))
break;
- mct_irq = mct_irqs[MCT_L0_IRQ + cpu];
+ mct_irq = mct_irqs[irqidx];

irq_set_status_flags(mct_irq, IRQ_NOAUTOEN);
if (request_irq(mct_irq,
@@ -619,7 +638,7 @@ static int __init mct_init_dt(struct device_node *np, unsigned int int_type)
if (ret)
return ret;

- return exynos4_clockevent_init();
+ return (mct_local_idx == 0) ? exynos4_clockevent_init() : ret;
}


--
2.34.1
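
Editorial note: the one-character change to EXYNOS4_MCT_L_BASE() matters because the patch now passes an expression (mct_local_idx + cpu) rather than a single identifier. The following is a minimal userspace sketch, not driver code. The names L_BASE, L_BASE_OLD, L_BASE_NEW, base_old and base_new are hypothetical, and the 0x300 offset assumes EXYNOS4_MCTREG() expands to a plain offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed plain offset of the first local timer block (illustrative only). */
#define L_BASE 0x300u

/* Old form: the macro argument is substituted without parentheses. */
#define L_BASE_OLD(x) (L_BASE + (0x100 * x))
/* New form from the patch: the macro argument is parenthesized. */
#define L_BASE_NEW(x) (L_BASE + (0x100 * (x)))

/*
 * Expands to L_BASE + (0x100 * local_idx + cpu): only local_idx is scaled,
 * so the result is not the base of timer (local_idx + cpu).
 */
static uint32_t base_old(uint32_t local_idx, uint32_t cpu)
{
	return L_BASE_OLD(local_idx + cpu);
}

/* Expands to L_BASE + (0x100 * (local_idx + cpu)), the intended offset. */
static uint32_t base_new(uint32_t local_idx, uint32_t cpu)
{
	return L_BASE_NEW(local_idx + cpu);
}
```

With local_idx = 4 and cpu = 1, base_old() yields 0x701 while base_new() yields 0x800, the start of the sixth local timer block under the assumed layout.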


2022-03-07 09:29:18

by Krzysztof Kozlowski

Subject: Re: [PATCH] clocksource/drivers/exynos_mct: Support using only local timer

On 07/03/2022 09:32, Vincent Whitchurch wrote:
> The ARTPEC-8 SoC has a quad-core Cortex-A53 and a single-core Cortex-A5
> which share one MCT with one global and eight local timers.
>
> The Cortex-A53 boots first and starts the global FRC and also registers
> a clock events device using the global timer. (This global timer clock
> events is usually replaced by arch timer clock events for each of the
> cores.)
>
> When the A5 boots, we should not use the global timer interrupts or
> write to the global timer registers. This is because even if there are
> four global comparators, the control bits for all four are in the same
> registers, and we would need to synchronize between the cpus. Instead,
> the global timer FRC (already started by the A53) should be used as the
> clock source, and one of the local timers which are not used by the A53
> can be used for clock events on the A5.
>
> To support this, add a module param to set the local timer starting
> index. If this parameter is non-zero, the global timer clock events
> device is not registered and we don't write to the global FRC if it is
> already started.
>
> Signed-off-by: Vincent Whitchurch <[email protected]>
> ---
> drivers/clocksource/exynos_mct.c | 29 ++++++++++++++++++++++++-----
> 1 file changed, 24 insertions(+), 5 deletions(-)

This should not be sent separately from the previous patch.

>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index f29c812b70c9..7ea2919b1808 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -33,7 +33,7 @@
> #define EXYNOS4_MCT_G_INT_ENB EXYNOS4_MCTREG(0x248)
> #define EXYNOS4_MCT_G_WSTAT EXYNOS4_MCTREG(0x24C)
> #define _EXYNOS4_MCT_L_BASE EXYNOS4_MCTREG(0x300)
> -#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * x))
> +#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * (x)))
> #define EXYNOS4_MCT_L_MASK (0xffffff00)
>
> #define MCT_L_TCNTB_OFFSET (0x00)
> @@ -77,6 +77,13 @@ static unsigned long clk_rate;
> static unsigned int mct_int_type;
> static int mct_irqs[MCT_NR_IRQS];
>
> +/*
> + * First local timer index to use. If non-zero, global
> + * timer is not written to.
> + */
> +static unsigned int mct_local_idx;
> +module_param_named(local_idx, mct_local_idx, int, 0);

No, it's a no go. Please use dedicated compatible if you need specific
quirks.

> +
> struct mct_clock_event_device {
> struct clock_event_device evt;
> unsigned long base;
> @@ -157,6 +164,17 @@ static void exynos4_mct_frc_start(void)
> u32 reg;
>
> reg = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
> +
> + /*
> + * If the FRC is already running, we don't need to start it again. We
> + * could probably just do this on all systems, but, to avoid any risk
> + * for regressions, we only do it on systems where it's absolutely
> + * necessary (i.e., on systems where writes to the global registers
> + * need to be avoided).
> + */
> + if (mct_local_idx && (reg & MCT_G_TCON_START))
> + return;

I don't get it. exynos4_mct_frc_start() is called exactly once in
the system, during init. It is not called once per CPU or cluster (I
understood you have two clusters, right?).

Best regards,
Krzysztof

2022-03-07 10:11:31

by Vincent Whitchurch

Subject: Re: [PATCH] clocksource/drivers/exynos_mct: Support using only local timer

On Mon, Mar 07, 2022 at 10:06:26AM +0100, Krzysztof Kozlowski wrote:
> On 07/03/2022 09:32, Vincent Whitchurch wrote:
> > The ARTPEC-8 SoC has a quad-core Cortex-A53 and a single-core Cortex-A5
> > which share one MCT with one global and eight local timers.
> >
> > The Cortex-A53 boots first and starts the global FRC and also registers
> > a clock events device using the global timer. (This global timer clock
> > events is usually replaced by arch timer clock events for each of the
> > cores.)
> >
> > When the A5 boots, we should not use the global timer interrupts or
> > write to the global timer registers. This is because even if there are
> > four global comparators, the control bits for all four are in the same
> > registers, and we would need to synchronize between the cpus. Instead,
> > the global timer FRC (already started by the A53) should be used as the
> > clock source, and one of the local timers which are not used by the A53
> > can be used for clock events on the A5.
> >
> > To support this, add a module param to set the local timer starting
> > index. If this parameter is non-zero, the global timer clock events
> > device is not registered and we don't write to the global FRC if it is
> > already started.
> >
> > Signed-off-by: Vincent Whitchurch <[email protected]>
> > ---
> > drivers/clocksource/exynos_mct.c | 29 ++++++++++++++++++++++++-----
> > 1 file changed, 24 insertions(+), 5 deletions(-)
>
> This should not be sent separately from the previous patch.

OK, I will put it in a series.

>
> >
> > diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> > index f29c812b70c9..7ea2919b1808 100644
> > --- a/drivers/clocksource/exynos_mct.c
> > +++ b/drivers/clocksource/exynos_mct.c
> > @@ -33,7 +33,7 @@
> > #define EXYNOS4_MCT_G_INT_ENB EXYNOS4_MCTREG(0x248)
> > #define EXYNOS4_MCT_G_WSTAT EXYNOS4_MCTREG(0x24C)
> > #define _EXYNOS4_MCT_L_BASE EXYNOS4_MCTREG(0x300)
> > -#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * x))
> > +#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * (x)))
> > #define EXYNOS4_MCT_L_MASK (0xffffff00)
> >
> > #define MCT_L_TCNTB_OFFSET (0x00)
> > @@ -77,6 +77,13 @@ static unsigned long clk_rate;
> > static unsigned int mct_int_type;
> > static int mct_irqs[MCT_NR_IRQS];
> >
> > +/*
> > + * First local timer index to use. If non-zero, global
> > + * timer is not written to.
> > + */
> > +static unsigned int mct_local_idx;
> > +module_param_named(local_idx, mct_local_idx, int, 0);
>
> No, it's a no go. Please use dedicated compatible if you need specific
> quirks.

I could add a compatible, but please note that the hardware itself does
not have any quirks, it's only the usage of the hardware from one of the
Linux kernels (see the explanation below) which is different. Is it
correct to use a compatible to distinguish between these kinds of
software-determined usage differences?

>
> > +
> > struct mct_clock_event_device {
> > struct clock_event_device evt;
> > unsigned long base;
> > @@ -157,6 +164,17 @@ static void exynos4_mct_frc_start(void)
> > u32 reg;
> >
> > reg = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
> > +
> > + /*
> > + * If the FRC is already running, we don't need to start it again. We
> > + * could probably just do this on all systems, but, to avoid any risk
> > + * for regressions, we only do it on systems where it's absolutely
> > + * necessary (i.e., on systems where writes to the global registers
> > + * need to be avoided).
> > + */
> > + if (mct_local_idx && (reg & MCT_G_TCON_START))
> > + return;
>
> I don't get it. exynos4_mct_frc_start() is called exactly only once in
> the system - during init. Not once per every CPU or cluster (I
> understood you have two clusters, right?).

Not quite. The Cortex-A53 and the Cortex-A5 do not have cache-coherency
between them, so they are not run in an SMP configuration. From the
Cortex-A53's perspective, the Cortex-A5 looks like any other hardware
block. The Cortex-A5 could just as well have run some other operating
system, but I run Linux on it. So there are two separate, independent
Linux kernels running on the SoC.
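
Editorial note: with two independent kernels sharing one MCT, the guard added to exynos4_mct_frc_start() can be pictured with a small userspace simulation. This is only a sketch under stated assumptions: struct fake_mct and frc_start() are hypothetical stand-ins, the register is an ordinary variable rather than MMIO, and the bit position chosen for TCON_START is an assumption that should be checked against the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the FRC start bit in G_TCON (illustrative only). */
#define TCON_START (1u << 8)

/* Hypothetical stand-in for the global control register shared by both kernels. */
struct fake_mct {
	uint32_t g_tcon;
	unsigned int writes; /* number of writes issued to g_tcon */
};

/*
 * Mirrors the guard in the patch: when local_idx is non-zero and the FRC
 * is already running, no write is issued. Returns true if a write happened.
 */
static bool frc_start(struct fake_mct *mct, unsigned int local_idx)
{
	uint32_t reg = mct->g_tcon;

	if (local_idx && (reg & TCON_START))
		return false;

	mct->g_tcon = reg | TCON_START;
	mct->writes++;
	return true;
}
```

Calling frc_start() first with local_idx = 0 (the A53 kernel) and then with a non-zero local_idx (the A5 kernel) leaves exactly one write to the shared register, which is the behaviour the commit message describes.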

2022-03-07 10:36:29

by Krzysztof Kozlowski

Subject: Re: [PATCH] clocksource/drivers/exynos_mct: Support using only local timer

On 07/03/2022 10:24, Vincent Whitchurch wrote:
> On Mon, Mar 07, 2022 at 10:06:26AM +0100, Krzysztof Kozlowski wrote:
>> On 07/03/2022 09:32, Vincent Whitchurch wrote:
>>> The ARTPEC-8 SoC has a quad-core Cortex-A53 and a single-core Cortex-A5
>>> which share one MCT with one global and eight local timers.

Please mention that this is a two-OS case (or without cache coherency),
because the usual design is different: two clusters that are cache coherent.

>>>
>>> The Cortex-A53 boots first and starts the global FRC and also registers
>>> a clock events device using the global timer. (This global timer clock
>>> events is usually replaced by arch timer clock events for each of the
>>> cores.)
>>>
>>> When the A5 boots, we should not use the global timer interrupts or
>>> write to the global timer registers. This is because even if there are
>>> four global comparators, the control bits for all four are in the same
>>> registers, and we would need to synchronize between the cpus. Instead,
>>> the global timer FRC (already started by the A53) should be used as the
>>> clock source, and one of the local timers which are not used by the A53
>>> can be used for clock events on the A5.
>>>
>>> To support this, add a module param to set the local timer starting
>>> index. If this parameter is non-zero, the global timer clock events
>>> device is not registered and we don't write to the global FRC if it is
>>> already started.
>>>
>>> Signed-off-by: Vincent Whitchurch <[email protected]>
>>> ---
>>> drivers/clocksource/exynos_mct.c | 29 ++++++++++++++++++++++++-----
>>> 1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> This should not be sent separately from the previous patch.
>
> OK, I will put it in a series.
>
>>
>>>
>>> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
>>> index f29c812b70c9..7ea2919b1808 100644
>>> --- a/drivers/clocksource/exynos_mct.c
>>> +++ b/drivers/clocksource/exynos_mct.c
>>> @@ -33,7 +33,7 @@
>>> #define EXYNOS4_MCT_G_INT_ENB EXYNOS4_MCTREG(0x248)
>>> #define EXYNOS4_MCT_G_WSTAT EXYNOS4_MCTREG(0x24C)
>>> #define _EXYNOS4_MCT_L_BASE EXYNOS4_MCTREG(0x300)
>>> -#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * x))
>>> +#define EXYNOS4_MCT_L_BASE(x) (_EXYNOS4_MCT_L_BASE + (0x100 * (x)))
>>> #define EXYNOS4_MCT_L_MASK (0xffffff00)
>>>
>>> #define MCT_L_TCNTB_OFFSET (0x00)
>>> @@ -77,6 +77,13 @@ static unsigned long clk_rate;
>>> static unsigned int mct_int_type;
>>> static int mct_irqs[MCT_NR_IRQS];
>>>
>>> +/*
>>> + * First local timer index to use. If non-zero, global
>>> + * timer is not written to.
>>> + */
>>> +static unsigned int mct_local_idx;
>>> +module_param_named(local_idx, mct_local_idx, int, 0);
>>
>> No, it's a no go. Please use dedicated compatible if you need specific
>> quirks.
>
> I could add a compatible, but please note that the hardware itself does
> not have any quirks, it's only the usage of the hardware from one of the
> Linux kernels (see the explanation below) which is different. Is it
> correct to use a compatible to distinguish between these kind of
> software-determined usage differences?
>
>>
>>> +
>>> struct mct_clock_event_device {
>>> struct clock_event_device evt;
>>> unsigned long base;
>>> @@ -157,6 +164,17 @@ static void exynos4_mct_frc_start(void)
>>> u32 reg;
>>>
>>> reg = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
>>> +
>>> + /*
>>> + * If the FRC is already running, we don't need to start it again. We
>>> + * could probably just do this on all systems, but, to avoid any risk
>>> + * for regressions, we only do it on systems where it's absolutely
>>> + * necessary (i.e., on systems where writes to the global registers
>>> + * need to be avoided).
>>> + */
>>> + if (mct_local_idx && (reg & MCT_G_TCON_START))
>>> + return;
>>
>> I don't get it. exynos4_mct_frc_start() is called exactly only once in
>> the system - during init. Not once per every CPU or cluster (I
>> understood you have two clusters, right?).
>
> Not quite. The Cortex-A53 and the Cortex-A5 do not have cache-coherency
> between them, so they are not run in an SMP configuration. From the
> Cortex-A53's perspective, the Cortex-A5 looks like any other hardware
> block. The Cortex-A5 could just as well have run some other operating
> system, but I run Linux on it. So there are two separate, independent
> Linux kernels running on the SoC.

I see, thanks for the explanation. In that case it might not need a separate
compatible (the programming model is the same) but rather a dedicated property
or properties in DTS to indicate that some parts are shared with another
system. If the property is present, you skip FRC initialization and use
local timers. You actually might need two properties, one for the A53 and
one for the A5, or some kind of map to assign a subset of local interrupts to
each of the systems.

I still think that DTS is the right place for it because it is a
property of the hardware and it's too early in system boot to use some
remote-proc or mailbox...

Best regards,
Krzysztof
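
Editorial note: the idea of assigning a subset of local timers to each system can be sketched as a small lookup. This is not a proposed binding or driver change. struct mct_partition, its fields and local_irq_index() are hypothetical, and the IRQ table layout (four global comparator entries followed by eight local timer entries) is an assumption based on the timer counts given in the commit message.

```c
#include <assert.h>

/* Assumed IRQ table layout: global comparators first, then local timers. */
enum { G_IRQS = 4, L_IRQS = 8, NR_IRQS = G_IRQS + L_IRQS };

/* Hypothetical per-kernel description, e.g. filled in from DT properties. */
struct mct_partition {
	unsigned int first_local; /* first local timer owned by this kernel */
	unsigned int nr_local;    /* number of local timers owned */
	int owns_global;          /* non-zero if global registers may be written */
};

/* Return the IRQ table index for `cpu`, or -1 if it falls outside the partition. */
static int local_irq_index(const struct mct_partition *p, unsigned int cpu)
{
	unsigned int idx;

	if (cpu >= p->nr_local)
		return -1;

	idx = G_IRQS + p->first_local + cpu;
	return idx < NR_IRQS ? (int)idx : -1;
}
```

For example, an A53 partition of { 0, 4, 1 } and an A5 partition of { 4, 1, 0 } give the A5's only CPU table index 8, while any index beyond a partition's range is rejected, much like the ARRAY_SIZE() check in the patch.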