2022-10-21 15:53:51

by Joe Korty

Subject: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

arm64: XGene-1 has a 31 bit, not a 32 bit, arch timer.

Fixes: 012f188504528b8cb32f441ac3bd9ea2eba39c9e ("clocksource/drivers/arm_arch_timer:
Work around broken CVAL implementations")

Testing:
On an 8-cpu Mustang, the following sequence no longer locks up the system:

echo 0 >/proc/sys/kernel/watchdog
for i in {0..7}; do taskset -c $i echo hi there $i; done

Stable:
To be applied to 5.16 and above, once accepted by mainline.

Signed-off-by: Joe Korty <[email protected]>

Index: b/drivers/clocksource/arm_arch_timer.c
===================================================================
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -805,7 +805,7 @@ static u64 __arch_timer_check_delta(void
const struct midr_range broken_cval_midrs[] = {
/*
* XGene-1 implements CVAL in terms of TVAL, meaning
- * that the maximum timer range is 32bit. Shame on them.
+ * that the maximum timer range is 31bit. Shame on them.
*/
MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
APM_CPU_PART_POTENZA)),
@@ -813,8 +813,8 @@ static u64 __arch_timer_check_delta(void
};

if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
- pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
- return CLOCKSOURCE_MASK(32);
+ pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 31bits");
+ return CLOCKSOURCE_MASK(31);
}
#endif
return CLOCKSOURCE_MASK(arch_counter_get_width());


2022-10-21 15:54:43

by Greg Kroah-Hartman

Subject: Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

On Fri, Oct 21, 2022 at 11:34:24AM -0400, Joe Korty wrote:
> arm64: XGene-1 has a 31 bit, not a 32 bit, arch timer.
>
> Fixes: 012f188504528b8cb32f441ac3bd9ea2eba39c9e ("clocksource/drivers/arm_arch_timer:
> Work around broken CVAL implementations")
>
> Testing:
> On an 8-cpu Mustang, the following sequence no longer locks up the system:
>
> echo 0 >/proc/sys/kernel/watchdog
> for i in {0..7}; do taskset -c $i echo hi there $i; done
>
> Stable:
> To be applied to 5.16 and above, once accepted by mainline.
>
> Signed-off-by: Joe Korty <[email protected]>
>
> Index: b/drivers/clocksource/arm_arch_timer.c
> ===================================================================
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -805,7 +805,7 @@ static u64 __arch_timer_check_delta(void
> const struct midr_range broken_cval_midrs[] = {
> /*
> * XGene-1 implements CVAL in terms of TVAL, meaning
> - * that the maximum timer range is 32bit. Shame on them.
> + * that the maximum timer range is 31bit. Shame on them.
> */
> MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
> APM_CPU_PART_POTENZA)),
> @@ -813,8 +813,8 @@ static u64 __arch_timer_check_delta(void
> };
>
> if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
> - pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
> - return CLOCKSOURCE_MASK(32);
> + pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 31bits");
> + return CLOCKSOURCE_MASK(31);
> }
> #endif
> return CLOCKSOURCE_MASK(arch_counter_get_width());

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

2022-10-21 18:30:08

by Marc Zyngier

Subject: Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

On Fri, 21 Oct 2022 16:34:24 +0100,
Joe Korty <[email protected]> wrote:
>
> arm64: XGene-1 has a 31 bit, not a 32 bit, arch timer.
>
> Fixes: 012f188504528b8cb32f441ac3bd9ea2eba39c9e ("clocksource/drivers/arm_arch_timer:
> Work around broken CVAL implementations")

Sorry, but you'll have to provide a bit more of an analysis here. As
far as I can tell, you're just changing a parameter without properly
describing what breaks and how.

>
> Testing:
> On an 8-cpu Mustang, the following sequence no longer locks up the system:
>
> echo 0 >/proc/sys/kernel/watchdog
> for i in {0..7}; do taskset -c $i echo hi there $i; done
>
> Stable:
> To be applied to 5.16 and above, once accepted by mainline.
>
> Signed-off-by: Joe Korty <[email protected]>
>
> Index: b/drivers/clocksource/arm_arch_timer.c
> ===================================================================
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -805,7 +805,7 @@ static u64 __arch_timer_check_delta(void
> const struct midr_range broken_cval_midrs[] = {
> /*
> * XGene-1 implements CVAL in terms of TVAL, meaning
> - * that the maximum timer range is 32bit. Shame on them.
> + * that the maximum timer range is 31bit. Shame on them.
> */
> MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
> APM_CPU_PART_POTENZA)),
> @@ -813,8 +813,8 @@ static u64 __arch_timer_check_delta(void
> };
>
> if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
> - pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
> - return CLOCKSOURCE_MASK(32);
> + pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 31bits");
> + return CLOCKSOURCE_MASK(31);
> }
> #endif
> return CLOCKSOURCE_MASK(arch_counter_get_width());
>

Also, this isn't much of a patch. Please see the documentation on how
to properly submit one.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-10-21 20:02:55

by Joe Korty

Subject: Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

Hi Marc,

On Fri, Oct 21, 2022 at 07:08:50PM +0100, Marc Zyngier wrote:
> Sorry, but you'll have to provide a bit more of an analysis here. As
> far as I can tell, you're just changing a parameter without properly
> describing what breaks and how.

There isn't much to analyse. For ages, 0x7fffffff (31 bits) was the
declared width of 'arch timer' for all arm architectures, and that worked.
Your patch series made the declared width vary according to which chipset
was in use, which is good, but that rewrite changed the above mask for
the XGene-1 from 0x7fffffff to 0xffffffff. That change broke timers
for the XGene-1 since it seems that, in actuality, it has only a 31 bit
wide arch timer. Thus declaring that arch timer has 32-bits is wrong.
This mismatch between the actual and declared sizes would cause arithmetic
errors in the calculation of timer deltas which more than accounts for
the hrtimer failures I am seeing when running 5.16+ on my Mustang XGene1.

Only one line need change, the rest are fluff:

- return CLOCKSOURCE_MASK(32);
+ return CLOCKSOURCE_MASK(31);

> Also, this isn't much of a patch.

I don't know what this means. The patch contains all that is needed for
the fix, no more. I could add more comments as to _why_ it is 31 bits
not 32, but I don't know why. I only know that the motherboard behaves
as if 31 bits is all that is available in the hardware.

> Please see the documentation on how to properly submit one.

AFAICS, the only submission mistake is that the 'Cc: [email protected]'
line is missing.

Regards,
Joe

2022-10-22 05:47:40

by Greg Kroah-Hartman

Subject: Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

On Fri, Oct 21, 2022 at 03:47:46PM -0400, Joe Korty wrote:
> Hi Marc,
>
> On Fri, Oct 21, 2022 at 07:08:50PM +0100, Marc Zyngier wrote:
> > Sorry, but you'll have to provide a bit more of an analysis here. As
> > far as I can tell, you're just changing a parameter without properly
> > describing what breaks and how.
>
> There isn't much to analyse. For ages, 0x7fffffff (31 bits) was the
> declared width of 'arch timer' for all arm architectures, and that worked.
> Your patch series made the declared width vary according to which chipset
> was in use, which is good, but that rewrite changed the above mask for
> the XGene-1 from 0x7fffffff to 0xffffffff. That change broke timers
> for the XGene-1 since it seems that, in actuality, it has only a 31 bit
> wide arch timer. Thus declaring that arch timer has 32-bits is wrong.
> This mismatch between the actual and declared sizes would cause arithmetic
> errors in the calculation of timer deltas which more than accounts for
> the hrtimer failures I am seeing when running 5.16+ on my Mustang XGene1.
>
> Only one line need change, the rest are fluff:
>
> - return CLOCKSOURCE_MASK(32);
> + return CLOCKSOURCE_MASK(31);
>
> > Also, this isn't much of a patch.
>
> I don't know what this means. The patch contains all that is needed for
> the fix, no more. I could add more comments as to _why_ it is 31 bits
> not 32, but I don't know why. I only know that the motherboard behaves
> as if 31 bits is all that is available in the hardware.
>
> > Please see the documentation on how to properly submit one.
>
> AFAICS, the only submission mistake is that the 'Cc: [email protected]'
> line is missing.

No, you need a much better changelog text and probably subject line, and
to properly cc: the correct maintainers and developers. As my bot would
say:

- Kernel development is done in public, please always cc: a public
mailing list with a patch submission. Using the tool,
scripts/get_maintainer.pl on the patch will tell you what mailing list
to cc.

- You did not specify a description of why the patch is needed, or
possibly, any description at all, in the email body. Please read the
section entitled "The canonical patch format" in the kernel file,
Documentation/SubmittingPatches for what is needed in order to
properly describe the change.

- You did not write a descriptive Subject: for the patch, allowing Greg,
and everyone else, to know what this patch is all about. Please read
the section entitled "The canonical patch format" in the kernel file,
Documentation/SubmittingPatches for what a proper Subject: line should
look like.


Thanks,

greg k-h

2022-10-22 11:27:10

by Marc Zyngier

Subject: Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

Hi Joe,

On Fri, 21 Oct 2022 20:47:46 +0100,
Joe Korty <[email protected]> wrote:
>
> Hi Marc,
>
> On Fri, Oct 21, 2022 at 07:08:50PM +0100, Marc Zyngier wrote:
> > Sorry, but you'll have to provide a bit more of an analysis here. As
> > far as I can tell, you're just changing a parameter without properly
> > describing what breaks and how.
>
> There isn't much to analyse.

Actually, there is plenty to analyse. Starting with *why* 31 is the
correct value (it actually is, see below) other than "hey, I reverted
this and it's all good, just merge it".

> For ages, 0x7fffffff (31 bits) was the
> declared width of 'arch timer' for all arm architectures, and that worked.
> Your patch series made the declared width vary according to which chipset
> was in use, which is good, but that rewrite changed the above mask for
> the XGene-1 from 0x7fffffff to 0xffffffff.

This isn't quite what my changes did, but hey, let's not get derailed.

> That change broke timers
> for the XGene-1 since it seems that, in actuality, it has only a 31 bit
> wide arch timer. Thus declaring that arch timer has 32-bits is wrong.
> This mismatch between the actual and declared sizes would cause arithmetic
> errors in the calculation of timer deltas which more than accounts for
> the hrtimer failures I am seeing when running 5.16+ on my Mustang XGene1.

This is the important point, and the reason why it breaks:

XGene implements CVAL (a 64bit comparator) in terms of TVAL (a
countdown register) instead of the other way around. TVAL being a
32bit register, the width of the counter should equally be 32.
However, TVAL is a *signed* value, and keeps counting down in the
negative range once the timer fires.

It means that any TVAL value with bit 31 set will fire immediately, as
it cannot be distinguished from an already expired timer. Reducing the
timer range back to a paltry 31 bits papers over the issue.
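
To make the bit 31 problem concrete, here is a minimal sketch in plain
userspace C. It is not the driver code: SKETCH_MASK() is only a stand-in
for the kernel's CLOCKSOURCE_MASK(), and tval_is_negative(), clamp_delta()
and check_widths() are hypothetical helpers made up for illustration. The
model assumes the hardware treats the 32bit TVAL as signed, as described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's CLOCKSOURCE_MASK(); valid for 1 <= bits <= 63 here. */
#define SKETCH_MASK(bits) ((1ULL << (bits)) - 1)

/* Hypothetical model: a TVAL with bit 31 set reads as negative, i.e. already expired. */
static inline bool tval_is_negative(uint32_t tval)
{
	return (tval & (1u << 31)) != 0;
}

/* Clamp a requested delta (in timer ticks) to the advertised timer width. */
static inline uint64_t clamp_delta(uint64_t delta, unsigned int bits)
{
	uint64_t max = SKETCH_MASK(bits);

	return delta > max ? max : delta;
}

/* Illustrative check of the two widths discussed in this thread. */
static inline void check_widths(void)
{
	uint64_t want = 0xffffffffULL; /* a far-away deadline, in ticks */

	/* With a 32bit mask the programmed value has bit 31 set: fires at once. */
	assert(tval_is_negative((uint32_t)clamp_delta(want, 32)));

	/* With a 31bit mask the largest programmable value stays positive. */
	assert(!tval_is_negative((uint32_t)clamp_delta(want, 31)));
}
```

With a 31bit mask the largest delta that can be programmed is 0x7fffffff,
so nothing that reaches TVAL in this model has bit 31 set.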

Another problem cannot be fixed though, which is that the timer
interrupt *must* be handled within the negative countdown period, or
the interrupt will be lost (TVAL will roll over to a positive value,
indicative of a new timer deadline).
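
The lost-interrupt window can be sketched the same way. This is an
illustrative userspace model, not kernel code: it assumes TVAL behaves as
the low 32 bits of (deadline - now), and tval_at(), tval_is_negative() and
check_rollover() are hypothetical names. The case where TVAL is exactly 0
is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: TVAL is the low 32 bits of (deadline - now). */
static inline uint32_t tval_at(uint64_t deadline, uint64_t now)
{
	return (uint32_t)(deadline - now);
}

/* Bit 31 set means the value reads as negative, i.e. the timer has expired. */
static inline bool tval_is_negative(uint32_t tval)
{
	return (tval & (1u << 31)) != 0;
}

static inline void check_rollover(void)
{
	uint64_t deadline = 1000;

	/* One tick late: TVAL is 0xffffffff (-1), still seen as expired. */
	assert(tval_is_negative(tval_at(deadline, deadline + 1)));

	/* 2^31 ticks late: TVAL is 0x80000000, the last negative value. */
	assert(tval_is_negative(tval_at(deadline, deadline + 0x80000000ULL)));

	/* One tick later TVAL wraps to 0x7fffffff and looks like a pending deadline. */
	assert(!tval_is_negative(tval_at(deadline, deadline + 0x80000001ULL)));
}
```

In this model the handler has 2^31 ticks after the deadline to run before
the expired state can no longer be told apart from a freshly armed timer.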

> Only one line need change, the rest are fluff:
>
> - return CLOCKSOURCE_MASK(32);
> + return CLOCKSOURCE_MASK(31);

Yes, and all you need is to send a proper patch, see below.

>
> > Also, this isn't much of a patch.
>
> I don't know what this means. The patch contains all that is needed for
> the fix, no more. I could add more comments as to _why_ it is 31 bits
> not 32, but I don't know why. I only know that the motherboard behaves
> as if 31 bits is all that is available in the hardware.
>
> > Please see the documentation on how to properly submit one.
>
> AFAICS, the only submission mistake is that the 'Cc: [email protected]'
> line is missing.

What you have done here is to write an email with a diff appended to
it, which isn't a proper kernel patch. I expect a patch to be
formatted with "git format-patch" instead of "git diff"
(i.e. something that is an actual commit instead of a local diff),
with a proper commit message (feel free to nick some of the
description above), with a Cc: stable@ and a Fixes: tag at the right
spot, Cc'ing all the relevant maintainers.

All of this is eloquently explained in the kernel documentation
(Documentation/process/submitting-patches.rst), and I would definitely
encourage you to read the sections titled "Describe your changes" and
"The canonical patch format". You can also look at the previous
commits to the same file to get a sense of the formatting that people
use.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.