LinuxLists.cc - [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

2014-01-08 13:29:36

Subject: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

The erratum-773769 occurs on Arm Coretex-A15 (rev r2p0),
when L2 Data Ram latency is set to 4 cycles or more; or
when ACP is in use, or with L2 Data RAM slice configured.
Therefore, the effective latency as calculated in Table 7-2 of
Cotex-A15 (rev r2p0) trm should be 3 cycles or less.

On Exynos5250 based systems the effective data ram latency
is 4 cycles, since we have DATA_RAM_SETUP bit enabled (L2CTRL[5]=1b'1)
and DATA_RAM_LATENCY bits set to 0x2 (L2CTLR[2:0]=3b'010) therefore,
the effective L2 data RAM latency becomes 4 cycles.
So erratum '773769' occurs causing a corrupted L2 Cache.

This patch gives a workaround to the mentioned erratum, using below
mentioned algo:
----------------------------------------------------------------
if data RAM setup = 1
then check if effective latency i.e (latency + setup + 1) > 3
if 'true'
then clear data RAM setup
goto branch 'a'
if data RAM setup = 0
a: then check if data RAM latency > 0x10
if true then force data RAM latency = 0x10
----------------------------------------------------------------
so that the effective data RAM latency reduces to 3 cycles or less
and hence prevent hitting the erratum.

NOTE: The Exynos5250 based products have already been shipped, which
makes it impossible to add the change in bootloader, so we are
adding the required change in kernel.

Signed-off-by: Vivek Gautam <[email protected]>
Cc: Doug Anderson <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: David Garbett <[email protected]>
---
arch/arm/Kconfig | 15 ++++++++
arch/arm/mach-exynos/Kconfig | 1 +
arch/arm/mm/proc-v7.S | 79 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c59fa19..2e6f36c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1250,6 +1250,21 @@ config ARM_ERRATA_751472
operation is received by a CPU before the ICIALLUIS has completed,
potentially leading to corrupted entries in the cache or TLB.

+config ARM_ERRATA_773769
+ bool "ARM errata: Large data RAM latencies can lead to rare data corruption"
+ depends on CPU_V7
+ help
+ This option enables the workaround for the erratum 773769, which affects
+ Cortex-A15 (rev r2p0).
+ In systems with L2 Data RAM latency programmed to 4 or more cycles,
+ or with ACP in use, or with a L2 Data RAM slice configured, it is
+ possible that a rare collision between non-cacheable stores and
+ L1 data cache evictions which can lead to data corruption in L2 cache
+ or memory.
+ This workaround is to configure an effective Data RAM latency of 3 or
+ less. Also note that, if a Data RAM slice is configured in A15 then
+ there is no workaround.
+
config PL310_ERRATA_753970
bool "PL310 errata: cache sync operation may be faulty"
depends on CACHE_PL310
diff --git a/arch/arm/mach-exynos/Kconfig b/arch/arm/mach-exynos/Kconfig
index 4c414af..29f505f 100644
--- a/arch/arm/mach-exynos/Kconfig
+++ b/arch/arm/mach-exynos/Kconfig
@@ -82,6 +82,7 @@ config SOC_EXYNOS5250
default y
depends on ARCH_EXYNOS5
select ARCH_HAS_BANDGAP
+ select ARM_ERRATA_773769
select PINCTRL_EXYNOS
select PM_GENERIC_DOMAINS if PM
select S5P_PM if PM
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index bd17819..0674c4c 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -141,6 +141,49 @@ ENTRY(cpu_v7_do_resume)
mcr p15, 0, r4, c10, c2, 0 @ write PRRR
mcr p15, 0, r5, c10, c2, 1 @ write NMRR
#endif /* CONFIG_MMU */
+
+#ifdef CONFIG_ARM_ERRATA_773769
+ /* get the arm rev id */
+ mrc p15, 0, r3, c0, c0, 0 @ read main ID register
+ and r4, r3, #0xff000000 @ ARM?
+ teq r4, #0x41000000
+ bne 8f
+ and r5, r3, #0x00f00000 @ variant
+ and r7, r3, #0x0000000f @ revision
+ orr r7, r7, r5, lsr #20-4 @ combine variant and revision
+ ubfx r3, r3, #4, #12 @ primary part number
+
+ ldr r4, =0x00000c0f @ Cortex-A15 primary part number
+ teq r3, r4
+ bne 8f
+
+ ALT_SMP(cmp r7, #0x21) @ present prior to r2p1
+ ALT_UP_B(8f)
+ mrclt p15, 0, r3, c1, c0, 0 @ read system control register
+ andlt r3, r3, #0x4 @ mask for C bit
+ cmplt r3, #0x0 @ check if cache is on/off
+ bne 8f @ Do nothing when cache is on
+
+ mrceq p15, 1, r5, c9, c0, 2 @ read L2 control register
+ andeq r3, r5, #(1 << 5) @ mask for data RAM setup
+ lsreq r3, r3, #0x5
+ cmpeq r3, #0x1 @ check if data RAM setup = 1
+ bne 9f
+ and r4, r5, #0x7 @ mask for data RAM latency
+ add r4, r4, r3
+ add r4, r4, #0x1 @ effective latency
+ cmp r4, #0x3
+ bicgt r5, r5, #(1 << 5) @ clear data RAM setup bit
+
+9: and r4, r5, #0x7 @ mask for data RAM latency
+ cmp r4, #0x2 @ check if data RAM latency > 2
+ ble 10f
+ bic r5, r5, #0x7 @ clear data RAM latency bits
+ orr r5, r5, #0x2 @ force data RAM latency = 2
+10: mcr p15, 1, r5, c9, c0, 2 @ set L2 control register
+8:
+#endif
+
mrc p15, 0, r4, c1, c0, 1 @ Read Auxiliary control register
teq r4, r9 @ Is it already set?
mcrne p15, 0, r9, c1, c0, 1 @ No, so write it
@@ -349,6 +392,42 @@ __v7_setup:
mcrle p15, 0, r10, c1, c0, 1 @ write aux control register
#endif

+#ifdef CONFIG_ARM_ERRATA_773769
+ ALT_SMP(cmp r6, #0x21) @ present prior to r2p1
+ ALT_UP_B(5f)
+ mrclt p15, 0, r3, c1, c0, 0 @ read system control register
+ andlt r3, r3, #0x4 @ mask for C bit
+ cmplt r3, #0x0 @ check if cache is on/off
+ bne 5f @ Do nothing when cache is on
+ /*
+ * if data RAM setup = 1
+ * then check if effective latency i.e (latency + setup + 1) > 3
+ * if true then clear data RAM setup
+ * goto branch 'a'
+ * if data RAM setup = 0
+ * a: then check if data RAM latency > 0x10
+ * if true then force data RAM latency = 0x10
+ */
+ mrceq p15, 1, r5, c9, c0, 2 @ read L2 control register
+ andeq r3, r5, #(1 << 5) @ mask for data RAM setup
+ lsreq r3, r3, #0x5
+ cmpeq r3, #0x1 @ check if data RAM setup = 1
+ bne 6f
+ and r10, r5, #0x7 @ mask for data RAM latency
+ add r10, r10, r3
+ add r10, r10, #0x1 @ effective latency
+ cmp r10, #0x3
+ bicgt r5, r5, #(1 << 5) @ clear data RAM setup bit
+
+6: and r10, r5, #0x7 @ mask for data RAM latency
+ cmp r10, #0x2 @ check if data RAM latency > 2
+ ble 7f
+ bic r5, r5, #0x7 @ clear data RAM latency bits
+ orr r5, r5, #0x2 @ force data RAM latency = 2
+7: mcr p15, 1, r5, c9, c0, 2 @ set L2 control register
+5:
+#endif
+
4: mov r10, #0
mcr p15, 0, r10, c7, c5, 0 @ I+BTB cache invalidate
dsb
--
1.7.10.4

2014-01-08 14:36:08

by Will Deacon

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

On Wed, Jan 08, 2014 at 01:33:11PM +0000, Vivek Gautam wrote:
> The erratum-773769 occurs on Arm Coretex-A15 (rev r2p0),
> when L2 Data Ram latency is set to 4 cycles or more; or
> when ACP is in use, or with L2 Data RAM slice configured.
> Therefore, the effective latency as calculated in Table 7-2 of
> Cotex-A15 (rev r2p0) trm should be 3 cycles or less.
>
> On Exynos5250 based systems the effective data ram latency
> is 4 cycles, since we have DATA_RAM_SETUP bit enabled (L2CTRL[5]=1b'1)
> and DATA_RAM_LATENCY bits set to 0x2 (L2CTLR[2:0]=3b'010) therefore,
> the effective L2 data RAM latency becomes 4 cycles.
> So erratum '773769' occurs causing a corrupted L2 Cache.
>
> This patch gives a workaround to the mentioned erratum, using below
> mentioned algo:
> ----------------------------------------------------------------
> if data RAM setup = 1
> then check if effective latency i.e (latency + setup + 1) > 3
> if 'true'
> then clear data RAM setup
> goto branch 'a'
> if data RAM setup = 0
> a: then check if data RAM latency > 0x10
> if true then force data RAM latency = 0x10
> ----------------------------------------------------------------
> so that the effective data RAM latency reduces to 3 cycles or less
> and hence prevent hitting the erratum.
>
> NOTE: The Exynos5250 based products have already been shipped, which
> makes it impossible to add the change in bootloader, so we are
> adding the required change in kernel.

NAK. Whilst I appreciate that you may not be able to fix your bootloader,
this isn't the right change to make in the kernel. Blindly changing memory
latencies is likely to do more harm than good for a multi-platform kernel,
even if it works for exynos 5250. The only alternative I can think of (if you
have to make a mainline kernel change) is to restrict the clock frequencies at
which the CPU is allowed to run, although that obviously requires some
investigation in order to determine how viable it is for your SoC.

Will

2014-01-08 16:21:26

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

WIll,

Thanks for your comments!

On Wed, Jan 8, 2014 at 6:35 AM, Will Deacon <[email protected]> wrote:
> On Wed, Jan 08, 2014 at 01:33:11PM +0000, Vivek Gautam wrote:
>> The erratum-773769 occurs on Arm Coretex-A15 (rev r2p0),
>> when L2 Data Ram latency is set to 4 cycles or more; or
>> when ACP is in use, or with L2 Data RAM slice configured.
>> Therefore, the effective latency as calculated in Table 7-2 of
>> Cotex-A15 (rev r2p0) trm should be 3 cycles or less.
>>
>> On Exynos5250 based systems the effective data ram latency
>> is 4 cycles, since we have DATA_RAM_SETUP bit enabled (L2CTRL[5]=1b'1)
>> and DATA_RAM_LATENCY bits set to 0x2 (L2CTLR[2:0]=3b'010) therefore,
>> the effective L2 data RAM latency becomes 4 cycles.
>> So erratum '773769' occurs causing a corrupted L2 Cache.
>>
>> This patch gives a workaround to the mentioned erratum, using below
>> mentioned algo:
>> ----------------------------------------------------------------
>> if data RAM setup = 1
>> then check if effective latency i.e (latency + setup + 1) > 3
>> if 'true'
>> then clear data RAM setup
>> goto branch 'a'
>> if data RAM setup = 0
>> a: then check if data RAM latency > 0x10
>> if true then force data RAM latency = 0x10
>> ----------------------------------------------------------------
>> so that the effective data RAM latency reduces to 3 cycles or less
>> and hence prevent hitting the erratum.
>>
>> NOTE: The Exynos5250 based products have already been shipped, which
>> makes it impossible to add the change in bootloader, so we are
>> adding the required change in kernel.
>
> NAK. Whilst I appreciate that you may not be able to fix your bootloader,
> this isn't the right change to make in the kernel. Blindly changing memory
> latencies is likely to do more harm than good for a multi-platform kernel,
> even if it works for exynos 5250. The only alternative I can think of (if you
> have to make a mainline kernel change) is to restrict the clock frequencies at
> which the CPU is allowed to run, although that obviously requires some
> investigation in order to determine how viable it is for your SoC.

OK, so humor me a little here...

I'll start off by saying that I'm totally OK if mainline doesn't want
this fixed. If mainline is not interested in running reliably on
exynos5250-based products then there's nothing I can do about it. It
seems unfortunate, but I'm not going to get into a shouting match
about it.

You're saying that this patch is blindly changing memory latencies. I
don't think it is (please correct me if I'm wrong!). This patch:

* Is guarded by a CONFIG option so the code isn't even compiled in if
you don't want EXYNOS5250 support. I know this doesn't help
multiplatform, but it means that if you really hate the code it's easy
to disable.

* Is guarded by a runtime check of the revision number so that it
doesn't run on unaffected A15 revisions.

* Is guarded by a runtime check so it does nothing at all if the total
latency is <= 3 (AKA if boot code already picked a sane value)

...with the above guards it's pretty safe...

I will agree that there is a _potential_ that this could make things
work worse on an already broken product our there, but I would say
there's a reasonable chance that such a product doesn't exist (but
please correct me if I'm wrong). Specifically, this patch will cause
problems in two examples that I can think of:

--

Example A: existing A15 <=r2p0 product with 773769-ignorant boot code
that could be fixed, but that needs "setup = 1"

In this case we've got boot code that's like the exynos5250 boot code
that accidentally sets the total latency to >= 4 when it would work
just fine to use a value of 3. ...except that unlike the exynos5250
this hypthetical SoC needs to leave setup = 1.

...if such a machine were found in the wild (seems unlikely?) then
we'll need to figure out what to do. If its boot code cannot be
updated and we want to support this product with a similar patch then
we'll need to be more dynamic.

--

Example B: existing A15 <=r2p0 product that just has to live with crashes

In this case we've got a product where we're going to just accept that
it crashes sometimes (since this is a fairly hard crash to trigger)
because it crashes even more when the total latency < 4. In this case
we don't want to "fix" the errata because that makes things worse.

...if such a machine were find in the wild (it's possible, I guess?)
then that's a really good reason not to take this patch or to find
some way to dynamically enable / disable it.

--

Let me know what you think of the above, or if I'm misunderstanding something...

-Doug

2014-01-08 19:20:41

by Russell King - ARM Linux

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

On Wed, Jan 08, 2014 at 08:21:21AM -0800, Doug Anderson wrote:
> WIll,
>
> Thanks for your comments!
>
> On Wed, Jan 8, 2014 at 6:35 AM, Will Deacon <[email protected]> wrote:
> > NAK. Whilst I appreciate that you may not be able to fix your bootloader,
> > this isn't the right change to make in the kernel. Blindly changing memory
> > latencies is likely to do more harm than good for a multi-platform kernel,
> > even if it works for exynos 5250. The only alternative I can think of (if you
> > have to make a mainline kernel change) is to restrict the clock frequencies at
> > which the CPU is allowed to run, although that obviously requires some
> > investigation in order to determine how viable it is for your SoC.

+1

> I'll start off by saying that I'm totally OK if mainline doesn't want
> this fixed. If mainline is not interested in running reliably on
> exynos5250-based products then there's nothing I can do about it. It
> seems unfortunate, but I'm not going to get into a shouting match
> about it.

No, we're saying to put the work-around in the boot loader, not the kernel.

> You're saying that this patch is blindly changing memory latencies. I
> don't think it is (please correct me if I'm wrong!). This patch:
>
> * Is guarded by a CONFIG option so the code isn't even compiled in if
> you don't want EXYNOS5250 support. I know this doesn't help
> multiplatform, but it means that if you really hate the code it's easy
> to disable.

As you identify, this is completely useless in the multiplatform kernel
case, so you can completely remove this from the argument _for_ this
change - it carries no weight what so ever.

> * Is guarded by a runtime check of the revision number so that it
> doesn't run on unaffected A15 revisions.

So what if the A15 reports an affected revision, but the SoC integrator
has fixed the problem - why should they have to put up with the work-
around being applied on their silicon?

> * Is guarded by a runtime check so it does nothing at all if the total
> latency is <= 3 (AKA if boot code already picked a sane value)

Right, so the effect of the above is that with a multiplatform kernel
including Exynos 5 support, all platforms with an A15 r2p0 or earlier
have this work-around applied even if it was actually fixed in the
silicon - which there's no way to know from the software perspective.

We've been through these arguments many times, you're not the first to
raise it, and we've decided upon the policy. We want as _few_ work-
arounds in the kernel as possible, because applying the work-arounds
is very problematical with the mixture of secure and non-secure booting.

Maybe Will can check, but I believe that the L2 cache control register
is one of those which is not writable from non-secure mode, and thus
will crash the kernel at boot if a write is attempted.

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-01-08 19:43:32

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Hi,

On Wed, Jan 8, 2014 at 11:20 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Jan 08, 2014 at 08:21:21AM -0800, Doug Anderson wrote:
>> WIll,
>>
>> Thanks for your comments!
>>
>> On Wed, Jan 8, 2014 at 6:35 AM, Will Deacon <[email protected]> wrote:
>> > NAK. Whilst I appreciate that you may not be able to fix your bootloader,
>> > this isn't the right change to make in the kernel. Blindly changing memory
>> > latencies is likely to do more harm than good for a multi-platform kernel,
>> > even if it works for exynos 5250. The only alternative I can think of (if you
>> > have to make a mainline kernel change) is to restrict the clock frequencies at
>> > which the CPU is allowed to run, although that obviously requires some
>> > investigation in order to determine how viable it is for your SoC.
>
> +1
>
>> I'll start off by saying that I'm totally OK if mainline doesn't want
>> this fixed. If mainline is not interested in running reliably on
>> exynos5250-based products then there's nothing I can do about it. It
>> seems unfortunate, but I'm not going to get into a shouting match
>> about it.
>
> No, we're saying to put the work-around in the boot loader, not the kernel.

Unfortunately the resume path of the firmware runs from Read Only
firmware code (yes, it sucks), so it's not totally trivial to fix. It
would be possible for someone to unscrew their write protect switch
and update their RO firmware, but that doesn't help the average user.

Olof came up with the idea that you could update the RW firmware
(affects initial boot) and then cache away the value and restore it in
the kernel after resume. That would still require a kernel patch but
perhaps a less objectionable one. ...of course if writing this
register is a problem in secure mode then maybe that patch would be
NAKed anyway.

>> * Is guarded by a runtime check of the revision number so that it
>> doesn't run on unaffected A15 revisions.
>
> So what if the A15 reports an affected revision, but the SoC integrator
> has fixed the problem - why should they have to put up with the work-
> around being applied on their silicon?

That's one I didn't think about, you're right. ...but we're really
getting into hypothetical situations here. Are there any r2p0
products that have such a fix (and thus require a latency of >= 4)?

> We've been through these arguments many times, you're not the first to
> raise it, and we've decided upon the policy. We want as _few_ work-
> arounds in the kernel as possible, because applying the work-arounds
> is very problematical with the mixture of secure and non-secure booting.

OK, fair enough. If we want no workaround here then we can drop this patch.

I'd guess that the way forward is:

* Land kernel workaround locally in Chromium tree

* I'll try to land FW change locally in at least one Chromium branch.
If we were to get a new RO build ever (seems unlikely at this point)
it would be fixed for upstream kernels. If we were to get a new RW
build (seems unlikely, but at least less unlikely) it would be fixed
if someone landed a kernel patch to save/restore this register across
suspend/resume.

* If Joe Upstream wants to run an upstream kernel on some type of
exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
are the ones I know of) then he will deal with the small number of
crashes or figure out a solution.

-Doug

2014-01-08 21:02:38

by Russell King - ARM Linux

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

On Wed, Jan 08, 2014 at 11:43:29AM -0800, Doug Anderson wrote:
> Olof came up with the idea that you could update the RW firmware
> (affects initial boot) and then cache away the value and restore it in
> the kernel after resume. That would still require a kernel patch but
> perhaps a less objectionable one. ...of course if writing this
> register is a problem in secure mode then maybe that patch would be
> NAKed anyway.

It's not a problem in secure mode, since secure mode will have access
to the register. It's the non-secure mode where various registers
either ignore writes, or they trigger an exception that are the problem.

Consider what would happen if an exception were to be triggered when no
exception handlers were installed (eg, because the MMU is not enabled.)

> That's one I didn't think about, you're right. ...but we're really
> getting into hypothetical situations here. Are there any r2p0
> products that have such a fix (and thus require a latency of >= 4)?

Really, we can't know, because this kind of information is dependent on
the SoC, and probably such customisations are only knowable via NDAs.

However, I do know of some A9 CPUs where exactly this kind of thing has
happened - where various selected silicon bugs have been fixed but the
revision still reports the same as one with all those bugs.

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-01-08 21:08:45

by Nicolas Pitre

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

On Wed, 8 Jan 2014, Doug Anderson wrote:

> Hi,
>
> On Wed, Jan 8, 2014 at 11:20 AM, Russell King - ARM Linux
> > No, we're saying to put the work-around in the boot loader, not the kernel.
>
> Unfortunately the resume path of the firmware runs from Read Only
> firmware code (yes, it sucks), so it's not totally trivial to fix. It
> would be possible for someone to unscrew their write protect switch
> and update their RO firmware, but that doesn't help the average user.

[...]

> I'd guess that the way forward is:
>
> * Land kernel workaround locally in Chromium tree
>
> * I'll try to land FW change locally in at least one Chromium branch.
> If we were to get a new RO build ever (seems unlikely at this point)
> it would be fixed for upstream kernels. If we were to get a new RW
> build (seems unlikely, but at least less unlikely) it would be fixed
> if someone landed a kernel patch to save/restore this register across
> suspend/resume.
>
> * If Joe Upstream wants to run an upstream kernel on some type of
> exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
> are the ones I know of) then he will deal with the small number of
> crashes or figure out a solution.

If Joe Upstream wants to run an upstream kernel, doesn't he have to
unscrew his write protect switch first, at which point the RO firmware
can be updated as well?

Nicolas

2014-01-08 21:14:15

by Nicolas Pitre

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

On Wed, 8 Jan 2014, Doug Anderson wrote:

> On Wed, Jan 8, 2014 at 11:20 AM, Russell King - ARM Linux
> <[email protected]> wrote:
> > We've been through these arguments many times, you're not the first to
> > raise it, and we've decided upon the policy. We want as _few_ work-
> > arounds in the kernel as possible, because applying the work-arounds
> > is very problematical with the mixture of secure and non-secure booting.
>
> OK, fair enough. If we want no workaround here then we can drop this patch.
>
>
> I'd guess that the way forward is:
>
> * Land kernel workaround locally in Chromium tree
>
> * I'll try to land FW change locally in at least one Chromium branch.
> If we were to get a new RO build ever (seems unlikely at this point)
> it would be fixed for upstream kernels. If we were to get a new RW
> build (seems unlikely, but at least less unlikely) it would be fixed
> if someone landed a kernel patch to save/restore this register across
> suspend/resume.
>
> * If Joe Upstream wants to run an upstream kernel on some type of
> exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
> are the ones I know of) then he will deal with the small number of
> crashes or figure out a solution.

At some point you have to realize that what you're proposing is not
viable when taking into account the bigger picture. And your FW
architecture is to blame. If you were running Windows instead of Linux
I bet you'd have found a way to fix things differently than asking
Microsoft to add a special quirk in their code. In the PC world you are
required to perform a BIOS update instead.

So you really need to provision the ability to update at least a portion
of the firmware that is not read-only. That may be a third-stage
bootloader or the like which purpose is only to install workarounds such
as this before executing the kernel and which doesn't need to be updated
with the kernel. And in theory you should be able to do that on top of
your current RO firmware.

Errata workarounds are often machine specific and they can be applied
only in special conditions the kernel might not safely know about until
it is too late.

Nicolas

2014-01-08 21:32:21

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Nicolas,

On Wed, Jan 8, 2014 at 12:58 PM, Nicolas Pitre <[email protected]> wrote:
> On Wed, 8 Jan 2014, Doug Anderson wrote:
>
>> On Wed, Jan 8, 2014 at 11:20 AM, Russell King - ARM Linux
>> <[email protected]> wrote:
>> > We've been through these arguments many times, you're not the first to
>> > raise it, and we've decided upon the policy. We want as _few_ work-
>> > arounds in the kernel as possible, because applying the work-arounds
>> > is very problematical with the mixture of secure and non-secure booting.
>>
>> OK, fair enough. If we want no workaround here then we can drop this patch.
>>
>>
>> I'd guess that the way forward is:
>>
>> * Land kernel workaround locally in Chromium tree
>>
>> * I'll try to land FW change locally in at least one Chromium branch.
>> If we were to get a new RO build ever (seems unlikely at this point)
>> it would be fixed for upstream kernels. If we were to get a new RW
>> build (seems unlikely, but at least less unlikely) it would be fixed
>> if someone landed a kernel patch to save/restore this register across
>> suspend/resume.
>>
>> * If Joe Upstream wants to run an upstream kernel on some type of
>> exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
>> are the ones I know of) then he will deal with the small number of
>> crashes or figure out a solution.
>
> At some point you have to realize that what you're proposing is not
> viable when taking into account the bigger picture. And your FW
> architecture is to blame. If you were running Windows instead of Linux
> I bet you'd have found a way to fix things differently than asking
> Microsoft to add a special quirk in their code. In the PC world you are
> required to perform a BIOS update instead.

Yup, I can believe that.

...I'm merely trying to report reality. Firmware updates (even RW
ones) require an extra level of scrutiny / testing and a chunk of
in-house resources. Those resources get allocated if there is no
other choice, but I can't force a lot of people to spend a lot of time
to approve a RW firmware update (actually approving >= 3 RW firmware
updates since there are at least three different 5250 variants) when
there is a simple and low risk kernel fix for it. I could try, but
I'll get slapped down / laughed at.

If this were the Windows/x86 world and we had no choice the decision
would go a different way. ...but we're not in the Windows/x86 world.

> So you really need to provision the ability to update at least a portion
> of the firmware that is not read-only. That may be a third-stage
> bootloader or the like which purpose is only to install workarounds such
> as this before executing the kernel and which doesn't need to be updated
> with the kernel. And in theory you should be able to do that on top of
> your current RO firmware.

The fact that we have too much code in RO is a known issue and people
are definitely working on that. It has already bitten us. It's
unlikely that someone will try retrofit something to work on existing
devices, though.

-Doug

2014-01-08 21:33:41

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Russell,

On Wed, Jan 8, 2014 at 1:02 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Jan 08, 2014 at 11:43:29AM -0800, Doug Anderson wrote:
>> Olof came up with the idea that you could update the RW firmware
>> (affects initial boot) and then cache away the value and restore it in
>> the kernel after resume. That would still require a kernel patch but
>> perhaps a less objectionable one. ...of course if writing this
>> register is a problem in secure mode then maybe that patch would be
>> NAKed anyway.
>
> It's not a problem in secure mode, since secure mode will have access
> to the register. It's the non-secure mode where various registers
> either ignore writes, or they trigger an exception that are the problem.
>
> Consider what would happen if an exception were to be triggered when no
> exception handlers were installed (eg, because the MMU is not enabled.)

OK, good to know. ...so if someone wanted to code up this patch and
send upstream they would be able to. :)

-Doug

2014-01-08 21:35:30

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Nicolas,

On Wed, Jan 8, 2014 at 1:08 PM, Nicolas Pitre <[email protected]> wrote:
> On Wed, 8 Jan 2014, Doug Anderson wrote:
>
>> Hi,
>>
>> On Wed, Jan 8, 2014 at 11:20 AM, Russell King - ARM Linux
>> > No, we're saying to put the work-around in the boot loader, not the kernel.
>>
>> Unfortunately the resume path of the firmware runs from Read Only
>> firmware code (yes, it sucks), so it's not totally trivial to fix. It
>> would be possible for someone to unscrew their write protect switch
>> and update their RO firmware, but that doesn't help the average user.
>
> [...]
>
>> I'd guess that the way forward is:
>>
>> * Land kernel workaround locally in Chromium tree
>>
>> * I'll try to land FW change locally in at least one Chromium branch.
>> If we were to get a new RO build ever (seems unlikely at this point)
>> it would be fixed for upstream kernels. If we were to get a new RW
>> build (seems unlikely, but at least less unlikely) it would be fixed
>> if someone landed a kernel patch to save/restore this register across
>> suspend/resume.
>>
>> * If Joe Upstream wants to run an upstream kernel on some type of
>> exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
>> are the ones I know of) then he will deal with the small number of
>> crashes or figure out a solution.
>
> If Joe Upstream wants to run an upstream kernel, doesn't he have to
> unscrew his write protect switch first, at which point the RO firmware
> can be updated as well?

Actually, no. You can move your device into dev mode and run any
kernel you want. You'll get an annoying "you're in dev mode" screen
at every bootup, but otherwise it works just fine.

Going into dev mode requires some special keystrokes at bootup and a
wipe of your hard disk but no screwdrivers.

-Doug

2014-01-10 17:09:39

by Tomasz Figa

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Hi,

On 08.01.2014 17:21, Doug Anderson wrote:
> WIll,
>
> Thanks for your comments!
>
> On Wed, Jan 8, 2014 at 6:35 AM, Will Deacon <[email protected]> wrote:
>> On Wed, Jan 08, 2014 at 01:33:11PM +0000, Vivek Gautam wrote:
>>> The erratum-773769 occurs on Arm Coretex-A15 (rev r2p0),
>>> when L2 Data Ram latency is set to 4 cycles or more; or
>>> when ACP is in use, or with L2 Data RAM slice configured.
>>> Therefore, the effective latency as calculated in Table 7-2 of
>>> Cotex-A15 (rev r2p0) trm should be 3 cycles or less.
>>>
>>> On Exynos5250 based systems the effective data ram latency
>>> is 4 cycles, since we have DATA_RAM_SETUP bit enabled (L2CTRL[5]=1b'1)
>>> and DATA_RAM_LATENCY bits set to 0x2 (L2CTLR[2:0]=3b'010) therefore,
>>> the effective L2 data RAM latency becomes 4 cycles.
>>> So erratum '773769' occurs causing a corrupted L2 Cache.
>>>
>>> This patch gives a workaround to the mentioned erratum, using below
>>> mentioned algo:
>>> ----------------------------------------------------------------
>>> if data RAM setup = 1
>>> then check if effective latency i.e (latency + setup + 1) > 3
>>> if 'true'
>>> then clear data RAM setup
>>> goto branch 'a'
>>> if data RAM setup = 0
>>> a: then check if data RAM latency > 0x10
>>> if true then force data RAM latency = 0x10
>>> ----------------------------------------------------------------
>>> so that the effective data RAM latency reduces to 3 cycles or less
>>> and hence prevent hitting the erratum.
>>>
>>> NOTE: The Exynos5250 based products have already been shipped, which
>>> makes it impossible to add the change in bootloader, so we are
>>> adding the required change in kernel.
>>
>> NAK. Whilst I appreciate that you may not be able to fix your bootloader,
>> this isn't the right change to make in the kernel. Blindly changing memory
>> latencies is likely to do more harm than good for a multi-platform kernel,
>> even if it works for exynos 5250. The only alternative I can think of (if you
>> have to make a mainline kernel change) is to restrict the clock frequencies at
>> which the CPU is allowed to run, although that obviously requires some
>> investigation in order to determine how viable it is for your SoC.
>
> OK, so humor me a little here...
>
> I'll start off by saying that I'm totally OK if mainline doesn't want
> this fixed. If mainline is not interested in running reliably on
> exynos5250-based products then there's nothing I can do about it. It
> seems unfortunate, but I'm not going to get into a shouting match
> about it.
>
> You're saying that this patch is blindly changing memory latencies. I
> don't think it is (please correct me if I'm wrong!). This patch:
>
> * Is guarded by a CONFIG option so the code isn't even compiled in if
> you don't want EXYNOS5250 support. I know this doesn't help
> multiplatform, but it means that if you really hate the code it's easy
> to disable.
>
> * Is guarded by a runtime check of the revision number so that it
> doesn't run on unaffected A15 revisions.
>
> * Is guarded by a runtime check so it does nothing at all if the total
> latency is <= 3 (AKA if boot code already picked a sane value)
>
> ...with the above guards it's pretty safe...
>
> I will agree that there is a _potential_ that this could make things
> work worse on an already broken product our there, but I would say
> there's a reasonable chance that such a product doesn't exist (but
> please correct me if I'm wrong). Specifically, this patch will cause
> problems in two examples that I can think of:
>
> --
>
> Example A: existing A15 <=r2p0 product with 773769-ignorant boot code
> that could be fixed, but that needs "setup = 1"
>
> In this case we've got boot code that's like the exynos5250 boot code
> that accidentally sets the total latency to >= 4 when it would work
> just fine to use a value of 3. ...except that unlike the exynos5250
> this hypthetical SoC needs to leave setup = 1.
>
> ...if such a machine were found in the wild (seems unlikely?) then
> we'll need to figure out what to do. If its boot code cannot be
> updated and we want to support this product with a similar patch then
> we'll need to be more dynamic.
>
> --
>
> Example B: existing A15 <=r2p0 product that just has to live with crashes
>
> In this case we've got a product where we're going to just accept that
> it crashes sometimes (since this is a fairly hard crash to trigger)
> because it crashes even more when the total latency < 4. In this case
> we don't want to "fix" the errata because that makes things worse.
>
> ...if such a machine were find in the wild (it's possible, I guess?)
> then that's a really good reason not to take this patch or to find
> some way to dynamically enable / disable it.
>
> --
>
> Let me know what you think of the above, or if I'm misunderstanding something...

It's hard to disagree with any of the opinions presented here. We want
to support Exynos5250-based hardware, but at the same time we don't want
to regress any hardware unaffected by mentioned issue...

Let me propose you a solution that comes to my mind (as discussed with
ojn on #armlinux):

To work around the erratum we basically need to ensure that two things
are done:

1) At boot-up the L2 controller configured by R/O firmware need to
be reconfigured to sane values.

2) At resume from sleep mode the same process as in 1) must be done.

As for the first one, it might be handled either by R/W firmware of the
board or some small loader code glued in front of zImage. Second one can
be implemented in SoC-specific resume code
(arch/arm/plat-samsung/s5p-sleep.S), as it's currently done with L2X0 on
Exynos4.

Of course this is scattering the workaround over multiple code bases,
but this way the workaround is executed only on affected platforms
without hurting the unaffected ones. What do you think?

Best regards,
Tomasz

2014-01-29 12:26:57

by Sander

[permalink] [raw]

Subject: Re: [PATCH] arm: Add Arm Erratum 773769 for Large data RAM latency.

Doug Anderson wrote (ao):
> On Wed, Jan 8, 2014 at 1:08 PM, Nicolas Pitre <[email protected]> wrote:
> > On Wed, 8 Jan 2014, Doug Anderson wrote:
> >> * If Joe Upstream wants to run an upstream kernel on some type of
> >> exynos5250 product (Samsung ARM Chromebook, HP Chromebook 11, Nexus 10
> >> are the ones I know of) then he will deal with the small number of
> >> crashes or figure out a solution.
> >
> > If Joe Upstream wants to run an upstream kernel, doesn't he have to
> > unscrew his write protect switch first, at which point the RO firmware
> > can be updated as well?
>
> Actually, no. You can move your device into dev mode and run any
> kernel you want. You'll get an annoying "you're in dev mode" screen
> at every bootup, but otherwise it works just fine.
>
> Going into dev mode requires some special keystrokes at bootup and a
> wipe of your hard disk but no screwdrivers.

And there is http://www.arndaleboard.org/ where you just put an upstream
kernel on sd and boot.

Sander