On Sat, Dec 28, 2013 at 1:05 PM, Olof Johansson <[email protected]> wrote:
> Sigh, it's not this after all. I did a clean build with this applied
> and still see failures. Something else is (also?) going on here.
Ok, so after some more digging I actually think that this isn't about
the new code added as much as it is about having more code in low
memory.
Before, there were only two instuctions in __start:
b .__start_initialization_multiplatform
trap
Now, there's a whole bunch:
c000000000000000 <.__start>:
c000000000000000: 08 00 00 48 tdi 0,r0,72
c000000000000004: 48 00 00 24 b c000000000000028 <.__start+0x28>
c000000000000008: 05 00 9f 42 .long 0x5009f42
c00000000000000c: a6 02 48 7d lhzu r16,18557(r2)
c000000000000010: 1c 00 4a 39 mulli r0,r0,19001
c000000000000014: a6 00 60 7d lhzu r16,24701(0)
c000000000000018: 01 00 6b 69 .long 0x1006b69
c00000000000001c: a6 03 5a 7d lhzu r16,23165(r3)
c000000000000020: a6 03 7b 7d lhzu r16,31613(r3)
c000000000000024: 24 00 00 4c dozi r0,r0,76
c000000000000028: 48 00 95 84 b c0000000000095ac
<.__start_initialization_multiplatform>
c00000000000002c: 7f e0 00 08 trap
And indeed, by replacing some of the LE hand-converted code with 0x0,
it seems that what's really making things blow up here is that 0x8-0xc
contain something else than 0x0.
Where/why this comes from I'm less certain of -- and since I seem to
no longer have a usable JTAG setup, I can't break in and see where the
code gets stuck and call paths, etc. So it's pure speculation, but I'm
guessing it's a null pointer dereference somewhere with a chained
pointer as the second member in a struct, i.e. with NULL the stray
null ptr deref does no harm.
Since it doesn't seem to impact pSeries, there's a chance that the bug
is in firmware, not in the kernel, since this seems to happen during
fairly early boot, i.e. possibly while grabbing the DT contents out.
This makes things interesting though. The BE/LE trampoline code
assumes at least 3 consecutive instructions. What was the reasoning
behind entering the kernel LE instead of keeping the old boot protocol
and just switching to LE once kernel is loaded? Is it actually used on
some platforms or is this just a theoretical thing?
-Olof
On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote:
> This makes things interesting though. The BE/LE trampoline code
> assumes at least 3 consecutive instructions. What was the reasoning
> behind entering the kernel LE instead of keeping the old boot protocol
> and just switching to LE once kernel is loaded? Is it actually used on
> some platforms or is this just a theoretical thing?
Actually, adding a little hack that zeroes out the memory once we're done
executing it will work just fine too. I know this is sort of icky, but maybe
it'll be good enough for now?
Of course, main worry is that this is just hiding some latent NULL deref in
the kernel now... :-/
-Olof
--- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---
>From 4d003186cae546900cefc9e51b0ed4e65f775be1 Mon Sep 17 00:00:00 2001
From: Olof Johansson <[email protected]>
Date: Fri, 3 Jan 2014 00:09:28 -0800
Subject: [PATCH] powerpc: set some low memory contents to 0 early
The little-endian code adds some code path to __start, which essentially ends
up adding memory contents in low memory that didn't use to be there.
That seems to have triggered a latent bug, either in firmware or kernel, where
the 64-bit word located at physical address 8 needs to be 0.
The simple hack for this right now is to write it to 0 after we're done
executing it, which is what this patch does. Unfortunately I no longer
seem to have a working JTAG setup nor firmware sources, so debugging
this down to root cause might be more trouble than it's worth given the
relatively simple workaround.
Signed-off-by: Olof Johansson <[email protected]>
---
arch/powerpc/kernel/head_64.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 2ae41ab..437d8bd 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -69,6 +69,13 @@ _GLOBAL(__start)
/* NOP this out unconditionally */
BEGIN_FTR_SECTION
FIXUP_ENDIAN
+ /* Hack for PWRficient platforms: Due to CFE(?) bug, the 64-bit
+ * word at 0x8 needs to be set to 0. Patch it up here once we're
+ * done executing it (we can be lazy and avoid invalidating
+ * icache)
+ */
+ li r0,0
+ std 0,8(0)
b .__start_initialization_multiplatform
END_FTR_SECTION(0, 1)
--
1.7.10.4
On Fri, 2014-01-03 at 00:12 -0800, Olof Johansson wrote:
> On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote:
>
> > This makes things interesting though. The BE/LE trampoline code
> > assumes at least 3 consecutive instructions. What was the reasoning
> > behind entering the kernel LE instead of keeping the old boot protocol
> > and just switching to LE once kernel is loaded? Is it actually used on
> > some platforms or is this just a theoretical thing?
>
> Actually, adding a little hack that zeroes out the memory once we're done
> executing it will work just fine too. I know this is sort of icky, but maybe
> it'll be good enough for now?
>
> Of course, main worry is that this is just hiding some latent NULL deref in
> the kernel now... :-/
Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc
award :)
Have you tried changing the value at 8 to point to a reserved page?
Some other possibilities:
* Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean
you'd only be able to boot pasemi_defconfig.
* Move the hack into FIXUP_ENDIAN
cheers
On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote:
> > Of course, main worry is that this is just hiding some latent NULL
> deref in
> > the kernel now... :-/
>
> Wow, that would have to come close to winning the
> grossest-hack-in-arch-powerpc
> award :)
>
> Have you tried changing the value at 8 to point to a reserved page?
>
> Some other possibilities:
>
> * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would
> mean
> you'd only be able to boot pasemi_defconfig.
> * Move the hack into FIXUP_ENDIAN
We actually found the root cause on irc the other day, I was waiting for
Olof to send a fix :-)
Olof: Can you try this totally untested patch ?
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void)
/* Get the full OF pathname of the stdout device */
memset(path, 0, 256);
call_prom("instance-to-path", 3, 1, prom.stdout, path, 255);
- stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
- val = cpu_to_be32(stdout_node);
prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
&val, sizeof(val));
prom_printf("OF stdout device is: %s\n", of_stdout_device);
@@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void)
path, strlen(path) + 1);
/* If it's a display, note it */
- memset(type, 0, sizeof(type));
- prom_getprop(stdout_node, "device_type", type, sizeof(type));
- if (strcmp(type, "display") == 0)
- prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
+ stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
+ if (stdout_node != PROM_ERROR) {
+ val = cpu_to_be32(stdout_node);
+ memset(type, 0, sizeof(type));
+ prom_getprop(stdout_node, "device_type", type, sizeof(type));
+ if (strcmp(type, "display") == 0)
+ prom_setprop(stdout_node, path, "linux,boot-display", NU
+ }
}
On Wed, Jan 08, 2014 at 03:18:26PM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote:
> > > Of course, main worry is that this is just hiding some latent NULL
> > deref in
> > > the kernel now... :-/
> >
> > Wow, that would have to come close to winning the
> > grossest-hack-in-arch-powerpc
> > award :)
> >
> > Have you tried changing the value at 8 to point to a reserved page?
> >
> > Some other possibilities:
> >
> > * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would
> > mean
> > you'd only be able to boot pasemi_defconfig.
No thanks -- this went uncaught because that used to be all I booted
(and for some random reason it didn't trigger in that case).
> > * Move the hack into FIXUP_ENDIAN
>
> We actually found the root cause on irc the other day, I was waiting for
> Olof to send a fix :-)
Yeah, I'm low on spare time these days, in particular spare time to spend on
ppc stuff. :-(
> Olof: Can you try this totally untested patch ?
With one fixup below:
Tested-by: Olof Johansson <[email protected]>
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void)
> /* Get the full OF pathname of the stdout device */
> memset(path, 0, 256);
> call_prom("instance-to-path", 3, 1, prom.stdout, path, 255);
> - stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> - val = cpu_to_be32(stdout_node);
> prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
> &val, sizeof(val));
> prom_printf("OF stdout device is: %s\n", of_stdout_device);
> @@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void)
> path, strlen(path) + 1);
>
> /* If it's a display, note it */
> - memset(type, 0, sizeof(type));
> - prom_getprop(stdout_node, "device_type", type, sizeof(type));
> - if (strcmp(type, "display") == 0)
> - prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> + stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> + if (stdout_node != PROM_ERROR) {
> + val = cpu_to_be32(stdout_node);
> + memset(type, 0, sizeof(type));
> + prom_getprop(stdout_node, "device_type", type, sizeof(type));
> + if (strcmp(type, "display") == 0)
> + prom_setprop(stdout_node, path, "linux,boot-display", NU
Line is cut off, this needs "NULL, 0);" at the end.
-Olof
On Wed, 2014-01-08 at 09:48 -0800, Olof Johansson wrote:
> > /* If it's a display, note it */
> > - memset(type, 0, sizeof(type));
> > - prom_getprop(stdout_node, "device_type", type, sizeof(type));
> > - if (strcmp(type, "display") == 0)
> > - prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> > + stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> > + if (stdout_node != PROM_ERROR) {
> > + val = cpu_to_be32(stdout_node);
> > + memset(type, 0, sizeof(type));
> > + prom_getprop(stdout_node, "device_type", type, sizeof(type));
> > + if (strcmp(type, "display") == 0)
> > + prom_setprop(stdout_node, path, "linux,boot-display", NU
>
> Line is cut off, this needs "NULL, 0);" at the end.
Right, copy/paste failure :-)
Thanks, I'll try to get that to Linus before he cuts .13, otherwise it
will be -stable.
Cheers,
Ben.