2007-08-12 09:49:21

by Dan Merillat

Subject: reset during bootup - solved

On 8/11/07, Dan Merillat <[email protected]> wrote:
> This one is going to be fun, since it's a hard reset back to bios, no
> OOPS or anything shown. It may be about the time RadeonFB kicks in,
> but it's impossible to tell. I'd guess 15-20 lines into dmesg.
>
> I'm in the process of bisecting, currently 94c18227..d23cf676.
>
> Any guesses of a specific patch to check?

Except it's not d23cf676; that was the version I was running while
building HEAD. Whoops.
I'm used to monotonic version numbers, not SHA hashes. I'll keep
better track next time.

For completeness, the commit that caused boot to fail was:

commit ab144f5ec64c42218a555ec1dbde6b60cf2982d6
Author: Andi Kleen <[email protected]>
Date: Fri Aug 10 22:31:03 2007 +0200

i386: Make patching more robust, fix paravirt issue

But I'm using x86_64, not i386, so I'm not sure why that one revision
kills me. ACPI? It dies right after the CPU is identified:
CPU: AMD Athlon(tm) 64 Processor 3000+ stepping 00
ACPI: Core revision 20070126

I don't think the second line gets printed when it crashes.

However, a git-pull at 2am (b8d3f244) fixed it. I'm not sure which
commit in that range made it work again; I can bisect backwards to
find out if need be.
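Bisecting backwards is the same binary search as an ordinary bisect, just hunting for the first commit that boots instead of the first one that resets. A minimal Python sketch, assuming a linear history in which the behavior flips exactly once; `first_flip`, the c0 to c9 commit names and the `boots` check are hypothetical stand-ins, not git's implementation:

```python
from typing import Callable, Sequence


def first_flip(commits: Sequence[str], is_new: Callable[[str], bool]) -> str:
    """Return the first commit for which is_new(commit) is True.

    Assumes a linear history where commits[0] shows the old behavior,
    commits[-1] shows the new behavior, and the behavior flips only once.
    """
    lo, hi = 0, len(commits) - 1
    if len(commits) < 2 or is_new(commits[lo]) or not is_new(commits[hi]):
        raise ValueError("range does not bracket an old -> new flip")
    # Invariant: commits[lo] shows the old behavior, commits[hi] the new one.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_new(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: c3 breaks booting, c7 fixes it again.
history = [f"c{i}" for i in range(10)]
broken = {"c3", "c4", "c5", "c6"}


def boots(commit: str) -> bool:
    return commit not in broken


# Regression hunt: the range must end at a kernel that resets.
print(first_flip(history[:7], lambda c: not boots(c)))  # c3
# Fix hunt ("bisect backwards"): the range starts at a broken kernel.
print(first_flip(history[3:], boots))  # c7
```

In git-bisect terms, the fix hunt means temporarily marking kernels that boot as "bad" and ones that reset as "good", which is what passing `boots` instead of `not boots` does here.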


2007-08-12 11:09:31

by Paolo Ornati

Subject: Re: reset during bootup - solved

On Sun, 12 Aug 2007 05:44:36 -0400
"Dan Merillat" <[email protected]> wrote:

> However, a git-pull at 2am (b8d3f244) fixed it. Not sure where in
> there it started working again, I can bisect backwards to find out if
> need be.

I think it was fixed by this one:

----

Do not replace whole memcpy in apply alternatives

apply_alternatives uses memcpy() to apply alternatives. Which has the
unfortunate effect that while applying memcpy alternative to memcpy
itself it tries to overwrite itself with nops - which causes #UD fault
as it overwrites half of an instruction in copy loop, and from this
point on only possible outcome is triplefault and reboot.

b8d3f2448b8f4ba24f301e23585547ba1acc1f04
arch/x86_64/lib/memcpy.S | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/lib/memcpy.S b/arch/x86_64/lib/memcpy.S
index 0ea0ddc..c22981f 100644
--- a/arch/x86_64/lib/memcpy.S
+++ b/arch/x86_64/lib/memcpy.S
@@ -124,6 +124,8 @@ ENDPROC(__memcpy)
.quad memcpy
.quad 1b
.byte X86_FEATURE_REP_GOOD
- .byte .Lfinal - memcpy
+ /* Replace only beginning, memcpy is used to apply alternatives, so it
+ * is silly to overwrite itself with nops - reboot is only outcome... */
+ .byte 2b - 1b
.byte 2b - 1b
.previous
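To see why changing one `.byte` matters, here is a toy Python model built on my own reading of the hunk: the five directives are the patch target (`.quad memcpy`), the replacement (`.quad 1b`), the CPU feature bit, the number of original bytes to rewrite, and the replacement length, with the replacement being a 2-byte short jump (my interpretation of `2b - 1b`). Per the commit message, applying an entry copies the replacement over the target and fills the rest of the rewritten length with NOPs. `AltInstr`, `apply_alternative` and the 40-byte layout are hypothetical; this is not the kernel's apply_alternatives().

```python
from dataclasses import dataclass

NOP = 0x90  # one-byte x86 NOP; the kernel uses longer CPU-specific NOPs


@dataclass
class AltInstr:
    """Toy stand-in for one .altinstructions record (feature check omitted)."""
    instr: int          # offset of the code to patch (".quad memcpy")
    replacement: bytes  # bytes at the replacement (".quad 1b")
    instrlen: int       # bytes of original code to rewrite (first length .byte)


def apply_alternative(text: bytearray, alt: AltInstr) -> None:
    """Copy the replacement over the start, then NOP-pad up to instrlen."""
    n = len(alt.replacement)  # plays the role of the replacement length
    if n > alt.instrlen:
        raise ValueError("replacement longer than instrlen")
    start = alt.instr
    text[start:start + n] = alt.replacement
    text[start + n:start + alt.instrlen] = bytes([NOP]) * (alt.instrlen - n)


def changed_offsets(before: bytes, after: bytes) -> list[int]:
    """Offsets whose byte differs between the two images."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Hypothetical layout: a 40-byte memcpy whose copy loop sits at offsets 20-29.
MEMCPY_LEN = 40
LOOP = range(20, 30)
JMP = bytes([0xEB, 0x10])  # 2-byte short jmp; the displacement is made up
original = bytes(range(1, MEMCPY_LEN + 1))  # filler that contains no 0x90

# Before the fix: rewritten length = .Lfinal - memcpy, the whole function.
old = bytearray(original)
apply_alternative(old, AltInstr(0, JMP, MEMCPY_LEN))
print(any(old[i] != original[i] for i in LOOP))  # True: the loop gets NOPed

# After the fix: rewritten length = 2b - 1b, just the jump.
new = bytearray(original)
apply_alternative(new, AltInstr(0, JMP, len(JMP)))
print(changed_offsets(original, new))  # [0, 1]
```

With the rewritten length covering the whole function, the model overwrites the bytes the copy loop is running from, which the commit message says ends in #UD and a triple fault; with it set to the jump length, only the first two bytes change.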

--
Paolo Ornati
Linux 2.6.23-rc2-gac078602 on x86_64

2007-08-12 13:09:23

by Andi Kleen

Subject: Re: reset during bootup - solved

"Dan Merillat" <[email protected]> writes:

> On 8/11/07, Dan Merillat <[email protected]> wrote:
> > This one is going to be fun, since it's a hard reset back to bios, no
> > OOPS or anything shown. It may be about the time RadeonFB kicks in,
> > but it's impossible to tell. I'd guess 15-20 lines into dmesg.
> >
> > I'm in the process of bisecting, currently 94c18227..d23cf676.
> >
> > Any guesses of a specific patch to check?
>
> Except it's not d23cf676, that was the version I was running while
> building HEAD, whoops.
> I'm used to monotonic version numbers, not SHA hashes. I'll keep
> better track next time.
>
> For completeness, the commit that caused boot to fail was:
>
> commit ab144f5ec64c42218a555ec1dbde6b60cf2982d6
> Author: Andi Kleen <[email protected]>

BTW, the patch was actually from Rusty, not me; the From: line
just got lost somewhere. I made a few changes, though, because
the initial version didn't work on x86-64 at all. The submitted
version worked for me.

> Date: Fri Aug 10 22:31:03 2007 +0200
>
> i386: Make patching more robust, fix paravirt issue
>
> Except I'm using x86_64? So not sure why that one rev kills me.

The code is used on x86-64 too, but the script that generates
the "i386:" subject prefixes doesn't know this (perhaps I should
drop them).

> ACPI? It dies right after the CPU is identified:
> CPU: AMD Athlon(tm) 64 Processor 3000+ stepping 00
> ACPI: Core revision 20070126

OK, that settles it.

However, I must say I'm still dubious about Petr's patch. If there
is something wrong with alternative(), then it must be fixed
in alternative(), not at the call sites; otherwise the same problem
could hit other patch sites too. There are other copies that
use a similar patching pattern.

Can people who see a failure please send me their .config and
arch/x86_64/lib/memcpy.o privately?

-Andi