2007-08-12 08:13:11

by Petr Vandrovec

[permalink] [raw]
Subject: [PATCH] Fix triplefault on x86-64 bootup

Hello,
after I upgraded kernel on my box to current git, only thing it did
was rebooting in a loop. After some digging I found that it is silly
to apply alternative to memcpy by using that every same memcpy...
Sorry if it is known bug, I do not see it reported in my LKML mailbox...
Petr

From: Petr Vandrovec <[email protected]>

Do not replace whole memcpy in apply alternatives

apply_alternatives uses memcpy() to apply alternatives. Which has
unfortunate effect that while applying memcpy alternative memcpy itself
it tries to overwrite itself with nops - which causes #UD fault as it
overwrites half of instruction in copy loop, and from this point on only
possible outcome is triplefault and reboot. So let's overwrite only
first two instructions of memcpy - as long as main memcpy loop is
not in first two bytes it will work fine.

Signed-off-by: Petr Vandrovec <[email protected]>

---
commit 7c6f7d582f301e16899a43fd6c775ed103e8a8d2
tree 112d7320e8a21120e4e43f1ac91918ca0352ff56
parent f81e9b8cc6301f6328eafaa553ef35441874b1a4
author Petr Vandrovec <[email protected]> Sun, 12 Aug 2007 01:05:11 -0700
committer Petr Vandrovec <[email protected]> Sun, 12 Aug 2007 01:05:11 -0700

arch/x86_64/lib/memcpy.S | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/lib/memcpy.S b/arch/x86_64/lib/memcpy.S
index 0ea0ddc..c22981f 100644
--- a/arch/x86_64/lib/memcpy.S
+++ b/arch/x86_64/lib/memcpy.S
@@ -124,6 +124,8 @@ ENDPROC(__memcpy)
.quad memcpy
.quad 1b
.byte X86_FEATURE_REP_GOOD
- .byte .Lfinal - memcpy
+ /* Replace only beginning, memcpy is used to apply alternatives, so it
+ * is silly to overwrite itself with nops - reboot is only outcome... */
+ .byte 2b - 1b
.byte 2b - 1b
.previous


2007-08-12 08:41:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup



On Sun, 12 Aug 2007, Petr Vandrovec wrote:
>
> after I upgraded kernel on my box to current git, only thing it did
> was rebooting in a loop. After some digging I found that it is silly
> to apply alternative to memcpy by using that every same memcpy...
> Sorry if it is known bug, I do not see it reported in my LKML mailbox...

Hmm. Patch looks ok, I just wonder what started triggering this for you?

Linus

2007-08-12 08:59:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup



On Sun, 12 Aug 2007, Linus Torvalds wrote:
>
> Hmm. Patch looks ok, I just wonder what started triggering this for you?

Oh. It's the "Make patching more robust" commit.

"Robust" my ass.

We used to just copy the replacement in one go (works fine, since it just
overwrote the two first bytes), and then "nop_out()" the rest (works fine,
since it didn't matter for memcpy).

That whole commit looks a bit dubious. It also adds a 254-byte stack
usage (anything actually even close to that big?). Gaah.

Linus

2007-08-12 10:01:44

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup

On Sunday 12 August 2007 10:40, Linus Torvalds wrote:
> On Sun, 12 Aug 2007, Petr Vandrovec wrote:
> > after I upgraded kernel on my box to current git, only thing it did
> > was rebooting in a loop. After some digging I found that it is silly
> > to apply alternative to memcpy by using that every same memcpy...
> > Sorry if it is known bug, I do not see it reported in my LKML mailbox...
>
> Hmm. Patch looks ok, I just wonder what started triggering this for you?

Can you please just apply the patch series please?
This one is fixed in "Make patching more robust, fix paravirt issue"


-Andi

2007-08-12 10:02:05

by Andi Kleen

[permalink] [raw]
Subject: Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup

On Sunday 12 August 2007 10:12, Petr Vandrovec wrote:
> Hello,
> after I upgraded kernel on my box to current git, only thing it did
> was rebooting in a loop. After some digging I found that it is silly
> to apply alternative to memcpy by using that every same memcpy...


I already have a patch submitted for this, but so far Linus hasn't applied it.
> Sorry if it is known bug, I do not see it reported in my LKML mailbox...

It was posted in my late merge review series a couple of days ago

-Andi

2007-08-12 10:03:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup

On Sunday 12 August 2007 10:59, Linus Torvalds wrote:
> On Sun, 12 Aug 2007, Linus Torvalds wrote:
> > Hmm. Patch looks ok, I just wonder what started triggering this for you?
>
> Oh. It's the "Make patching more robust" commit.

No it should have been the earlier text_poke change which first
started using mempcy in that path.

> "Robust" my ass.
>
> We used to just copy the replacement in one go (works fine, since it just
> overwrote the two first bytes), and then "nop_out()" the rest (works fine,
> since it didn't matter for memcpy).
>
> That whole commit looks a bit dubious. It also adds a 254-byte stack
> usage (anything actually even close to that big?). Gaah.

The x86-64 copy_user copies the whole function as alternative
and it is over 200 bytes.
The original patch had a smaller buffer and cause x86-64 to BUG
at boot

-Andi


2007-08-12 10:12:19

by Andi Kleen

[permalink] [raw]
Subject: Please remove ab144f5ec64c42218a555ec1dbde6b60cf2982d6 was Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup

On Sunday 12 August 2007 10:12, Petr Vandrovec wrote:
> Hello,
> after I upgraded kernel on my box to current git, only thing it did
> was rebooting in a loop. After some digging I found that it is silly
> to apply alternative to memcpy by using that every same memcpy...
> Sorry if it is known bug, I do not see it reported in my LKML mailbox...

Ok Linus already applied your patch. Even though it's a really
bad fragile hack, not better than the old bug.

Petr are you double sure you really tested with
ab144f5ec64c42218a555ec1dbde6b60cf2982d6
already applied? I bet not -- it is the symptom exactly fixed by this patch
(although

Linus, I would prefer if you reverted
b8d3f2448b8f4ba24f301e23585547ba1acc1f04
again -- it should really not be needed with
ab144f5ec64c42218a555ec1dbde6b60cf2982d6

And I really dislike Petr's patch because while it might work
today (I'm not 100% sure it actually works to only replace
2 bytes) if we change memcpy ever it'll likely cause strange
problems again.

-Andi

2007-08-12 10:46:21

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup

On Sun, Aug 12, 2007 at 10:12:52AM +0200, Petr Vandrovec wrote:
> --- a/arch/x86_64/lib/memcpy.S
> +++ b/arch/x86_64/lib/memcpy.S
> @@ -124,6 +124,8 @@ ENDPROC(__memcpy)
> .quad memcpy
> .quad 1b
> .byte X86_FEATURE_REP_GOOD
> - .byte .Lfinal - memcpy
> + /* Replace only beginning, memcpy is used to apply alternatives, so it
> + * is silly to overwrite itself with nops - reboot is only outcome... */
> + .byte 2b - 1b
> .byte 2b - 1b
> .previous

Thanks, this fixes E6400 reboots on bootup here.

2007-08-12 10:49:59

by Petr Vandrovec

[permalink] [raw]
Subject: Re: Please remove ab144f5ec64c42218a555ec1dbde6b60cf2982d6 was Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup

Andi Kleen wrote:
> On Sunday 12 August 2007 10:12, Petr Vandrovec wrote:
>> Hello,
>> after I upgraded kernel on my box to current git, only thing it did
>> was rebooting in a loop. After some digging I found that it is silly
>> to apply alternative to memcpy by using that every same memcpy...
>> Sorry if it is known bug, I do not see it reported in my LKML mailbox...
>
> Ok Linus already applied your patch. Even though it's a really
> bad fragile hack, not better than the old bug.
>
> Petr are you double sure you really tested with
> ab144f5ec64c42218a555ec1dbde6b60cf2982d6
> already applied? I bet not -- it is the symptom exactly fixed by this patch

I'm quite sure that this patch is in my tree, as I have that "u8 *instr
= a->instr;" in apply_alternatives, and it seems that this one was added
by checkin you mention... My tree was synced up to:

Commit: 3dab307e527f2a9bbb4f9d00240374bb93d1945f
Author: Chuck Ebbert <[email protected]> Fri, 10 Aug 2007 22:31:11 +0200

which as far as I can tell really *is* after your fix. I'm quite sure
that I did not hit any BUG_ON() or anything like that - when patching
got to memcpy alternative, it entered text_poke(), and instead of
returning to caller it returned to BIOS :-(

> (although
>
> Linus, I would prefer if you reverted
> b8d3f2448b8f4ba24f301e23585547ba1acc1f04
> again -- it should really not be needed with
> ab144f5ec64c42218a555ec1dbde6b60cf2982d6
>
> And I really dislike Petr's patch because while it might work
> today (I'm not 100% sure it actually works to only replace
> 2 bytes) if we change memcpy ever it'll likely cause strange
> problems again.

It does not actually change two bytes - it changes two bytes now because
alternative is two bytes long - it makes no sense to replace whole
function with NOPs - it is necessary when you fall through that
function, but for this (and other x86-64 alternatives) it makes no sense
to replace whole function with nops if first instruction in alternative
is jump - then you need to only put that jump in place.
Petr


2007-08-12 12:59:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup

Alexey Dobriyan <[email protected]> writes:

> On Sun, Aug 12, 2007 at 10:12:52AM +0200, Petr Vandrovec wrote:
> > --- a/arch/x86_64/lib/memcpy.S
> > +++ b/arch/x86_64/lib/memcpy.S
> > @@ -124,6 +124,8 @@ ENDPROC(__memcpy)
> > .quad memcpy
> > .quad 1b
> > .byte X86_FEATURE_REP_GOOD
> > - .byte .Lfinal - memcpy
> > + /* Replace only beginning, memcpy is used to apply alternatives, so it
> > + * is silly to overwrite itself with nops - reboot is only outcome... */
> > + .byte 2b - 1b
> > .byte 2b - 1b
> > .previous
>
> Thanks, this fixes E6400 reboots on bootup here.

Please test with only ab144f5ec64c42218a555ec1dbde6b60cf2982d6
applied (or rather a few patches back in the git history)
I bet that fixes it already.

-Andi

2007-08-12 13:01:46

by Andi Kleen

[permalink] [raw]
Subject: Re: Please remove ab144f5ec64c42218a555ec1dbde6b60cf2982d6 was Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup

Petr Vandrovec <[email protected]> writes:

> Andi Kleen wrote:
> > On Sunday 12 August 2007 10:12, Petr Vandrovec wrote:
> >> Hello,
> >> after I upgraded kernel on my box to current git, only thing it did
> >> was rebooting in a loop. After some digging I found that it is silly
> >> to apply alternative to memcpy by using that every same memcpy...
> >> Sorry if it is known bug, I do not see it reported in my LKML mailbox...
> > Ok Linus already applied your patch. Even though it's a really bad
> > fragile hack, not better than the old bug.
> > Petr are you double sure you really tested with
> > ab144f5ec64c42218a555ec1dbde6b60cf2982d6
> > already applied? I bet not -- it is the symptom exactly fixed by this patch
>
> I'm quite sure that this patch is in my tree, as I have that "u8
> *instr = a->instr;" in apply_alternatives, and it seems that this one
> was added by checkin you mention... My tree was synced up to:

Can you double check? I have a hard time believing it.

> It does not actually change two bytes - it changes two bytes now
> because alternative is two bytes long - it makes no sense to replace
> whole function with NOPs - it is necessary when you fall through that

It saves the jump. Admittedly not a big advantage.

-Andi

2007-08-12 13:12:43

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup

On Sun, Aug 12, 2007 at 03:53:57PM +0200, Andi Kleen wrote:
> Alexey Dobriyan <[email protected]> writes:
>
> > On Sun, Aug 12, 2007 at 10:12:52AM +0200, Petr Vandrovec wrote:
> > > --- a/arch/x86_64/lib/memcpy.S
> > > +++ b/arch/x86_64/lib/memcpy.S
> > > @@ -124,6 +124,8 @@ ENDPROC(__memcpy)
> > > .quad memcpy
> > > .quad 1b
> > > .byte X86_FEATURE_REP_GOOD
> > > - .byte .Lfinal - memcpy
> > > + /* Replace only beginning, memcpy is used to apply alternatives, so it
> > > + * is silly to overwrite itself with nops - reboot is only outcome... */
> > > + .byte 2b - 1b
> > > .byte 2b - 1b
> > > .previous
> >
> > Thanks, this fixes E6400 reboots on bootup here.
>
> Please test with only ab144f5ec64c42218a555ec1dbde6b60cf2982d6
> applied (or rather a few patches back in the git history)
> I bet that fixes it already.

Sorry?
/me pulls
/me recompiles
/me sees reboot on boot
/me goes in short unprejudiced bisect session
bisections points to "i386: Make patching more robust, fix paravirt issue"
/me sees Petr's patch
above patch fixes problem

2007-08-12 13:47:52

by Paolo Ornati

[permalink] [raw]
Subject: Re: Please remove ab144f5ec64c42218a555ec1dbde6b60cf2982d6 was Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup

On 12 Aug 2007 15:55:50 +0200
Andi Kleen <[email protected]> wrote:

> > > Ok Linus already applied your patch. Even though it's a really bad
> > > fragile hack, not better than the old bug.
> > > Petr are you double sure you really tested with
> > > ab144f5ec64c42218a555ec1dbde6b60cf2982d6
> > > already applied? I bet not -- it is the symptom exactly fixed by this patch
> >
> > I'm quite sure that this patch is in my tree, as I have that "u8
> > *instr = a->instr;" in apply_alternatives, and it seems that this one
> > was added by checkin you mention... My tree was synced up to:
>
> Can you double check? I have a hard time believing it.

I've seen the reboot too, with "rc2-g3864e8cc" that is newer than
ab144f5ec64c4:

$ git-log ab144f5ec64c4..| grep 3864e8cc
commit 3864e8ccbba1dcdea87398ab80fdc8ae0fab7c45

--
Paolo Ornati
Linux 2.6.23-rc2-gac078602 on x86_64

2007-08-12 17:56:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix triplefault on x86-64 bootup



On Sun, 12 Aug 2007, Andi Kleen wrote:
>
> Can you please just apply the patch series please?
> This one is fixed in "Make patching more robust, fix paravirt issue"

I *had* applied the patch series, Andi. And that one will do

memcpy(addr, opcode, len);

even though "addr" itself may be all of memcpy().

So the code was buggy, and Petr fixed it, and your patch-series didn't fix
*anything*. In fact, from what I can tell, it's the one that introduced
the problem, because before thatone hit, we'd always do the two-byte
sequence on its own (since the "nop_out()" on the rest of the memcpy would
be called later, and separately, once it didn't matter any more).

So out of 12 patches, 2 caused machines to not even boot. Not good.

Linus

2007-08-12 17:57:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: Please remove ab144f5ec64c42218a555ec1dbde6b60cf2982d6 was Re: [discuss] [PATCH] Fix triplefault on x86-64 bootup



On Sun, 12 Aug 2007, Andi Kleen wrote:
>
> Can you double check? I have a hard time believing it.

Andi, read the code. The old code was fine. Your patch was the culprit.

Linus