Date: Tue, 26 Dec 2017 23:41:14 -0500
From: Alexandru Chirvasitu <achirvasub@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        kernel list <linux-kernel@vger.kernel.org>,
        Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>,
        Denys Vlasenko <dvlasenk@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@kernel.org>
Subject: Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot
Message-ID: <20171227044114.GD1410@arch-chirva.localdomain>
References: <20171224014415.GA5663@chirva-void>
 <CA+55aFypaeWzKh=wmwdw0PmMsVyP98hts5doC7CMtAH4095KTA@mail.gmail.com>
 <CALCETrWFK0kY7ety3Hsipi3W3Ni-sm2vQ5kbi3N_OhExNrMmHw@mail.gmail.com>
 <20171225212934.GA1410@arch-chirva.localdomain>
 <CA+55aFxWRTsU-ys1BjabhrxoikFnYspePm37UQcsBwC9vD9roA@mail.gmail.com>
 <20171226231900.GB1410@arch-chirva.localdomain>
 <CA+55aFzU8tRSuCoS2TmXjGZ5WRUzMCrG6Jk4eNCQZjP+5sVE9w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFzU8tRSuCoS2TmXjGZ5WRUzMCrG6Jk4eNCQZjP+5sVE9w@mail.gmail.com>
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3374
Lines: 92

Sounds like it's been pinned down then. Just confirming this:

On Tue, Dec 26, 2017 at 06:16:37PM -0800, Linus Torvalds wrote:
> On Tue, Dec 26, 2017 at 3:19 PM, Alexandru Chirvasitu
> <achirvasub@gmail.com> wrote:
> >
> > I went back to the initial problematic commit e802a51 and modified it as you suggest:
> 
> Thank you.
> 
> > This did not work out for me, but now it fails differently. Both
> > (kexec -l + kexec -e) and (kexec -p + echo c > /proc/sysrq-trigger)
> > end in call traces and freezes.
> >
> > It does seem to be tied to idt_invalidate. One of the last things I
> > see on the screen (which is ends up frozen with the computer inactive)
> > is
> >
> > EIP: idt_invalidate+0x6/0x40 SS:ESP: 0068:f6c47cd0
> 
> Yes, interesting, it's the stack canary load access there:
> 
>         mov    %gs:0x14,%edx
> 
> that traps.
> 
> And that actually makes a lot of sense: the load_segments() call just
> above has rloaded all segments with __KERNEL_DS.
> 
> So while the stack canary access *intends* to load it from the magic
> stack canary segment (offset 0x14), we've just reset all segments to
> the standard zero-based full-sized ones, and obviously that will take
> a page fault at 0x14.
> 
> And the reason you now actually *see* the page fault is that we
> haven't completely buggered the CPU state now, so the trap handler
> actually works. With the GDT reset before, it used to take that same
> trap, but now the trap handler itself would fault, and cause a triple
> fault - which resets the machine.
> 
> So it wasn't actually tracing, it was the stack canary all along. So
> at least it's truly root-caused now.
> 
> But the fix is the same: we just can't afford to do any function calls.
> 
> Alternatively, we should just fix that insane "load_segments()". I'm
> not sure why the code insists on reloading the segments in the first
> place.
> 
> So you could try just to remove the "load_segments()" line entirely.
>

This works. The kernel that was showing me the page fault, with the
load_segments line removed from machine_kexec_32, has no issues
now. This is the diff to that prior commit:

--------------------------------------------------------

    remove load_segments line from kexec

diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 71bd3c0..7d5e655 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -218,17 +218,6 @@ void machine_kexec(struct kimage *image)
                                                << PAGE_SHIFT);
 
        /*
-        * The segment registers are funny things, they have both a
-        * visible and an invisible part.  Whenever the visible part is
-        * set to a specific selector, the invisible part is loaded
-        * with from a table in memory.  At no other time is the
-        * descriptor table in memory accessed.
-        *
-        * I take advantage of this here by force loading the
-        * segments, before I zap the gdt with an invalid value.
-        */
-       load_segments();
-       /*
         * The gdt & idt are now invalid.
         * If you want to load them you must set up your own idt & gdt.
         */

--------------------------------------------------------

> Thanks for spending the time testing things out,
>

Well, it sure as hell is nothing if not instructive. My pleasure.


Alex