DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D68AB219AD
MIME-Version: 1.0
In-Reply-To: <CA+55aFw7JJjdmG0R9Vwz8NLOS9npDfyLd8zJ=TP68JufUoph1A@mail.gmail.com>
References: <001a1145e8548cbd3d055f73374f@google.com> <alpine.DEB.2.20.1712141807400.4998@nanos>
 <CA+55aFzLw629nbk0GL=9=x3sjkxkKjiVW=mL6Pjm7i2vTLwyVw@mail.gmail.com>
 <CALCETrVRq9OKYu+rbvKeuY+pD14X8etW3hzVxJqznzH9T_PvMg@mail.gmail.com> <CA+55aFw7JJjdmG0R9Vwz8NLOS9npDfyLd8zJ=TP68JufUoph1A@mail.gmail.com>
From: Andy Lutomirski <luto@kernel.org>
Date: Thu, 14 Dec 2017 13:27:36 -0800
Message-ID: <CALCETrWgfgewYk1tkBE6cLqF-mVhRNOJm9_Z45_shsiiNJc1Fw@mail.gmail.com>
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        syzbot 
        <bot+1f445b1009b8eeededa30fe62ccf685f2ec9d155@syzkaller.appspotmail.com>,
        Borislav Petkov <bp@suse.de>, Dmitry Safonov <dsafonov@virtuozzo.com>,
        Peter Anvin <hpa@zytor.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Kyle Huey <me@kylehuey.com>, Ingo Molnar <mingo@redhat.com>,
        syzkaller-bugs@googlegroups.com,
        "the arch/x86 maintainers" <x86@kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2586
Lines: 59

On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> 2. It actually tries to handle the breakpoint.  A breakpoint is a
>> benign exception, so any exception encountered while delivering it
>> would result in serial delivery.
>
> I don't think that's the case. "int3" is entirely synchronous, and
> doesn't have the same odd issues as a breakpoint trap (which honors RF
> etc). It's literally just a one-byte shorthand for "int $3".
>

The SDM says precisely the same thing about INT N, so, whichever way
you dice it, int3 is a benign exception.

> There should be no serial delivery, although obviously if it's a trap
> gate (as opposed to an interrupt gate), you can get a normal external
> interrupt on the first instruction of the exception handler.
>
> But that's not what the oops says: it says it happens on the "int3" instruction.
>
> Now, it is possible that the "int3" was written _after_ the CPU took a
> real page fault on the original instruction, and that the original
> instruction actually caused a perfectly normal page fault, and then we
> just report the "int3" because another CPU overwrote the instruction
> after the original instruction had already trapped.
>
> But that makes very little sense either. I really do think it's the
> "int3" itself that causes the page fault due to some IDT/GDT change.
> Because that would actually make sense considering what has changed in
> the tree that Thomas is running.

I still have trouble figuring out what IDT or GDT error would cause a
page fault and not a double-fault or triple-fault.  So I like my
bogus-IST-in-the-TSS theory more, even if I have no idea how it would
happen.  Entry stack underflow?  Overflow of whatever is mapped just
above the TSS in that kernel?  Some kind of fuckup where ioperm()
overwrote the IST?  (I tested that, but who knows?  This is a fuzz
test, after all.)

0xfffffffffffffff8 is *exactly* where the fault would be if the IST
entry contained zeros and the microcoded push of SS faulted.
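
To spell out the arithmetic: on 64-bit exception delivery the CPU
aligns the new stack pointer to 16 bytes and pushes SS first, as an
8-byte value.  A rough sketch of that calculation follows;
first_push_addr() is a made-up helper for illustration, not anything
in the kernel, and it only models the address, not the push itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): given the stack pointer loaded
 * from an IST slot, return the address written by the first push of a
 * 64-bit exception frame.
 */
uint64_t first_push_addr(uint64_t ist_rsp)
{
	/* The CPU aligns the new RSP down to a 16-byte boundary. */
	uint64_t rsp = ist_rsp & ~(uint64_t)0xf;

	/* SS is pushed first, 8 bytes wide; unsigned math wraps mod 2^64. */
	return rsp - 8;
}
```

With an all-zero IST slot, first_push_addr(0) wraps around to
0xfffffffffffffff8, which matches the faulting address in the oops.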

Hmm.  There is another way that could happen.  If the IDT ended up
with the wrong IST entry, we could get the same failure.  But I don't
see how that would happen either.

Maybe it's the bloody debug_idt thing blowing up?

>
> Plus I think the instruction that gets overwritten is just a 5-byte
> nop isn't it? So it really shouldn't take a fault without the "int3"
> overwriting.

Unless it was being overwritten the other way and the oops hit while
tracing was being turned *off*.

--Andy