MIME-Version: 1.0
In-Reply-To: <CA+55aFxxJ3aGmu+ELGZf_+OpDM1vcu4d9=nEaodrKPGPtNV-5w@mail.gmail.com>
References: <001a1145e8548cbd3d055f73374f@google.com> <alpine.DEB.2.20.1712141807400.4998@nanos>
 <CA+55aFzLw629nbk0GL=9=x3sjkxkKjiVW=mL6Pjm7i2vTLwyVw@mail.gmail.com>
 <CALCETrVRq9OKYu+rbvKeuY+pD14X8etW3hzVxJqznzH9T_PvMg@mail.gmail.com>
 <CA+55aFw7JJjdmG0R9Vwz8NLOS9npDfyLd8zJ=TP68JufUoph1A@mail.gmail.com>
 <CALCETrWgfgewYk1tkBE6cLqF-mVhRNOJm9_Z45_shsiiNJc1Fw@mail.gmail.com> <CA+55aFxxJ3aGmu+ELGZf_+OpDM1vcu4d9=nEaodrKPGPtNV-5w@mail.gmail.com>
From: Dmitry Vyukov <dvyukov@google.com>
Date: Fri, 15 Dec 2017 10:07:50 +0100
Message-ID: <CACT4Y+YBMFq0JdwF2K7xfGC7sPxdJOzj9V2TPEKGuJgqaknu9A@mail.gmail.com>
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        syzbot 
        <bot+1f445b1009b8eeededa30fe62ccf685f2ec9d155@syzkaller.appspotmail.com>,
        Borislav Petkov <bp@suse.de>, Dmitry Safonov <dsafonov@virtuozzo.com>,
        Peter Anvin <hpa@zytor.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Kyle Huey <me@kylehuey.com>, Ingo Molnar <mingo@redhat.com>,
        syzkaller-bugs@googlegroups.com,
        "the arch/x86 maintainers" <x86@kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3552
Lines: 81

On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> I don't think that's the case. "int3" is entirely synchronous, and
>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>> etc). It's literally just a one-byte shorthand for "int $3".
>>
>> The SDM says precisely the same thing about INT N, so, whichever way
>> you dice it, int3 is a benign exception.
>
> That just means that it doesn't double-fault when it takes the page fault.
>
> Which we already know, because we see a page fault, not a double fault.
>
>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>> microcoded push of SS faulted if the IST contained zeros.
>
> Yes, I suspect it's the stack that is buggered for some reason.
>
>>> Plus I think the instruction that gets overwritten is just a 5-byte
>>> nop isn't it? So it really shouldn't take a fault without the "int3"
>>> overwriting.
>>
>> Unless it was being overwritten the other way and the oops hit while
>> tracing was being turned *off*.
>
> Doesn't really matter. The two forms of that instruction are "5-byte
> nop" and "unconditional branch".
>
> Neither of them will write to anything - the only page fault they
> could take is for instruction fetch.
>
> So it really must be the "int3" that fails. Unless we're looking at
> some odd CPU errata, which sounds very very unlikely.

FTR the commit is:

commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
Author: Stephen Rothwell <sfr@canb.auug.org.au>
Date:   Wed Nov 29 14:09:56 2017 +1100
    Add linux-next specific files for 20171129

You can get it from
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
Config was attached.

I've built this exact kernel and here is __switch_to disasm:
https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt

__switch_to+0x95b seems to point to (?):

ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)

which is a branch target alignment nop.
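
As a sanity check of that lookup, here is a small sketch in Python. The
helper names are made up for illustration; only the address, the offset and
the instruction bytes are taken from the disasm line above. It assumes that
+0x95b really does correspond to ffffffff81252f6b, which is exactly the part
I am not sure about. It derives the implied start of __switch_to and checks
that the bytes are the standard 5-byte nopl encoding:

```python
# Values copied from the disassembly line quoted in this mail.
INSN_ADDR = 0xffffffff81252f6b            # address of the instruction
OFFSET = 0x95b                            # from "__switch_to+0x95b"
INSN_BYTES = bytes.fromhex("0f1f440000")  # bytes shown in the disasm

# Standard 5-byte NOP encoding: nopl 0x0(%rax,%rax,1)
NOP5 = bytes([0x0f, 0x1f, 0x44, 0x00, 0x00])

MASK64 = (1 << 64) - 1


def symbol_base(addr: int, offset: int) -> int:
    """Start address of the symbol implied by 'symbol+offset' (hypothetical helper)."""
    return (addr - offset) & MASK64


def is_nop5(insn: bytes) -> bool:
    """True if the bytes are exactly the 5-byte nopl form (hypothetical helper)."""
    return insn == NOP5


if __name__ == "__main__":
    print(hex(symbol_base(INSN_ADDR, OFFSET)))
    print(is_nop5(INSN_BYTES))
```

Note that the byte comparison alone cannot tell alignment padding apart from
a patchable 5-byte nop site, since both would show the same encoding in a
static disassembly.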

We have a bunch of semi-similar nonsensical crashes on syzbot:

https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ

Lots of them fault on the address 0xfffffffffffffff8.
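
For reference, that address is what Andy's explanation predicts: an 8-byte
push with a zero stack pointer wraps around to the top of the address space.
A minimal sketch of the arithmetic in Python (push_addr is a hypothetical
name used only for illustration; this models the address calculation, not
the actual exception delivery microcode):

```python
MASK64 = (1 << 64) - 1


def push_addr(rsp: int, size: int = 8) -> int:
    """Address written by a push: RSP is decremented first, modulo 2**64."""
    return (rsp - size) & MASK64


if __name__ == "__main__":
    # A zeroed IST entry would give RSP == 0 at exception entry, so the
    # first push (SS) would land here.
    print(hex(push_addr(0)))
```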

I have some suspicion towards KVM. Potentially a nested KVM messed up
host processor state (CRn or page tables), so that we then get these
weird crashes.

One question: what would a triple fault look like? I am asking because we
have hundreds of cases where the kernel just starts silently rebooting
while running some unprivileged syscalls:
https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
Can these be triple faults? Reproducer for that one also seems to be
related to KVM.