MIME-Version: 1.0
In-Reply-To: <CACT4Y+buFtqoO6KF0ZW1HQTMC3rVZ=_0FQq826TmWLZyxkPcwA@mail.gmail.com>
References: <001a1145e8548cbd3d055f73374f@google.com> <alpine.DEB.2.20.1712141807400.4998@nanos>
 <CA+55aFzLw629nbk0GL=9=x3sjkxkKjiVW=mL6Pjm7i2vTLwyVw@mail.gmail.com>
 <CALCETrVRq9OKYu+rbvKeuY+pD14X8etW3hzVxJqznzH9T_PvMg@mail.gmail.com>
 <CA+55aFw7JJjdmG0R9Vwz8NLOS9npDfyLd8zJ=TP68JufUoph1A@mail.gmail.com>
 <CALCETrWgfgewYk1tkBE6cLqF-mVhRNOJm9_Z45_shsiiNJc1Fw@mail.gmail.com>
 <CA+55aFxxJ3aGmu+ELGZf_+OpDM1vcu4d9=nEaodrKPGPtNV-5w@mail.gmail.com>
 <CACT4Y+YBMFq0JdwF2K7xfGC7sPxdJOzj9V2TPEKGuJgqaknu9A@mail.gmail.com>
 <CACT4Y+bMCBv_eEm7gPAh4M-oVsgf1c-TxDoH-TbBBty5_xkiWA@mail.gmail.com> <CACT4Y+buFtqoO6KF0ZW1HQTMC3rVZ=_0FQq826TmWLZyxkPcwA@mail.gmail.com>
From: Wanpeng Li <kernellwp@gmail.com>
Date: Fri, 15 Dec 2017 17:40:35 +0800
Message-ID: <CANRm+CwGr2kciYYjJ7wC+iOhBwfDtw6MVO4Wn2=7eKOmcCOVdw@mail.gmail.com>
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        syzbot 
        <bot+1f445b1009b8eeededa30fe62ccf685f2ec9d155@syzkaller.appspotmail.com>,
        Borislav Petkov <bp@suse.de>, Dmitry Safonov <dsafonov@virtuozzo.com>,
        Peter Anvin <hpa@zytor.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Kyle Huey <me@kylehuey.com>, Ingo Molnar <mingo@redhat.com>,
        syzkaller-bugs@googlegroups.com,
        "the arch/x86 maintainers" <x86@kernel.org>,
        Paolo Bonzini <pbonzini@redhat.com>,
        Radim Krčmář <rkrcmar@redhat.com>,
        KVM list <kvm@vger.kernel.org>, "Lan, Tianyu" <tianyu.lan@intel.com>,
        James Mattson <jmattson@google.com>,
        David Hildenbrand <david@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 7846
Lines: 182

2017-12-15 17:38 GMT+08:00 Dmitry Vyukov <dvyukov@google.com>:
> On Fri, Dec 15, 2017 at 10:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Dec 15, 2017 at 10:07 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>> I don't think that's the case. "int3" is entirely synchronous, and
>>>>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>>>>> etc). It's literally just a one-byte shorthand for "int $3".
>>>>>
>>>>> The SDM says precisely the same thing about INT N, so, whichever way
>>>>> you dice it, int3 is a benign exception.
>>>>
>>>> That just means that it doesn't double-fault when it takes the page fault.
>>>>
>>>> Which we already know, because we see a page fault, not a double fault.
>>>>
>>>>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>>>>> microcoded push of SS faulted if the IST contained zeros.
>>>>
>>>> Yes, I suspect it's the stack that is buggered for some reason.
>>>>
>>>>>> Plus, I think the instruction that gets overwritten is just a 5-byte
>>>>>> nop, isn't it? So it really shouldn't take a fault without the "int3"
>>>>>> overwriting.
>>>>>
>>>>> Unless it was being overwritten the other way and the oops hit while
>>>>> tracing was being turned *off*.
>>>>
>>>> Doesn't really matter. The two forms of that instruction are "5-byte
>>>> nop" and "unconditional branch".
>>>>
>>>> Neither of them will write to anything - the only page fault they
>>>> could take is for instruction fetch.
>>>>
>>>> So it really must be the "int3" that fails. Unless we're looking at
>>>> some odd CPU errata, which sounds very, very unlikely.
>>>
>>> FTR the commit is:
>>>
>>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>>> Author: Stephen Rothwell <sfr@canb.auug.org.au>
>>> Date:   Wed Nov 29 14:09:56 2017 +1100
>>>     Add linux-next specific files for 20171129
>>>
>>> You can get it from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
>>> The compiler is: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
>>> The config was attached.
>>>
>>> I've built this exact kernel, and here is the __switch_to disassembly:
>>> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>>>
>>> __switch_to+0x95b seems to point to (?):
>>>
>>> ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>>>
>>> which is a branch-target alignment nop.
>>>
>>> We have a bunch of semi-similar nonsensical crashes on syzbot:
>>>
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ
>>>
>>> Lots of them fault at address 0xfffffffffffffff8.
>>>
>>> I have some suspicion towards KVM. Potentially nested KVM messed up the
>>> host processor state (CRn or page tables), so that we then get these
>>> weird crashes.
>>>
>>> One question: what would a triple fault look like? I am asking because we
>>> have hundreds of cases where the kernel just starts silently rebooting
>>> while running some unprivileged syscalls:
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
>>> Can these be triple faults? The reproducer for that one also seems to be
>>> related to KVM.
>>
>>
>>
>> Well, actually, replaying the log for this crash and for
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>> with:
>>
>> ./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
>> (you can find the exact instructions on how to do this here:
>> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)
>>
>> I got:
>>
>>
>> [  121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
>> [  121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called without request
>> [  121.559744] binder: 3857 RLIMIT_NICE not set
>> [  121.586339] binder: 3857 RLIMIT_NICE not set
>> [  121.591764] binder: 3856:3857 unknown command 1400526783
>> [  121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
>> [  121.598292] binder: 3857 RLIMIT_NICE not set
>> [  121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
>> [  121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
>> [  121.622181] binder: 3856:3857 got reply transaction with no transaction stack
>> [  121.626345] binder: 3856:3857 transaction failed 29201/-71, size 72-56 line 2747
>> [  121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
>> [  121.635620] binder: unexpected work type, 4, not freed
>> [  121.639753] binder: undelivered TRANSACTION_COMPLETE
>> [  121.645213] binder: undelivered TRANSACTION_ERROR: 29201
>> [  121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
>> [  121.667216] *** Guest State ***
>> [  121.667728] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
>> early console in extract_kernel
>> input_data: 0x0000000005f13276
>> input_len: 0x0000000001e7fa4c
>> output: 0x0000000001000000
>> output_len: 0x0000000005c85958
>> kernel_total_size: 0x0000000006db2000
>>
>> Decompressing Linux... Parsing ELF... done.
>> Booting the kernel.
>> [    0.000000] Linux version 4.15.0-rc1-next-20171129 (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620 (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
>> [    0.000000] Command line: kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic panic_on_warn=1 panic=86400
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
>> [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
>> [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
>> [    0.000000] e820: BIOS-provided physical RAM map:
>> ...
>
>
> Well, the crash was minimized down to:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include <sys/syscall.h>
> #include <sys/ioctl.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <linux/kvm.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
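>   // 0x80102 is O_RDWR | O_NOCTTY | O_CLOEXEC on x86-64
>   int fd = open("/dev/kvm", 0x80102ul);
>   // VM without any memory slots
>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>   // vCPU id 4; its registers are never set from userspace
>   int cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>   // enter the guest straight away
>   ioctl(cpu, KVM_RUN, 0);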
>   return 0;
> }
>
> And, yes, this in fact triggers an instant reboot of the kernel (running in qemu).
> Am I missing something here?
>
> +kvm maintainers, you can see the full thread here:
> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw

I will give it a try.

Regards,
Wanpeng Li