Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755424AbdLOJkl (ORCPT ); Fri, 15 Dec 2017 04:40:41 -0500
Received: from mail-oi0-f68.google.com ([209.85.218.68]:40696 "EHLO
        mail-oi0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754635AbdLOJkh (ORCPT );
        Fri, 15 Dec 2017 04:40:37 -0500
X-Google-Smtp-Source: ACJfBov2OYBrkYLczuHStKr5BFliFws9UEFESCS0KOy190FeTxCrlHbJqlAAQ9d1g9I/LdhV2mq23t5/wh6dEuRSZpA=
MIME-Version: 1.0
In-Reply-To:
References: <001a1145e8548cbd3d055f73374f@google.com>
From: Wanpeng Li
Date: Fri, 15 Dec 2017 17:40:35 +0800
Message-ID:
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Dmitry Vyukov
Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot,
        Borislav Petkov, Dmitry Safonov, Peter Anvin,
        Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
        syzkaller-bugs@googlegroups.com, "the arch/x86 maintainers",
        Paolo Bonzini, Radim Krčmář, KVM list, "Lan, Tianyu",
        James Mattson, David Hildenbrand
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 7846
Lines: 182

2017-12-15 17:38 GMT+08:00 Dmitry Vyukov:
> On Fri, Dec 15, 2017 at 10:13 AM, Dmitry Vyukov wrote:
>> On Fri, Dec 15, 2017 at 10:07 AM, Dmitry Vyukov wrote:
>>> On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds wrote:
>>>> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski wrote:
>>>>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds wrote:
>>>>>> I don't think that's the case. "int3" is entirely synchronous, and
>>>>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>>>>> etc). It's literally just a one-byte shorthand for "int $3".
>>>>>
>>>>> The SDM says precisely the same thing about INT N, so, whichever way
>>>>> you dice it, int3 is a benign exception.
>>>>
>>>> That just means that it doesn't double-fault when it takes the page fault.
>>>>
>>>> Which we already know, because we see a page fault, not a double fault.
>>>>
>>>>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>>>>> microcoded push of SS faulted if the IST contained zeros.
>>>>
>>>> Yes, I suspect it's the stack that is buggered for some reason.
>>>>
>>>>>> Plus I think the instruction that gets overwritten is just a 5-byte
>>>>>> nop isn't it? So it really shouldn't take a fault without the "int3"
>>>>>> overwriting.
>>>>>
>>>>> Unless it was being overwritten the other way and the oops hit while
>>>>> tracing was being turned *off*.
>>>>
>>>> Doesn't really matter. The two forms of that instruction are "5-byte
>>>> nop" and "unconditional branch".
>>>>
>>>> Neither of them will write to anything - the only page fault they
>>>> could take is for instruction fetch.
>>>>
>>>> So it really must be the "int3" that fails. Unless we're looking at
>>>> some odd CPU errata, which sounds very very unlikely.
>>>
>>> FTR the commit is:
>>>
>>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>>> Author: Stephen Rothwell
>>> Date: Wed Nov 29 14:09:56 2017 +1100
>>> Add linux-next specific files for 20171129
>>>
>>> You can get it from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
>>> Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
>>> Config was attached.
>>>
>>> I've built this exact kernel and here is __switch_to disasm:
>>> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>>>
>>> __switch_to+0x95b seems to point to (?):
>>>
>>> ffffffff81252f6b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>>
>>> which is branch target alignment nop.
>>>
>>> We have a bunch of semi-similar nonsense crashes on syzbot:
>>>
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ
>>>
>>> Lots of them are at address 0xfffffffffffffff8.
>>>
>>> I have some suspicion towards KVM. Potentially a nested KVM messed up
>>> the host processor state (CRn or page tables) so that we then get these
>>> weird crashes.
>>>
>>> One question: what would a triple fault look like? I am asking because
>>> we have hundreds of cases where the kernel just starts silently
>>> rebooting while running some unprivileged syscalls:
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
>>> Can these be triple faults? Reproducer for that one also seems to be
>>> related to KVM.
>>
>>
>>
>> Well, actually replaying the log for this crash and for
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>> with:
>>
>> ./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
>> (you can find exact instructions on how to do this here
>> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)
>>
>> I've got:
>>
>>
>> [ 121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
>> [ 121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called without request
>> [ 121.559744] binder: 3857 RLIMIT_NICE not set
>> [ 121.586339] binder: 3857 RLIMIT_NICE not set
>> [ 121.591764] binder: 3856:3857 unknown command 1400526783
>> [ 121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
>> [ 121.598292] binder: 3857 RLIMIT_NICE not set
>> [ 121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
>> [ 121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
>> [ 121.622181] binder: 3856:3857 got reply transaction with no transaction stack
>> [ 121.626345] binder: 3856:3857 transaction failed 29201/-71, size 72-56 line 2747
>> [ 121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
>> [ 121.635620] binder: unexpected work type, 4, not freed
>> [ 121.639753] binder: undelivered TRANSACTION_COMPLETE
>> [ 121.645213] binder: undelivered TRANSACTION_ERROR: 29201
>> [ 121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
>> [ 121.667216] *** Guest State ***
>> [ 121.667728] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
>> early console in extract_kernel
>> input_data: 0x0000000005f13276
>> input_len: 0x0000000001e7fa4c
>> output: 0x0000000001000000
>> output_len: 0x0000000005c85958
>> kernel_total_size: 0x0000000006db2000
>>
>> Decompressing Linux... Parsing ELF... done.
>> Booting the kernel.
>> [ 0.000000] Linux version 4.15.0-rc1-next-20171129 (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620 (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
>> [ 0.000000] Command line: kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic panic_on_warn=1 panic=86400
>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
>> [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
>> [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
>> [ 0.000000] e820: BIOS-provided physical RAM map:
>> ...
>
>
> Well, the crash was minimized down to:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> int main()
> {
>   int fd = open("/dev/kvm", 0x80102ul);
>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>   int cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>   ioctl(cpu, KVM_RUN, 0);
>   return 0;
> }
>
> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
> Am I missing something here?
>
> +kvm maintainers, you can see full thread here:
> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw

I will have a try.

Regards,
Wanpeng Li
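
For anyone retrying the minimized program quoted above: the header names on
its #include lines did not survive in this copy of the mail. What follows is
a minimal sketch of the same three ioctls with error checks added, not the
original syzkaller output. The header set and the helper name run_kvm_repro
are assumptions made for illustration. Per the report above, KVM_RUN on the
affected next-20171129 kernel may reboot the machine, so this is probably
best tried inside a throwaway VM.

```c
#include <fcntl.h>      /* open */
#include <stdio.h>      /* perror */
#include <sys/ioctl.h>  /* ioctl */
#include <unistd.h>     /* close */
#include <linux/kvm.h>  /* KVM_CREATE_VM, KVM_CREATE_VCPU, KVM_RUN */

/*
 * Hypothetical helper (not part of the original mail): replays the three
 * ioctls from the quoted reproducer against the device node at `path`.
 * Returns 0 if KVM_RUN returned without error, -1 on any failure.
 */
int run_kvm_repro(const char *path)
{
	int ret = -1;
	int vm = -1;
	int cpu = -1;

	/* 0x80102 is the open() flag value used in the quoted program. */
	int fd = open(path, 0x80102);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	vm = ioctl(fd, KVM_CREATE_VM, 0UL);
	if (vm < 0) {
		perror("KVM_CREATE_VM");
		goto out;
	}

	/* vcpu id 4, as in the quoted program. */
	cpu = ioctl(vm, KVM_CREATE_VCPU, 4UL);
	if (cpu < 0) {
		perror("KVM_CREATE_VCPU");
		goto out;
	}

	/* The call after which the reported reboot was observed. */
	if (ioctl(cpu, KVM_RUN, 0UL) < 0) {
		perror("KVM_RUN");
		goto out;
	}
	ret = 0;
out:
	if (cpu >= 0)
		close(cpu);
	if (vm >= 0)
		close(vm);
	close(fd);
	return ret;
}
```

Calling run_kvm_repro("/dev/kvm") from main() corresponds to the quoted
program. Any failing step is reported on stderr and yields -1, which helps
separate "KVM is not usable here" from the reboot under discussion.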