From: Dmitry Vyukov
Date: Fri, 15 Dec 2017 10:38:22 +0100
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Linus Torvalds
Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
    Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List, Kyle Huey,
    Ingo Molnar, syzkaller-bugs@googlegroups.com,
    "the arch/x86 maintainers", Paolo Bonzini, Radim Krčmář, KVM list,
    tianyu.lan@intel.com, James Mattson, Wanpeng Li, David Hildenbrand

On Fri, Dec 15, 2017 at 10:13 AM, Dmitry Vyukov wrote:
> On Fri, Dec 15, 2017 at 10:07 AM, Dmitry Vyukov wrote:
>> On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds wrote:
>>> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski wrote:
>>>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds wrote:
>>>>> I don't think that's the case. "int3" is entirely synchronous, and
>>>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>>>> etc). It's literally just a one-byte shorthand for "int $3".
>>>>
>>>> The SDM says precisely the same thing about INT N, so, whichever way
>>>> you dice it, int3 is a benign exception.
>>>
>>> That just means that it doesn't double-fault when it takes the page fault.
>>>
>>> Which we already know, because we see a page fault, not a double fault.
>>>
>>>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>>>> microcoded push of SS faulted if the IST contained zeros.
>>>
>>> Yes, I suspect it's the stack that is buggered for some reason.
>>>
>>>>> Plus I think the instruction that gets overwritten is just a 5-byte
>>>>> nop isn't it? So it really shouldn't take a fault without the "int3"
>>>>> overwriting.
>>>>
>>>> Unless it was being overwritten the other way and the oops hit while
>>>> tracing was being turned *off*.
>>>
>>> Doesn't really matter. The two forms of that instruction are "5-byte
>>> nop" and "unconditional branch".
>>>
>>> Neither of them will write to anything - the only page fault they
>>> could take is for instruction fetch.
>>>
>>> So it really must be the "int3" that fails. Unless we're looking at
>>> some odd CPU errata, which sounds very very unlikely.
>>
>> FTR, the commit is:
>>
>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>> Author: Stephen Rothwell
>> Date:   Wed Nov 29 14:09:56 2017 +1100
>>     Add linux-next specific files for 20171129
>>
>> You can get it from
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
>> The compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
>> The config was attached.
>>
>> I've built this exact kernel, and here is the __switch_to disassembly:
>> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>>
>> __switch_to+0x95b seems to point to (?):
>>
>> ffffffff81252f6b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>
>> which is a branch target alignment nop.
>>
>> We have a bunch of semi-similar nonsense crashes on syzbot:
>>
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ
>>
>> Lots of them are at address 0xfffffffffffffff8.
>>
>> I have some suspicions about KVM. Potentially a nested KVM messed up
>> the host processor state (CRn or page tables), so that we then get these
>> weird crashes.
>>
>> One question: what would a triple fault look like? I am asking because we
>> have hundreds of cases where the kernel just starts silently rebooting
>> while running some unprivileged syscalls:
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
>> Can these be triple faults? The reproducer for that one also seems to be
>> related to KVM.
>
> Well, actually, replaying the log for this crash and for
> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
> with:
>
> ./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
> (you can find exact instructions on how to do this here:
> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)
>
> I've got:
>
> [ 121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
> [ 121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called without request
> [ 121.559744] binder: 3857 RLIMIT_NICE not set
> [ 121.586339] binder: 3857 RLIMIT_NICE not set
> [ 121.591764] binder: 3856:3857 unknown command 1400526783
> [ 121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
> [ 121.598292] binder: 3857 RLIMIT_NICE not set
> [ 121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
> [ 121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
> [ 121.622181] binder: 3856:3857 got reply transaction with no transaction stack
> [ 121.626345] binder: 3856:3857 transaction failed 29201/-71, size 72-56 line 2747
> [ 121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
> [ 121.635620] binder: unexpected work type, 4, not freed
> [ 121.639753] binder: undelivered TRANSACTION_COMPLETE
> [ 121.645213] binder: undelivered TRANSACTION_ERROR: 29201
> [ 121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
> [ 121.667216] *** Guest State ***
> [ 121.667728] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
> early console in extract_kernel
> input_data: 0x0000000005f13276
> input_len: 0x0000000001e7fa4c
> output: 0x0000000001000000
> output_len: 0x0000000005c85958
> kernel_total_size: 0x0000000006db2000
>
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
> [ 0.000000] Linux version 4.15.0-rc1-next-20171129 (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620 (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
> [ 0.000000] Command line: kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic panic_on_warn=1 panic=86400
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
> [ 0.000000] e820: BIOS-provided physical RAM map:
> ...

Well, the crash was minimized down to:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include <fcntl.h>      /* open */
#include <linux/kvm.h>  /* KVM_CREATE_VM, KVM_CREATE_VCPU, KVM_RUN */
#include <sys/ioctl.h>  /* ioctl */

int main()
{
  /* 0x80102 == O_RDWR | O_NOCTTY | O_CLOEXEC on x86-64 Linux */
  int fd = open("/dev/kvm", 0x80102ul);
  int vm = ioctl(fd, KVM_CREATE_VM, 0);
  /* create vCPU id 4 and run it without setting up any registers or memory */
  int cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
  ioctl(cpu, KVM_RUN, 0);
  return 0;
}

And, yes, this in fact triggers an instant reboot of the kernel (which is running in qemu).
Am I missing something here?

+kvm maintainers, you can see the full thread here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw