DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C45F4218DA
MIME-Version: 1.0
In-Reply-To: <CA+55aFzLw629nbk0GL=9=x3sjkxkKjiVW=mL6Pjm7i2vTLwyVw@mail.gmail.com>
References: <001a1145e8548cbd3d055f73374f@google.com> <alpine.DEB.2.20.1712141807400.4998@nanos>
 <CA+55aFzLw629nbk0GL=9=x3sjkxkKjiVW=mL6Pjm7i2vTLwyVw@mail.gmail.com>
From: Andy Lutomirski <luto@kernel.org>
Date: Thu, 14 Dec 2017 10:54:46 -0800
Message-ID: <CALCETrVRq9OKYu+rbvKeuY+pD14X8etW3hzVxJqznzH9T_PvMg@mail.gmail.com>
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        syzbot 
        <bot+1f445b1009b8eeededa30fe62ccf685f2ec9d155@syzkaller.appspotmail.com>,
        Borislav Petkov <bp@suse.de>, Dmitry Safonov <dsafonov@virtuozzo.com>,
        Peter Anvin <hpa@zytor.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Andrew Lutomirski <luto@kernel.org>, Kyle Huey <me@kylehuey.com>,
        Ingo Molnar <mingo@redhat.com>, syzkaller-bugs@googlegroups.com,
        "the arch/x86 maintainers" <x86@kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3608
Lines: 84

On Thu, Dec 14, 2017 at 10:42 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 9:12 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Sun, 3 Dec 2017, syzbot wrote:
>>> BUG: unable to handle kernel paging request at fffffffffffffff8
>>> Oops: 0002 [#1] SMP KASAN
>
> System write of a non-existent page.
>
>>> RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
>>> RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
>
> This says it's
>
>      old_fpu->last_cpu = cpu;
>
> and the code disassembly ends up looking something like this:
>
>    0: 48 c1 ea 03          shr    $0x3,%rdx
>    4: 0f b6 04 02          movzbl (%rdx,%rax,1),%eax
>    8: 84 c0                test   %al,%al
>    a: 74 08                je     0x14
>    c: 3c 03                cmp    $0x3,%al
>    e: 0f 8e d5 06 00 00    jle    0x6e9
>   14: 8b 85 70 fe ff ff    mov    -0x190(%rbp),%eax
>   1a: 41 89 84 24 c0 15 00 mov    %eax,0x15c0(%r12)
>   21: 00
>   22:* cc                    int3    <-- trapping instruction
>
> where those two preceding "mov" instructions look like they might indeed be that
>
>      old_fpu->last_cpu = cpu;
>
> thing, and the register state doesn't look insane for this.
>
> So I think the RIP->line encoding is slightly off, and that "int3" is
> almost certainly due to the very next thing after the write:
>
>                 trace_x86_fpu_regs_deactivated(old_fpu);
>
> and that actually makes sense if the test robot is doing some tracing,
> particularly if it's just about to _start_ tracing, and it has
> replaced the first byte of the instruction with 'int3' and is in the
> process of doing the rewrite.
>
> The fact that it then takes a system write fault is because some GDT
> or IDT setup is screwed up. Or possibly the stack is screwed up and
> started out as 0, and then the push to the stack would decrement the
> stack pointer and try to push the error state or something.
>
>> That's the second report I'm staring at today which has CR2
>> fffffffffffffffx and points to a faulting instruction which does not make
>> any sense at all.
>
> That actually does make sense - see above.  It just requires that race
> with the instruction rewriting.
>
> *Normally* we never actually take the "int3" exception, because
> normally we'll have completed the rewrite before another CPU actually
> executes the instruction that is being rewritten.
>
> So I'm assuming this is with the page table isolation, and some
> unusual case in exception handling got screwed up.

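For anyone following along, the rewrite sequence described above can be
sketched in userspace roughly as below.  This is an illustration only,
not the kernel's code: poke_with_int3() and sync_all_cpus() are made-up
names, and the sync step is a stub that only counts calls where the real
thing would have to serialize every CPU.

```c
#include <stddef.h>
#include <string.h>

#define INT3_OPCODE 0xcc

/* Hypothetical stub: real code would serialize all CPUs here. */
int sync_count;
static void sync_all_cpus(void)
{
	sync_count++;
}

/*
 * Replace the len-byte instruction at 'site' with 'new_insn' so that a
 * concurrent reader sees the old instruction, an int3, or the new
 * instruction, but never a torn mix of old and new bytes.
 */
void poke_with_int3(unsigned char *site, const unsigned char *new_insn,
		    size_t len)
{
	if (len == 0)
		return;

	/* Step 1: first byte becomes int3; a racing CPU traps here. */
	site[0] = INT3_OPCODE;
	sync_all_cpus();

	/* Step 2: rewrite the tail while the int3 guards the entry. */
	if (len > 1)
		memcpy(site + 1, new_insn + 1, len - 1);
	sync_all_cpus();

	/* Step 3: replace the int3 with the real first byte. */
	site[0] = new_insn[0];
	sync_all_cpus();
}
```

A CPU that fetches the instruction between step 1 and step 3 executes
the int3, which is the window the report seems to have hit.
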
SDM time.  Assuming the CPU actually decoded int3 and tried to execute
it, I can see a couple of possible outcomes:

1. Something's wrong with the IDT and it can't read the vector.  I
think this would end up triple-faulting, though.

2. It actually tries to handle the breakpoint.  A breakpoint is a
benign exception, so any exception encountered while delivering it
would result in serial delivery.  I've never thought that serial
delivery made any sense -- presumably it just cancels the breakpoint
and delivers the other exception.  So this *could* be a page fault hit
during delivery of the int3 exception.  I don't believe it's a GDT
problem, though, because that would also likely lead to a triple
fault.  What I *would* believe is that the IST got messed up and
we're seeing the result of trying to push to the stack with the
initial RSP=0, so the fault hits at address -8, i.e. the
fffffffffffffff8 from the report.
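
To make the arithmetic concrete, here is a small sketch of why RSP=0
lines up with the report.  It assumes 64-bit exception delivery, where
the CPU aligns RSP down to 16 bytes and then pushes 8-byte slots, and
it decodes the page fault error code using the architectural bit
layout.  The helper and macro names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural page fault error code bits (names are illustrative). */
#define PF_PRESENT (1u << 0)	/* 0: page not present, 1: protection */
#define PF_WRITE   (1u << 1)	/* 0: read, 1: write */
#define PF_USER    (1u << 2)	/* 0: supervisor, 1: user mode */

struct pf_info {
	bool present;
	bool write;
	bool user;
};

/* Split an error code such as the 0002 in "Oops: 0002" into its bits. */
struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info = {
		.present = (code & PF_PRESENT) != 0,
		.write   = (code & PF_WRITE) != 0,
		.user    = (code & PF_USER) != 0,
	};
	return info;
}

/*
 * Address of the first 8-byte slot written when an exception frame is
 * pushed starting from 'rsp': align down to 16, then subtract 8.
 * Unsigned arithmetic wraps, so rsp == 0 yields 0xfffffffffffffff8.
 */
uint64_t first_push_address(uint64_t rsp)
{
	return (rsp & ~UINT64_C(0xf)) - 8;
}
```

first_push_address(0) comes out as 0xfffffffffffffff8, and
decode_pf_error(2) says supervisor write to a not-present page, which
is what the oops shows.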

I have no idea how that would happen, though.  Especially since int3
from userspace would have exactly the same problem, and we exercise
that code in the selftests.