From: Linus Torvalds
Date: Thu, 14 Dec 2017 11:28:18 -0800
Subject: Re: BUG: unable to handle kernel paging request in __switch_to
To: Andy Lutomirski
Cc: Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin,
    Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
    syzkaller-bugs@googlegroups.com, "the arch/x86 maintainers"
References: <001a1145e8548cbd3d055f73374f@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski wrote:
>
> 2. It actually tries to handle the breakpoint. A breakpoint is a
> benign exception, so any exception encountered while delivering it
> would result in serial delivery.

I don't think that's the case.

"int3" is entirely synchronous, and doesn't have the same odd issues
as a breakpoint trap (which honors RF etc). It's literally just a
one-byte shorthand for "int $3".

There should be no serial delivery, although obviously if it's a trap
gate (as opposed to an interrupt gate), you can get a normal external
interrupt on the first instruction of the exception handler. But
that's not what the oops says: it says it happens on the "int3"
instruction.

Now, it is possible that the "int3" was written _after_ the CPU took a
real page fault on the original instruction, and that the original
instruction actually caused a perfectly normal page fault, and then we
just report the "int3" because another CPU overwrote the instruction
after the original instruction had already trapped.

But that makes very little sense either. I really do think it's the
"int3" itself that causes the page fault due to some IDT/GDT change.
Because that would actually make sense considering what has changed in
the tree that Thomas is running.

Plus I think the instruction that gets overwritten is just a 5-byte
nop, isn't it? So it really shouldn't take a fault without the "int3"
overwriting.

[ Goes back to the original report ]

Yeah, so looking back at the "Code:" line, the faulting instruction
looked like this:

    <cc> 1f 44 00 00

and a P6_NOP5 is

    #define P6_NOP5 0x0f,0x1f,0x44,0x00,0

so it's definitely "first byte of a 5-byte nop has been overwritten
with an 'int3' instruction".

The nop does not fault on its own.

                 Linus
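
The byte comparison at the end of the message can be sketched in C. This is an illustration only, not kernel code: the helper name is_nop5_patched_with_int3() and the INT3_OPCODE and NOP5_LEN macros are hypothetical, and the only values taken from the message are the P6_NOP5 definition and the trailing bytes 1f 44 00 00. The sketch assumes the usual one-byte encoding of "int3", 0xcc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte sequence of the 5-byte NOP, as quoted in the message. */
#define P6_NOP5 0x0f,0x1f,0x44,0x00,0

/* One-byte encoding of "int3" (hypothetical macro name). */
#define INT3_OPCODE 0xcc

/* Length of the NOP above (hypothetical macro name). */
#define NOP5_LEN 5

/*
 * Hypothetical helper: returns true if the first NOP5_LEN bytes at
 * `code` look like a P6_NOP5 whose first byte was replaced by int3,
 * i.e. byte 0 is 0xcc and bytes 1..4 are still the tail of the NOP.
 */
static bool is_nop5_patched_with_int3(const unsigned char *code, size_t len)
{
	static const unsigned char nop5[NOP5_LEN] = { P6_NOP5 };

	assert(code != NULL);

	if (len < NOP5_LEN)
		return false;
	if (code[0] != INT3_OPCODE)
		return false;

	/* Compare the four bytes that follow the overwritten one. */
	return memcmp(code + 1, nop5 + 1, NOP5_LEN - 1) == 0;
}
```

Called on the five bytes cc 1f 44 00 00 the helper returns true; called on an untouched 0f 1f 44 00 00 it returns false, since the first byte is still the original NOP prefix.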