Date: Mon, 14 Dec 2015 10:52:30 -0700
From: Jeff Merkey
To: Ingo Molnar
Cc: Andy Lutomirski, Thomas Gleixner, LKML, Ingo Molnar, "H. Peter Anvin", X86 ML, Peter Zijlstra, Andy Lutomirski, Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa
Subject: Re: [PATCH 1/1] Fix int1 recursion when no perf_bp_event is registered
In-Reply-To: <20151214092837.GA30347@gmail.com>
References: <20151214080914.GA20556@gmail.com> <20151214092837.GA30347@gmail.com>

On 12/14/15, Ingo Molnar wrote:
>
> * Jeff Merkey wrote:
>
>> On 12/14/15, Ingo Molnar wrote:
>> >
>> > A: Because it messes up the order in which people normally read text.
>> > Q: Why is top-posting such a bad thing?
>> > A: Top-posting.
>> > Q: What is the most annoying thing in e-mail?
>> >
>> > * Jeff Merkey wrote:
>> >
>> >> I trigger it by writing to the dr7 and dr1, 2, 3 or four register and
>> >> set an execute breakpoint without going through arch_install_hw_breakpoint.
>> >> When the breakpoint fires, the system crashes and hangs on the processor
>> >> stuck in an endless loop inside the int1 handler in hw_breakpoint.c --
>> >
>> > What is still not clear to me, can you trigger the hang not via some
>> > special kernel driver that goes outside regular APIs and messes with the
>> > state of the debug registers, but via the proper access methods, i.e.
>> > various user-space ABIs?
>>
>> Any process that can get access to the debug registers can trigger this
>> condition. [...]
>
> A process on an unmodified Linux kernel can only modify debug registers via
> the proper APIs:
>
>> [...] As it stands, if restricted to the established API in
>> hw_breakpoint.c this bug should not occur unless someone triggers an
>> errant breakpoint. [...]
>
> So am I interpreting your report correctly:
>
>   "If the Linux kernel is modified to change debug registers without using
>    the proper APIs (such as loading a module that changes hardware registers
>    in a raw fashion), things may break and a difficult to debug hang may
>    occur."
>
> right?
>
> This key piece of information should have been part of the original report.
>
> So I'm wondering, why does your module modify debug registers in a raw
> fashion? Why doesn't it use the proper APIs?
>
> Thanks,
>
> Ingo
>

Hi Ingo,

This will be a lengthy reply, because it takes some space to explain this
properly. First, some fundamental assumptions to clear up.

1. The MDB Debugger Module does not cause this problem. This is an existing
   bug in the kernel in an exception code path.

2. This bug was discovered and triggered while running a TEST HARNESS I use
   to test the debugger. Among other things, I check for unregistered
   breakpoints while testing the debugging of blacked-out sections of the OS
   in a special mode the debugger can employ, called DIRECT MODE. In normal
   mode the MDB Debugger uses the established breakpoint API.

3. The breakpoint API in Linux was not designed for debuggers. It was
   designed for probes and application profiling. It has no concept of
   global SMP breakpoints, no facilities to manage them, and no on/off
   settings for debugger entry conditions, so kernel debuggers have to do all
   of this themselves.

4. I handle all of these cases correctly, as do kgdb and kdb, and I have to
   use this severely deficient interface when not in direct mode.

5. Direct mode lets the debugger act as a sliding window over sections of
   code that are normally blacked out. For example, in direct mode MDB can
   debug the Linux debugger API itself, because it does not call that API:
   it programs the debug registers directly, with no software layer that the
   debugger depends on. Using this mode, it's possible to debug across
   interrupts, syscalls, and other areas of the OS that are normally
   "blacked out" to the debugger.

6. This interface has a bug in its main execution path for handling int1
   exceptions. To decide whether the breakpoint that fired is an execute
   breakpoint, it asks the OS (the registered perf breakpoint) instead of
   querying the hardware DR7 register that just delivered the interrupt.
   When no breakpoint is registered, the resume flag is never set, and the
   system hangs at the same address, taking the interrupt over and over.

7. The way an int1 exception works, Ingo, is this: when the address of an
   execute breakpoint is hit, the processor raises the exception through the
   int1 handler before the instruction executes. After the handler returns,
   the instruction is retried and the same exception is raised again, over
   and over, unless the resume flag (RF, bit 16 of EFLAGS) is set in the
   flags that are reloaded into the processor on return. This is how Intel
   processors are designed to work.

8. If any process or module sets a breakpoint outside the Linux breakpoint
   API, this code path will crash the system. It's A BUG, and it has been in
   Linux for 13 years. I am certain people have seen it while running perf,
   but since it provides no diagnostic information, they would just reset
   the system.

9. This breakpoint API needs to be rewritten to be aware of global
   breakpoints and to have an on/off switch, so that when a debugger enters
   an int1 exception, breakpoints are globally disabled (a requirement),
   among other things.

The patch simply fixes the bug in the int1 handler that causes the lockup.
The perf event system, kgdb, kdb, and any number of other programs can
trigger this bug, and probably have. People would blame the debugger when
it's a bug in the int handler.
Jeff