Please consider the attached patch.
SUMMARY
This patch corrects a hard lockup failure of the system kernel if the
operating system receives a breakpoint exception at a code execution
address which was not registered with the operating system. The patch
allows kernel debuggers, application profiling and performance modules,
and external debugging tools to work better together at sharing the
breakpoint registers on the platform in a way that they do not cause
errors and system faults, and enables the full feature set in the
breakpoint API. If a kernel application triggers a breakpoint
or programs one in error, this patch will catch the condition and report
it to the system log without the operating system experiencing a system
fault. There are several consumers of the Linux Breakpoint API and all
of them can and sometimes do cause the condition this patch corrects.
TESTING AND REVIEW PERFORMED
I have reviewed all the code that touches this patch and have
determined it will function and support all of the software that
depends on this handler properly. I have compiled and tested this
patch with a test harness that tests the robustness of the linux
breakpoint API and handlers in the following ways:
1. Setting multiple conditional breakpoints through
arch_install_hw_breakpoint API across four processors to test the rate
at which the interface can handle breakpoint exceptions
2. Setting unregistered breakpoints to test the handlers robustness
in dealing with error handling conditions and errant or spurious
hardware conditions and to simulate actual "lazy debug register
switching" with null bp handlers to test the
robustness of the handlers.
3. Clearing and setting breakpoints across multiple processors then
triggering concurrent exceptions in both interrupt and process
contexts.
This patch improves robustness in several ways in the linux kernel:
1. Corrects bug in handling unregistered breakpoints.
2. Provides hardware check of dr7 to determine source of breakpoint
if OS cannot ascertain the int1 source from its own state and
variables.
3. Actually allows "lazy debug register switching" to function
correctly.
Signed-off-by: Jeff Merkey <[email protected]>
---
arch/x86/kernel/hw_breakpoint.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 50a3fad..ca13db0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
static int hw_breakpoint_handler(struct die_args *args)
{
int i, cpu, rc = NOTIFY_STOP;
- struct perf_event *bp;
+ struct perf_event *bp = NULL;
unsigned long dr7, dr6;
unsigned long *dr6_p;
@@ -475,6 +475,13 @@ static int hw_breakpoint_handler(struct die_args *args)
for (i = 0; i < HBP_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i))))
continue;
+ /*
+ * check if we got an execute breakpoint
+ * from the dr7 register. if we did, set
+ * the resume flag to avoid int1 recursion.
+ */
+ if ((dr7 & (3 << ((i * 4) + 16))) == 0)
+ args->regs->flags |= X86_EFLAGS_RF;
/*
* The counter may be concurrently released but that can only
@@ -503,7 +510,9 @@ static int hw_breakpoint_handler(struct die_args *args)
/*
* Set up resume flag to avoid breakpoint recursion when
- * returning back to origin.
+ * returning back to origin. Perform the check
+ * twice in case the event handler altered the
+ * system flags.
*/
if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
args->regs->flags |= X86_EFLAGS_RF;
@@ -519,6 +528,18 @@ static int hw_breakpoint_handler(struct die_args *args)
(dr6 & (~DR_TRAP_BITS)))
rc = NOTIFY_DONE;
+ /*
+ * if we are about to signal to
+ * do_debug() to stop further processing
+ * and we have not ascertained the source
+ * of the breakpoint, log it as spurious.
+ */
+ if (rc == NOTIFY_STOP && !bp) {
+ printk_ratelimited(KERN_INFO
+ "INFO: spurious INT1 exception dr6: 0x%lX dr7: 0x%lX\n",
+ dr6, dr7);
+ }
+
set_debugreg(dr7, 7);
put_cpu();
--
1.8.3.1
On 12/14/15, Jeff Merkey <[email protected]> wrote:
> Please consider the attached patch.
>
> SUMMARY
>
> This patch corrects a hard lockup failure of the system kernel if the
> operating system receives a breakpoint exception at a code execution
> address which was not registered with the operating system. The patch
> allows kernel debuggers, application profiling and performance modules,
> and external debugging tools to work better together at sharing the
> breakpoint registers on the platform in a way that they do not cause
> errors and system faults, and enables the full feature set in the
> breakpoint API. If a kernel application triggers a breakpoint
> or programs one in error, this patch will catch the condition and report
> it to the system log without the operating system experiencing a system
> fault. There are several consumers of the Linux Breakpoint API and all
> of them can and sometimes do cause the condition this patch corrects.
>
Peter,
Is that the type of summary that meets the standards. I tried to be concise.
Jeff
On December 14, 2015 4:49:51 PM PST, Jeff Merkey <[email protected]> wrote:
>On 12/14/15, Jeff Merkey <[email protected]> wrote:
>> Please consider the attached patch.
>>
>> SUMMARY
>>
>> This patch corrects a hard lockup failure of the system kernel if the
>> operating system receives a breakpoint exception at a code execution
>> address which was not registered with the operating system. The
>patch
>> allows kernel debuggers, application profiling and performance
>modules,
>> and external debugging tools to work better together at sharing the
>> breakpoint registers on the platform in a way that they do not cause
>> errors and system faults, and enables the full feature set in the
>> breakpoint API. If a kernel application triggers a breakpoint
>> or programs one in error, this patch will catch the condition and
>report
>> it to the system log without the operating system experiencing a
>system
>> fault. There are several consumers of the Linux Breakpoint API and
>all
>> of them can and sometimes do cause the condition this patch corrects.
>>
>
>Peter,
>
>Is that the type of summary that meets the standards. I tried to be
>concise.
>
>Jeff
It would be good if you could be a bit more precise. In particular, explain how and why the condition can appear rather than just explain that it can.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
On 12/14/15, H. Peter Anvin <[email protected]> wrote:
> On December 14, 2015 4:49:51 PM PST, Jeff Merkey <[email protected]>
> wrote:
>>On 12/14/15, Jeff Merkey <[email protected]> wrote:
>>> Please consider the attached patch.
>>>
>>> SUMMARY
>>>
>>> This patch corrects a hard lockup failure of the system kernel if the
>>> operating system receives a breakpoint exception at a code execution
>>> address which was not registered with the operating system. The
>>patch
>>> allows kernel debuggers, application profiling and performance
>>modules,
>>> and external debugging tools to work better together at sharing the
>>> breakpoint registers on the platform in a way that they do not cause
>>> errors and system faults, and enables the full feature set in the
>>> breakpoint API. If a kernel application triggers a breakpoint
>>> or programs one in error, this patch will catch the condition and
>>report
>>> it to the system log without the operating system experiencing a
>>system
>>> fault. There are several consumers of the Linux Breakpoint API and
>>all
>>> of them can and sometimes do cause the condition this patch corrects.
>>>
>>
>>Peter,
>>
>>Is that the type of summary that meets the standards. I tried to be
>>concise.
>>
>>Jeff
>
> It would be good if you could be a bit more precise. In particular, explain
> how and why the condition can appear rather than just explain that it can.
> --
> Sent from my Android device with K-9 Mail. Please excuse brevity and
> formatting.
>
Yes sir. I'll be more concise and re-submit
Jeff