2010-01-27 22:27:17

by Jason Wessel

[permalink] [raw]
Subject: [PATCH 0/3] V2 kgdb regression fixes for 2.6.33

I would like to get acks from the respective parties to fix these
reported regressions against kgdb in 2.6.33.

The constraint handling for the hw breakpoint API was definitely a
dicey issue, but it is hopefully resolved in this series. Even without
the constraints patch it is possible to use hw breakpoints in the
kernel debugger in the same manner that has existed since 2.6.26 (only
kgdb gets to use hw breakpoints).

The regressions are:
* hw breakpoints no longer work on x86 after the perf API merge
* softlockup watchdog can reboot the system while using the kernel debugger
*** This has been in linux-next for several months waiting for acks

Dropped from the series was the clocksource patch; it was resolved
separately.

I collected all the patches at the following location; they could go
into the tip branch or the kgdb for_linus branch, depending on the
status of the discussion that ensues.

git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git for_ingo

Thanks,
Jason.

---

The following changes since commit be8cde8b24c9dca1e54598690115eee5b1476519:
Linus Torvalds (1):
Merge git://git.kernel.org/.../jejb/scsi-rc-fixes-2.6

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git for_ingo

Jason Wessel (3):
softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume
x86,hw_breakpoint,kgdb: kgdb to use hw_breakpoint API
perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

arch/x86/kernel/hw_breakpoint.c | 5 +-
arch/x86/kernel/kgdb.c | 251 ++++++++++++++++++++++++++++++---------
include/linux/hw_breakpoint.h | 2 +
include/linux/sched.h | 4 +
kernel/hw_breakpoint.c | 52 +++++++--
kernel/kgdb.c | 9 +-
kernel/softlockup.c | 15 +++
7 files changed, 264 insertions(+), 74 deletions(-)


2010-01-27 22:27:30

by Jason Wessel

[permalink] [raw]
Subject: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets the
time from hardware such as the TSC on x86. In this configuration kgdb
will report a softlockup warning message on resuming or detaching from a
debug session.

Sequence of events in the problem case:

1) "cpu sched clock" and "hardware time" are at 100 sec prior
to a call to kgdb_handle_exception()

2) Debugger waits in kgdb_handle_exception() for 80 sec and on exit
the following is called ... touch_softlockup_watchdog() -->
__raw_get_cpu_var(touch_timestamp) = 0;

3) "cpu sched clock" = 100s (it was not updated, because the interrupt
was disabled in kgdb) but the "hardware time" = 180 sec

4) The first timer interrupt after resuming from kgdb_handle_exception
updates the watchdog from the "cpu sched clock"

update_process_times() {
    ...
    run_local_timers() --> softlockup_tick()
        --> check (touch_timestamp == 0)
            (it is "YES" here, we have set "touch_timestamp = 0" at kgdb)
        --> __touch_softlockup_watchdog()
    ***(A)--> reset "touch_timestamp" to "get_timestamp()"
              (Here, the "touch_timestamp" will still be set to 100s.)
    ...
    scheduler_tick()
    ***(B)--> sched_clock_tick()
              (update "cpu sched clock" to "hardware time" = 180s)
    ...
}

5) The second timer interrupt handler appears to have a large jump and
trips the softlockup warning.

update_process_times() {
    ...
    run_local_timers() --> softlockup_tick()
        --> "cpu sched clock" - "touch_timestamp" = 180s - 100s > 60s
        --> printk "soft lockup error messages"
    ...
}

note: ***(A) reset "touch_timestamp" to "get_timestamp(this_cpu)"

Why is "touch_timestamp" 100 sec, instead of 180 sec?

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of
get_timestamp() is:

get_timestamp(this_cpu)
  --> cpu_clock(this_cpu)
    --> sched_clock_cpu(this_cpu)
      --> __update_sched_clock(sched_clock_data, now)

The __update_sched_clock() function uses the GTOD tick value to create
a window to normalize the "now" values. So if the "now" value is too big
for sched_clock_data, it will be ignored.
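
For reference, here is a simplified sketch of that windowing logic,
loosely modeled on kernel/sched_clock.c from this era (helper names and
details are approximations, not the exact upstream code):

static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
{
	s64 delta = now - scd->tick_raw;
	u64 clock, min_clock, max_clock;

	if (unlikely(delta < 0))
		delta = 0;

	/*
	 * Clamp the new value into a window of at most one tick past
	 * the last GTOD tick.  A "now" that jumped far ahead (e.g.
	 * after sitting in kgdb) is effectively ignored until
	 * sched_clock_tick() re-syncs tick_gtod.  min()/max() stand in
	 * for the kernel's wrap-safe helpers.
	 */
	clock = scd->tick_gtod + delta;
	min_clock = max(scd->tick_gtod, scd->clock);
	max_clock = scd->tick_gtod + TICK_NSEC;

	clock = max(clock, min_clock);
	clock = min(clock, max_clock);

	scd->clock = clock;
	return scd->clock;
}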

The fix is to invoke sched_clock_tick() to update "cpu sched clock" in
order to recover from this state. This is done by introducing the
function touch_softlockup_watchdog_sync(). This allows kgdb to request
that the sched clock is updated when the watchdog thread runs the
first time after a resume from kgdb.

[[email protected]: Use per cpu instead of an array]
Signed-off-by: Jason Wessel <[email protected]>
Signed-off-by: Dongdong Deng <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: [email protected]
---
include/linux/sched.h | 4 ++++
kernel/kgdb.c | 6 +++---
kernel/softlockup.c | 15 +++++++++++++++
3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6f7bba9..8923215 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -310,6 +310,7 @@ extern void sched_show_task(struct task_struct *p);
#ifdef CONFIG_DETECT_SOFTLOCKUP
extern void softlockup_tick(void);
extern void touch_softlockup_watchdog(void);
+extern void touch_softlockup_watchdog_sync(void);
extern void touch_all_softlockup_watchdogs(void);
extern int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
void __user *buffer,
@@ -323,6 +324,9 @@ static inline void softlockup_tick(void)
static inline void touch_softlockup_watchdog(void)
{
}
+static inline void touch_softlockup_watchdog_sync(void)
+{
+}
static inline void touch_all_softlockup_watchdogs(void)
{
}
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 2eb517e..87f2cc5 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -596,7 +596,7 @@ static void kgdb_wait(struct pt_regs *regs)

/* Signal the primary CPU that we are done: */
atomic_set(&cpu_in_kgdb[cpu], 0);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);
}
@@ -1450,7 +1450,7 @@ acquirelock:
(kgdb_info[cpu].task &&
kgdb_info[cpu].task->pid != kgdb_sstep_pid) && --sstep_tries) {
atomic_set(&kgdb_active, -1);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);

@@ -1550,7 +1550,7 @@ kgdb_restore:
}
/* Free kgdb_active */
atomic_set(&kgdb_active, -1);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index d225790..0d4c789 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -25,6 +25,7 @@ static DEFINE_SPINLOCK(print_lock);
static DEFINE_PER_CPU(unsigned long, softlockup_touch_ts); /* touch timestamp */
static DEFINE_PER_CPU(unsigned long, softlockup_print_ts); /* print timestamp */
static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
+static DEFINE_PER_CPU(bool, softlock_touch_sync);

static int __read_mostly did_panic;
int __read_mostly softlockup_thresh = 60;
@@ -79,6 +80,12 @@ void touch_softlockup_watchdog(void)
}
EXPORT_SYMBOL(touch_softlockup_watchdog);

+void touch_softlockup_watchdog_sync(void)
+{
+ __raw_get_cpu_var(softlock_touch_sync) = true;
+ __raw_get_cpu_var(softlockup_touch_ts) = 0;
+}
+
void touch_all_softlockup_watchdogs(void)
{
int cpu;
@@ -118,6 +125,14 @@ void softlockup_tick(void)
}

if (touch_ts == 0) {
+ if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
+ /*
+ * If the time stamp was touched atomically
+ * make sure the scheduler tick is up to date.
+ */
+ per_cpu(softlock_touch_sync, this_cpu) = false;
+ sched_clock_tick();
+ }
__touch_softlockup_watchdog();
return;
}
--
1.6.4.rc1

2010-01-27 22:27:40

by Jason Wessel

[permalink] [raw]
Subject: [PATCH 2/3] x86,hw_breakpoint,kgdb: kgdb to use hw_breakpoint API

In the 2.6.33 kernel, the hw_breakpoint API is now used for the
performance event counters. The hw_breakpoint_handler() now consumes
the hw breakpoints that were previously set by kgdb arch specific
code. In order for kgdb to work in conjunction with this core API
change, kgdb must use some of the low level functions of the
hw_breakpoint API to install, uninstall, and receive callbacks for hw
breakpoints.

The kgdb core needs to call kgdb_disable_hw_debug anytime a slave cpu
enters kgdb_wait() in order to keep all the hw breakpoints in sync as
well as to prevent hitting a hw breakpoint while kgdb is active.

During the architecture specific initialization of kgdb, it will
pre-allocate 4 disabled (struct perf_event **) structures. Kgdb will
use these to manage the capabilities for the 4 hw breakpoint
registers. Right now the hw_breakpoint API does not have a way to ask
how many breakpoints are available on each CPU, so it is possible that
the install of a breakpoint might fail when kgdb restores the system
to the run state. The intent of this patch is to first get the basic
functionality of hw breakpoints working and leave it to the person
debugging the kernel to understand what hw breakpoints are in use and
what restrictions have been imposed as a result.

While atomic, the x86 specific kgdb code will call
arch_uninstall_hw_breakpoint() and arch_install_hw_breakpoint() to
manage the cpu specific hw breakpoints.

The arch specific hw_breakpoint_handler() was changed to restore the
cpu specific dr7 instead of the dr7 that was locally saved, because
the dr7 can be modified while in a callback to kgdb.

The net result of these changes allows kgdb to use the same pool of
hw_breakpoints that are used by the perf event API, but neither knows
about future reservations for the available hw breakpoint slots.

CC: Frederic Weisbecker <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: K.Prasad <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Alan Stern <[email protected]>
Signed-off-by: Jason Wessel <[email protected]>
---
arch/x86/kernel/hw_breakpoint.c | 5 +-
arch/x86/kernel/kgdb.c | 206 ++++++++++++++++++++++++++++-----------
kernel/kgdb.c | 3 +
3 files changed, 152 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 05d5fec..cbf19e0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -466,7 +466,7 @@ static int __kprobes hw_breakpoint_handler(struct die_args *args)
{
int i, cpu, rc = NOTIFY_STOP;
struct perf_event *bp;
- unsigned long dr7, dr6;
+ unsigned long dr6;
unsigned long *dr6_p;

/* The DR6 value is pointed by args->err */
@@ -477,7 +477,6 @@ static int __kprobes hw_breakpoint_handler(struct die_args *args)
if ((dr6 & DR_TRAP_BITS) == 0)
return NOTIFY_DONE;

- get_debugreg(dr7, 7);
/* Disable breakpoints during exception handling */
set_debugreg(0UL, 7);
/*
@@ -525,7 +524,7 @@ static int __kprobes hw_breakpoint_handler(struct die_args *args)
if (dr6 & (~DR_TRAP_BITS))
rc = NOTIFY_DONE;

- set_debugreg(dr7, 7);
+ set_debugreg(__get_cpu_var(cpu_dr7), 7);
put_cpu();

return rc;
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index dd74fe7..9f47cd3 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -42,6 +42,7 @@
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/nmi.h>
+#include <linux/hw_breakpoint.h>

#include <asm/debugreg.h>
#include <asm/apicdef.h>
@@ -204,40 +205,38 @@ void gdb_regs_to_pt_regs(unsigned long *gdb_regs, struct pt_regs *regs)

static struct hw_breakpoint {
unsigned enabled;
- unsigned type;
- unsigned len;
unsigned long addr;
+ int len;
+ int type;
+ struct perf_event **pev;
} breakinfo[4];

static void kgdb_correct_hw_break(void)
{
- unsigned long dr7;
- int correctit = 0;
- int breakbit;
int breakno;

- get_debugreg(dr7, 7);
for (breakno = 0; breakno < 4; breakno++) {
- breakbit = 2 << (breakno << 1);
- if (!(dr7 & breakbit) && breakinfo[breakno].enabled) {
- correctit = 1;
- dr7 |= breakbit;
- dr7 &= ~(0xf0000 << (breakno << 2));
- dr7 |= ((breakinfo[breakno].len << 2) |
- breakinfo[breakno].type) <<
- ((breakno << 2) + 16);
- set_debugreg(breakinfo[breakno].addr, breakno);
-
- } else {
- if ((dr7 & breakbit) && !breakinfo[breakno].enabled) {
- correctit = 1;
- dr7 &= ~breakbit;
- dr7 &= ~(0xf0000 << (breakno << 2));
- }
- }
+ struct perf_event *bp;
+ struct arch_hw_breakpoint *info;
+ int val;
+ int cpu = raw_smp_processor_id();
+ if (!breakinfo[breakno].enabled)
+ continue;
+ bp = *per_cpu_ptr(breakinfo[breakno].pev, cpu);
+ info = counter_arch_bp(bp);
+ if (bp->attr.disabled != 1)
+ continue;
+ bp->attr.bp_addr = breakinfo[breakno].addr;
+ bp->attr.bp_len = breakinfo[breakno].len;
+ bp->attr.bp_type = breakinfo[breakno].type;
+ info->address = breakinfo[breakno].addr;
+ info->len = breakinfo[breakno].len;
+ info->type = breakinfo[breakno].type;
+ val = arch_install_hw_breakpoint(bp);
+ if (!val)
+ bp->attr.disabled = 0;
}
- if (correctit)
- set_debugreg(dr7, 7);
+ hw_breakpoint_restore();
}

static int
@@ -259,15 +258,23 @@ kgdb_remove_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
static void kgdb_remove_all_hw_break(void)
{
int i;
+ int cpu = raw_smp_processor_id();
+ struct perf_event *bp;

- for (i = 0; i < 4; i++)
- memset(&breakinfo[i], 0, sizeof(struct hw_breakpoint));
+ for (i = 0; i < 4; i++) {
+ if (!breakinfo[i].enabled)
+ continue;
+ bp = *per_cpu_ptr(breakinfo[i].pev, cpu);
+ if (bp->attr.disabled == 1)
+ continue;
+ arch_uninstall_hw_breakpoint(bp);
+ bp->attr.disabled = 1;
+ }
}

static int
kgdb_set_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
{
- unsigned type;
int i;

for (i = 0; i < 4; i++)
@@ -278,27 +285,38 @@ kgdb_set_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)

switch (bptype) {
case BP_HARDWARE_BREAKPOINT:
- type = 0;
- len = 1;
+ len = 1;
+ breakinfo[i].type = X86_BREAKPOINT_EXECUTE;
break;
case BP_WRITE_WATCHPOINT:
- type = 1;
+ breakinfo[i].type = X86_BREAKPOINT_WRITE;
break;
case BP_ACCESS_WATCHPOINT:
- type = 3;
+ breakinfo[i].type = X86_BREAKPOINT_RW;
break;
default:
return -1;
}
-
- if (len == 1 || len == 2 || len == 4)
- breakinfo[i].len = len - 1;
- else
- return -1;
-
- breakinfo[i].enabled = 1;
+ switch (len) {
+ case 1:
+ breakinfo[i].len = X86_BREAKPOINT_LEN_1;
+ break;
+ case 2:
+ breakinfo[i].len = X86_BREAKPOINT_LEN_2;
+ break;
+ case 4:
+ breakinfo[i].len = X86_BREAKPOINT_LEN_4;
+ break;
+#ifdef CONFIG_X86_64
+ case 8:
+ breakinfo[i].len = X86_BREAKPOINT_LEN_8;
+ break;
+#endif
+ default:
+ return -1;
+ }
breakinfo[i].addr = addr;
- breakinfo[i].type = type;
+ breakinfo[i].enabled = 1;

return 0;
}
@@ -313,8 +331,21 @@ kgdb_set_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
*/
void kgdb_disable_hw_debug(struct pt_regs *regs)
{
+ int i;
+ int cpu = raw_smp_processor_id();
+ struct perf_event *bp;
+
/* Disable hardware debugging while we are in kgdb: */
set_debugreg(0UL, 7);
+ for (i = 0; i < 4; i++) {
+ if (!breakinfo[i].enabled)
+ continue;
+ bp = *per_cpu_ptr(breakinfo[i].pev, cpu);
+ if (bp->attr.disabled == 1)
+ continue;
+ arch_uninstall_hw_breakpoint(bp);
+ bp->attr.disabled = 1;
+ }
}

/**
@@ -378,7 +409,6 @@ int kgdb_arch_handle_exception(int e_vector, int signo, int err_code,
struct pt_regs *linux_regs)
{
unsigned long addr;
- unsigned long dr6;
char *ptr;
int newPC;

@@ -404,20 +434,6 @@ int kgdb_arch_handle_exception(int e_vector, int signo, int err_code,
raw_smp_processor_id());
}

- get_debugreg(dr6, 6);
- if (!(dr6 & 0x4000)) {
- int breakno;
-
- for (breakno = 0; breakno < 4; breakno++) {
- if (dr6 & (1 << breakno) &&
- breakinfo[breakno].type == 0) {
- /* Set restore flag: */
- linux_regs->flags |= X86_EFLAGS_RF;
- break;
- }
- }
- }
- set_debugreg(0UL, 6);
kgdb_correct_hw_break();

return 0;
@@ -448,10 +464,12 @@ single_step_cont(struct pt_regs *regs, struct die_args *args)
}

static int was_in_debug_nmi[NR_CPUS];
+static int recieved_hw_brk[NR_CPUS];

static int __kgdb_notify(struct die_args *args, unsigned long cmd)
{
struct pt_regs *regs = args->regs;
+ unsigned long *dr6_p;

switch (cmd) {
case DIE_NMI:
@@ -485,16 +503,24 @@ static int __kgdb_notify(struct die_args *args, unsigned long cmd)
break;

case DIE_DEBUG:
- if (atomic_read(&kgdb_cpu_doing_single_step) ==
- raw_smp_processor_id()) {
+ dr6_p = (unsigned long *)ERR_PTR(args->err);
+ if (recieved_hw_brk[raw_smp_processor_id()] == 1) {
+ recieved_hw_brk[raw_smp_processor_id()] = 0;
+ return NOTIFY_STOP;
+ } else if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
+ if (dr6_p && (*dr6_p & DR_STEP) == 0)
+ return NOTIFY_DONE;
if (user_mode(regs))
return single_step_cont(regs, args);
break;
- } else if (test_thread_flag(TIF_SINGLESTEP))
+ } else if (test_thread_flag(TIF_SINGLESTEP)) {
/* This means a user thread is single stepping
* a system call which should be ignored
*/
return NOTIFY_DONE;
+ } else if (dr6_p && (*dr6_p & DR_TRAP_BITS) == 0) {
+ return NOTIFY_DONE;
+ }
/* fall through */
default:
if (user_mode(regs))
@@ -531,6 +557,25 @@ static struct notifier_block kgdb_notifier = {
.priority = -INT_MAX,
};

+static void kgdb_hw_bp(struct perf_event *bp, int nmi,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct die_args args;
+ int cpu = raw_smp_processor_id();
+
+ args.trapnr = 0;
+ args.signr = 5;
+ args.err = 0;
+ args.regs = regs;
+ args.str = "debug";
+ recieved_hw_brk[cpu] = 0;
+ if (__kgdb_notify(&args, DIE_DEBUG) == NOTIFY_STOP)
+ recieved_hw_brk[cpu] = 1;
+ else
+ recieved_hw_brk[cpu] = 0;
+}
+
/**
* kgdb_arch_init - Perform any architecture specific initialization.
*
@@ -539,7 +584,43 @@ static struct notifier_block kgdb_notifier = {
*/
int kgdb_arch_init(void)
{
- return register_die_notifier(&kgdb_notifier);
+ int i, cpu;
+ int ret;
+ struct perf_event_attr attr;
+ struct perf_event **pevent;
+
+ ret = register_die_notifier(&kgdb_notifier);
+ if (ret != 0)
+ return ret;
+ /*
+ * Pre-allocate the hw breakpoint structures in the non-atomic
+ * portion of kgdb because this operation requires mutexes to
+ * complete.
+ */
+ attr.bp_addr = (unsigned long)kgdb_arch_init;
+ attr.type = PERF_TYPE_BREAKPOINT;
+ attr.bp_len = HW_BREAKPOINT_LEN_1;
+ attr.bp_type = HW_BREAKPOINT_X;
+ attr.disabled = 1;
+ for (i = 0; i < 4; i++) {
+ breakinfo[i].pev = register_wide_hw_breakpoint(&attr,
+ kgdb_hw_bp);
+ if (IS_ERR(breakinfo[i].pev)) {
+ printk(KERN_ERR "kgdb: Could not allocate hw breakpoints\n");
+ breakinfo[i].pev = NULL;
+ kgdb_arch_exit();
+ return -1;
+ }
+ for_each_online_cpu(cpu) {
+ pevent = per_cpu_ptr(breakinfo[i].pev, cpu);
+ pevent[0]->hw.sample_period = 1;
+ if (pevent[0]->destroy != NULL) {
+ pevent[0]->destroy = NULL;
+ release_bp_slot(*pevent);
+ }
+ }
+ }
+ return ret;
}

/**
@@ -550,6 +631,13 @@ int kgdb_arch_init(void)
*/
void kgdb_arch_exit(void)
{
+ int i;
+ for (i = 0; i < 4; i++) {
+ if (breakinfo[i].pev) {
+ unregister_wide_hw_breakpoint(breakinfo[i].pev);
+ breakinfo[i].pev = NULL;
+ }
+ }
unregister_die_notifier(&kgdb_notifier);
}

diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 87f2cc5..761fdd2 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -583,6 +583,9 @@ static void kgdb_wait(struct pt_regs *regs)
smp_wmb();
atomic_set(&cpu_in_kgdb[cpu], 1);

+ /* Disable any cpu specific hw breakpoints */
+ kgdb_disable_hw_debug(regs);
+
/* Wait till primary CPU is done with debugging */
while (atomic_read(&passive_cpu_wait[cpu]))
cpu_relax();
--
1.6.4.rc1

2010-01-27 22:32:28

by Jason Wessel

[permalink] [raw]
Subject: [PATCH 3/3] perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

The kernel debugger cannot use any mutex_lock() calls because it can
start the kernel running from an invalid context.

The possibility for a breakpoint reservation to be concurrently
processed at the time that kgdb interrupts the system is improbable.
As a safety check against this condition the kernel debugger will
prohibit updating the hardware breakpoint reservations and an error
will be returned to the end user.

Any time the kernel debugger reserves a hardware breakpoint it will be
a system wide reservation.

CC: Frederic Weisbecker <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: K.Prasad <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Alan Stern <[email protected]>
Signed-off-by: Jason Wessel <[email protected]>
---
arch/x86/kernel/kgdb.c | 49 +++++++++++++++++++++++++++++++++++++-
include/linux/hw_breakpoint.h | 2 +
kernel/hw_breakpoint.c | 52 +++++++++++++++++++++++++++++++++--------
3 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 9f47cd3..7c3e929 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -239,6 +239,45 @@ static void kgdb_correct_hw_break(void)
hw_breakpoint_restore();
}

+static int hw_break_reserve_slot(int breakno)
+{
+ int cpu;
+ int cnt = 0;
+ struct perf_event **pevent;
+
+ for_each_online_cpu(cpu) {
+ cnt++;
+ pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
+ if (dbg_reserve_bp_slot(*pevent))
+ goto fail;
+ }
+
+ return 0;
+
+fail:
+ for_each_online_cpu(cpu) {
+ cnt--;
+ if (!cnt)
+ break;
+ pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
+ dbg_release_bp_slot(*pevent);
+ }
+ return -1;
+}
+
+static int hw_break_release_slot(int breakno)
+{
+ struct perf_event **pevent;
+ int ret;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
+ ret = dbg_release_bp_slot(*pevent);
+ }
+ return ret;
+}
+
static int
kgdb_remove_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
{
@@ -250,6 +289,10 @@ kgdb_remove_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
if (i == 4)
return -1;

+ if (hw_break_release_slot(i)) {
+ printk(KERN_ERR "Cannot remove hw breakpoint at %lx\n", addr);
+ return -1;
+ }
breakinfo[i].enabled = 0;

return 0;
@@ -313,9 +356,13 @@ kgdb_set_hw_break(unsigned long addr, int len, enum kgdb_bptype bptype)
break;
#endif
default:
- return -1;
+ return -1;
}
breakinfo[i].addr = addr;
+ if (hw_break_reserve_slot(i)) {
+ breakinfo[i].addr = 0;
+ return -1;
+ }
breakinfo[i].enabled = 1;

return 0;
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 41235c9..070ba06 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -75,6 +75,8 @@ extern int __register_perf_hw_breakpoint(struct perf_event *bp);
extern void unregister_hw_breakpoint(struct perf_event *bp);
extern void unregister_wide_hw_breakpoint(struct perf_event **cpu_events);

+extern int dbg_reserve_bp_slot(struct perf_event *bp);
+extern int dbg_release_bp_slot(struct perf_event *bp);
extern int reserve_bp_slot(struct perf_event *bp);
extern void release_bp_slot(struct perf_event *bp);

diff --git a/kernel/hw_breakpoint.c b/kernel/hw_breakpoint.c
index 50dbd59..e5aa380 100644
--- a/kernel/hw_breakpoint.c
+++ b/kernel/hw_breakpoint.c
@@ -243,38 +243,70 @@ static void toggle_bp_slot(struct perf_event *bp, bool enable)
* ((per_cpu(nr_bp_flexible, *) > 1) + max(per_cpu(nr_cpu_bp_pinned, *))
* + max(per_cpu(nr_task_bp_pinned, *))) < HBP_NUM
*/
-int reserve_bp_slot(struct perf_event *bp)
+static int __reserve_bp_slot(struct perf_event *bp)
{
struct bp_busy_slots slots = {0};
- int ret = 0;
-
- mutex_lock(&nr_bp_mutex);

fetch_bp_busy_slots(&slots, bp);

/* Flexible counters need to keep at least one slot */
- if (slots.pinned + (!!slots.flexible) == HBP_NUM) {
- ret = -ENOSPC;
- goto end;
- }
+ if (slots.pinned + (!!slots.flexible) == HBP_NUM)
+ return -ENOSPC;

toggle_bp_slot(bp, true);

-end:
+ return 0;
+}
+
+int reserve_bp_slot(struct perf_event *bp)
+{
+ int ret;
+
+ mutex_lock(&nr_bp_mutex);
+
+ ret = __reserve_bp_slot(bp);
+
mutex_unlock(&nr_bp_mutex);

return ret;
}

+static void __release_bp_slot(struct perf_event *bp)
+{
+ toggle_bp_slot(bp, false);
+}
+
void release_bp_slot(struct perf_event *bp)
{
mutex_lock(&nr_bp_mutex);

- toggle_bp_slot(bp, false);
+ __release_bp_slot(bp);

mutex_unlock(&nr_bp_mutex);
}

+/*
+ * Allow the kernel debugger to reserve breakpoint slots without
+ * taking a lock, using the dbg_* variants of the reserve and
+ * release breakpoint slot functions.
+ */
+int dbg_reserve_bp_slot(struct perf_event *bp)
+{
+ if (mutex_is_locked(&nr_bp_mutex))
+ return -1;
+
+ return __reserve_bp_slot(bp);
+}
+
+int dbg_release_bp_slot(struct perf_event *bp)
+{
+ if (mutex_is_locked(&nr_bp_mutex))
+ return -1;
+
+ __release_bp_slot(bp);
+
+ return 0;
+}

int register_perf_hw_breakpoint(struct perf_event *bp)
{
--
1.6.4.rc1

2010-01-28 17:33:13

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 3/3] perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

On Wed, Jan 27, 2010 at 04:25:24PM -0600, Jason Wessel wrote:
> The kernel debugger cannot use any mutex_lock() calls because it can
> start the kernel running from an invalid context.
>
> The possibility for a breakpoint reservation to be concurrently
> processed at the time that kgdb interrupts the system is improbable.
> As a safety check against this condition the kernel debugger will
> prohibit updating the hardware breakpoint reservations and an error
> will be returned to the end user.
>
> Any time the kernel debugger reserves a hardware breakpoint it will be
> a system wide reservation.
>
> CC: Frederic Weisbecker <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: K.Prasad <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Alan Stern <[email protected]>
> Signed-off-by: Jason Wessel <[email protected]>
> ---
> arch/x86/kernel/kgdb.c | 49 +++++++++++++++++++++++++++++++++++++-
> include/linux/hw_breakpoint.h | 2 +
> kernel/hw_breakpoint.c | 52 +++++++++++++++++++++++++++++++++--------
> 3 files changed, 92 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 9f47cd3..7c3e929 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -239,6 +239,45 @@ static void kgdb_correct_hw_break(void)
> hw_breakpoint_restore();
> }
>
> +static int hw_break_reserve_slot(int breakno)
> +{
> + int cpu;
> + int cnt = 0;
> + struct perf_event **pevent;
> +
> + for_each_online_cpu(cpu) {
> + cnt++;
> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
> + if (dbg_reserve_bp_slot(*pevent))
> + goto fail;
> + }
> +
> + return 0;
> +
> +fail:
> + for_each_online_cpu(cpu) {
> + cnt--;
> + if (!cnt)
> + break;
> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
> + dbg_release_bp_slot(*pevent);
> + }
> + return -1;
> +}
> +
> +static int hw_break_release_slot(int breakno)
> +{
> + struct perf_event **pevent;
> + int ret;
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
> + ret = dbg_release_bp_slot(*pevent);



So, you are missing some return errors there. Actually, a slot
release shouldn't return an error.



> +/*
> + * Allow the kernel debugger to reserve breakpoint slots without
> + * taking a lock, using the dbg_* variants of the reserve and
> + * release breakpoint slot functions.
> + */
> +int dbg_reserve_bp_slot(struct perf_event *bp)
> +{
> + if (mutex_is_locked(&nr_bp_mutex))
> + return -1;
> +
> + return __reserve_bp_slot(bp);
> +}
> +
> +int dbg_release_bp_slot(struct perf_event *bp)
> +{
> + if (mutex_is_locked(&nr_bp_mutex))
> + return -1;
> +
> + __release_bp_slot(bp);
> +
> + return 0;
> +}



Ok, best effort fits well for reserve, but is certainly not
suitable for release. We can't leave a fake occupied slot like
this. If it fails, we should do this asynchronously, using the
usual release_bp_slot, maybe via the workqueues.
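
For illustration, such an asynchronous fallback might look roughly like
the hypothetical sketch below (dbg_deferred_bp, dbg_release_work and the
single-slot deferral are made-up names for illustration, not code from
this thread):

#include <linux/workqueue.h>

/* Hypothetical: one deferred slot; a real version would need a list. */
static struct perf_event *dbg_deferred_bp;

static void dbg_release_work_fn(struct work_struct *work)
{
	/* Process context: release_bp_slot() may take nr_bp_mutex here. */
	release_bp_slot(dbg_deferred_bp);
	dbg_deferred_bp = NULL;
}

static DECLARE_WORK(dbg_release_work, dbg_release_work_fn);

static void dbg_release_bp_slot_async(struct perf_event *bp)
{
	dbg_deferred_bp = bp;
	schedule_work(&dbg_release_work);
}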




>
> int register_perf_hw_breakpoint(struct perf_event *bp)
> {
> --
> 1.6.4.rc1
>

2010-01-28 17:50:18

by Jason Wessel

[permalink] [raw]
Subject: Re: [PATCH 3/3] perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

Frederic Weisbecker wrote:
> On Wed, Jan 27, 2010 at 04:25:24PM -0600, Jason Wessel wrote:
>
>> The kernel debugger cannot use any mutex_lock() calls because it can
>> start the kernel running from an invalid context.
>>
>> The possibility for a breakpoint reservation to be concurrently
>> processed at the time that kgdb interrupts the system is improbable.
>> As a safety check against this condition the kernel debugger will
>> prohibit updating the hardware breakpoint reservations and an error
>> will be returned to the end user.
>>
>> Any time the kernel debugger reserves a hardware breakpoint it will be
>> a system wide reservation.
>>
>> CC: Frederic Weisbecker <[email protected]>
>> CC: Ingo Molnar <[email protected]>
>> CC: K.Prasad <[email protected]>
>> CC: Peter Zijlstra <[email protected]>
>> CC: Alan Stern <[email protected]>
>> Signed-off-by: Jason Wessel <[email protected]>
>> ---
>> arch/x86/kernel/kgdb.c | 49 +++++++++++++++++++++++++++++++++++++-
>> include/linux/hw_breakpoint.h | 2 +
>> kernel/hw_breakpoint.c | 52 +++++++++++++++++++++++++++++++++--------
>> 3 files changed, 92 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
>> index 9f47cd3..7c3e929 100644
>> --- a/arch/x86/kernel/kgdb.c
>> +++ b/arch/x86/kernel/kgdb.c
>> @@ -239,6 +239,45 @@ static void kgdb_correct_hw_break(void)
>> hw_breakpoint_restore();
>> }
>>
>> +static int hw_break_reserve_slot(int breakno)
>> +{
>> + int cpu;
>> + int cnt = 0;
>> + struct perf_event **pevent;
>> +
>> + for_each_online_cpu(cpu) {
>> + cnt++;
>> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
>> + if (dbg_reserve_bp_slot(*pevent))
>> + goto fail;
>> + }
>> +
>> + return 0;
>> +
>> +fail:
>> + for_each_online_cpu(cpu) {
>> + cnt--;
>> + if (!cnt)
>> + break;
>> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
>> + dbg_release_bp_slot(*pevent);
>> + }
>> + return -1;
>> +}
>> +
>> +static int hw_break_release_slot(int breakno)
>> +{
>> + struct perf_event **pevent;
>> + int ret;
>> + int cpu;
>> +
>> + for_each_online_cpu(cpu) {
>> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
>> + ret = dbg_release_bp_slot(*pevent);
>>
>
>
>
> So, you are missing some return errors there. Actually, a slot
> release shouldn't return an error.
>
>
>

This is a trick, so to speak. All the slot releases will return either
0 or -1, depending on whether the mutex is available, so nothing is
really missed.

Certainly I can change this to just quit immediately on error.

>
>> +/*
>> + * Allow the kernel debugger to reserve breakpoint slots without
>> + * taking a lock, using the dbg_* variants of the reserve and
>> + * release breakpoint slot functions.
>> + */
>> +int dbg_reserve_bp_slot(struct perf_event *bp)
>> +{
>> + if (mutex_is_locked(&nr_bp_mutex))
>> + return -1;
>> +
>> + return __reserve_bp_slot(bp);
>> +}
>> +
>> +int dbg_release_bp_slot(struct perf_event *bp)
>> +{
>> + if (mutex_is_locked(&nr_bp_mutex))
>> + return -1;
>> +
>> + __release_bp_slot(bp);
>> +
>> + return 0;
>> +}
>>
>
>
>
> Ok, best effort fits well for reserve, but is certainly not
> suitable for release. We can't leave a fake occupied slot like
> this. If it fails, we should do this asynchronously, using the
> usual release_bp_slot, may be toward the workqueues.
>
>
>
>

If it fails, the debugger tries to remove it again later. It seems to
me like it is a don't-care corner case. You get a printk if it ever
does happen (which it really shouldn't).


>
>>
>> int register_perf_hw_breakpoint(struct perf_event *bp)
>> {
>> --
>> 1.6.4.rc1
>>
>>
>
>

2010-01-28 20:10:00

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 3/3] perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

On Thu, Jan 28, 2010 at 11:49:14AM -0600, Jason Wessel wrote:
> Frederic Weisbecker wrote:
> >> +static int hw_break_release_slot(int breakno)
> >> +{
> >> + struct perf_event **pevent;
> >> + int ret;
> >> + int cpu;
> >> +
> >> + for_each_online_cpu(cpu) {
> >> + pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
> >> + ret = dbg_release_bp_slot(*pevent);
> >>
> >
> >
> >
> > So, you are missing some return errors there. Actually, a slot
> > release shouldn't return an error.
> >
> >
> >
>
> This is a trick so to speak. Either all the slot releases will return
> 0 or -1 depending on if the mutex is available, so it is not really
> missed.



Oh right, I forgot everything was frozen here :)



> > Ok, best effort fits well for reserve, but is certainly not
> > suitable for release. We can't leave a fake occupied slot like
> > this. If it fails, we should do this asynchronously, using the
> > usual release_bp_slot, may be toward the workqueues.
> >
> >
> >
> >
>
> If it fails, the debugger tries to remove it again later. It seems to
> me like it is a don't-care corner case. You get a printk if it ever
> does happen (which it really shouldn't).



Yeah truly it's a corner case, especially if the debugger can handle that
later.

Maybe just add a comment so that future reviewers don't get stuck on
this part.


Thanks!

2010-01-28 20:40:28

by Jason Wessel

[permalink] [raw]
Subject: Re: [PATCH 3/3] perf,hw_breakpoint,kgdb: No mutex taken for kernel debugger

Frederic Weisbecker wrote:

>> If it fails, the debugger tries to remove it again later. It seems to
>> me like it is a don't-care corner case. You get a printk if it ever
>> does happen (which it really shouldn't).
>
>
>
> Yeah truly it's a corner case, especially if the debugger can handle that
> later.
>
> May be just add a comment so that future reviewers don't stick to
> this part.


If you approve, I'll add your ack.


It looks like this now:


+static int hw_break_release_slot(int breakno)
+{
+ struct perf_event **pevent;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ pevent = per_cpu_ptr(breakinfo[breakno].pev, cpu);
+ if (dbg_release_bp_slot(*pevent))
+ /*
+ * The debugger is responsible for handling the retry on
+ * remove failure.
+ */
+ return -1;
+ }
+ return 0;
+}
+



Thanks,
Jason.

2010-01-29 08:07:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume


* Jason Wessel <[email protected]> wrote:

> @@ -118,6 +125,14 @@ void softlockup_tick(void)
> }
>
> if (touch_ts == 0) {
> + if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
> + /*
> + * If the time stamp was touched atomically
> + * make sure the scheduler tick is up to date.
> + */
> + per_cpu(softlock_touch_sync, this_cpu) = false;
> + sched_clock_tick();
> + }
> __touch_softlockup_watchdog();
> return;

Shouldn't just all of touch_softlockup_watchdog() gain this new
sched_clock_tick() call, instead of doing this ugly flaggery? Or would that
lock up or misbehave in other ways in some cases?

That would also make the patch much simpler, I guess, as we'd only have the
chunk above.
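
Roughly, the variant being suggested would look like the following
hypothetical sketch (illustrative only; the reply below explains why
this road was already tried and breaks other cases):

void touch_softlockup_watchdog(void)
{
	/* Hypothetical: re-sync the per-cpu sched clock on every touch. */
	sched_clock_tick();
	__raw_get_cpu_var(softlockup_touch_ts) = 0;
}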

Ingo

2010-01-29 14:52:12

by Jason Wessel

[permalink] [raw]
Subject: Re: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

Ingo Molnar wrote:
> * Jason Wessel <[email protected]> wrote:
>
>
>> @@ -118,6 +125,14 @@ void softlockup_tick(void)
>> }
>>
>> if (touch_ts == 0) {
>> + if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
>> + /*
>> + * If the time stamp was touched atomically
>> + * make sure the scheduler tick is up to date.
>> + */
>> + per_cpu(softlock_touch_sync, this_cpu) = false;
>> + sched_clock_tick();
>> + }
>> __touch_softlockup_watchdog();
>> return;
>>
>
> Shouldnt just all of touch_softlockup_watchdog() gain this new
> sched_clock_tick() call, instead of doing this ugly flaggery? Or would that
> lock up or misbehave in other ways in some cases?
>
> That would also make the patch much simpler i guess, as we'd only have the
> chunk above.
>

We have already been down that road, and it breaks other cases.

http://lkml.org/lkml/2009/7/28/204

Specifically the test case of:

echo 3 > /proc/sys/kernel/softlockup_thresh

And then some kernel code in a thread like:
local_irq_disable();
printk("Disable local irq for 11 seconds\n");
mdelay(11000);
local_irq_enable();
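
Wrapped up as a self-contained module, that test might look like the
minimal sketch below (module and function names are illustrative;
loading it stalls the current CPU with interrupts off for 11 seconds):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/delay.h>
#include <linux/irqflags.h>

static int __init irqoff_test_init(void)
{
	local_irq_disable();
	printk("Disable local irq for 11 seconds\n");
	mdelay(11000);		/* busy-wait with interrupts off */
	local_irq_enable();
	return 0;
}

static void __exit irqoff_test_exit(void)
{
}

module_init(irqoff_test_init);
module_exit(irqoff_test_exit);
MODULE_LICENSE("GPL");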


I could consider calling sched_clock_tick() before returning the kernel
to normal execution, but that didn't look very safe to call from the
exception context, which is why it was delayed until the next time the
soft lockup code ran.

Resuming from a long sleep is an ugly problem, so I am open to short-term
and long-term suggestions, including a polling time API (obviously we
would prefer not to go down that rat hole :-)

Jason.

2010-02-01 05:53:22

by Dongdong Deng

[permalink] [raw]
Subject: Re: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

On Fri, Jan 29, 2010 at 10:51 PM, Jason Wessel
<[email protected]> wrote:
> Ingo Molnar wrote:
>> * Jason Wessel <[email protected]> wrote:
>>
>>
>>> @@ -118,6 +125,14 @@ void softlockup_tick(void)
>>>      }
>>>
>>>      if (touch_ts == 0) {
>>> +            if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
>>> +                    /*
>>> +                     * If the time stamp was touched atomically
>>> +                     * make sure the scheduler tick is up to date.
>>> +                     */
>>> +                    per_cpu(softlock_touch_sync, this_cpu) = false;
>>> +                    sched_clock_tick();
>>> +            }
>>>              __touch_softlockup_watchdog();
>>>              return;
>>>
>>
>> Shouldnt just all of touch_softlockup_watchdog() gain this new
>> sched_clock_tick() call, instead of doing this ugly flaggery? Or would that
>> lock up or misbehave in other ways in some cases?
>>
>> That would also make the patch much simpler i guess, as we'd only have the
>> chunk above.
>>
>
> We have already been down that road, and it breaks other cases.
>
> http://lkml.org/lkml/2009/7/28/204
>
> Specifically the test case of:
>
> echo 3 > /proc/sys/kernel/softlockup_thresh
>
> And then some kernel code in a thread like:
>        local_irq_disable();
>        printk("Disable local irq for 11 seconds\n");
>        mdelay(11000);
>        local_irq_enable();

Hi Jason,

Maybe this problem was fixed by
commit baf48f6577e581a9adb8fe849dc80e24b21d171d - "softlock: fix false
panic which can occur if softlockup_thresh is reduced".

Thanks,
Dongdong

>
>
> I could consider calling sched_clock_tick() before returning the kernel
> to normal execution, but that didn't look very safe to call from the
> exception context, which is why it was delayed until the next time the
> soft lockup code ran.
>
> Resuming from a long sleep is an ugly problem, so I am open to short-term
> and long-term suggestions, including a polling time API (obviously we
> would prefer not to go down that rat hole :-)
>
> Jason.

2010-02-01 06:06:19

by Jason Wessel

[permalink] [raw]
Subject: Re: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

Dongdong Deng wrote:
> On Fri, Jan 29, 2010 at 10:51 PM, Jason Wessel
> <[email protected]> wrote:
>
>> echo 3 > /proc/sys/kernel/softlockup_thresh
>>
>> And then some kernel code in a thread like:
>> local_irq_disable();
>> printk("Disable local irq for 11 seconds\n");
>> mdelay(11000);
>> local_irq_enable();
>>
>
> Hi Jason,
>
> Maybe this problem was fixed by
> commit baf48f6577e581a9adb8fe849dc80e24b21d171d - "softlock: fix false
> panic which can occur if softlockup_thresh is reduced".
>


That is not the same problem. The fix you referenced is a corner case
where you end up with the stack trace at the point in time you reduce
the threshold. The only reason I reduce the threshold in the first
place is just to shorten the amount of time it takes to observe the problem.

You can just change the numbers for the mdelay and use the default
softlockup threshold values and still see the problem I reported.

Jason.

2010-02-01 06:41:10

by Dongdong Deng

[permalink] [raw]
Subject: Re: [PATCH 1/3] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

On Mon, Feb 1, 2010 at 2:05 PM, Jason Wessel <[email protected]> wrote:
> Dongdong Deng wrote:
>> On Fri, Jan 29, 2010 at 10:51 PM, Jason Wessel
>> <[email protected]> wrote:
>>
>>> echo 3 > /proc/sys/kernel/softlockup_thresh
>>>
>>> And then some kernel code in a thread like:
>>>        local_irq_disable();
>>>        printk("Disable local irq for 11 seconds\n");
>>>        mdelay(11000);
>>>        local_irq_enable();
>>>
>>
>> Hi Jason,
>>
>> Maybe this problem was fixed by
>> commit baf48f6577e581a9adb8fe849dc80e24b21d171d - "softlock: fix false
>> panic which can occur if softlockup_thresh is reduced".
>>
>
>
> That is not the same problem.   The fix you referenced is a corner case
> where you end up with the stack trace at the point in time you reduce
> the threshold.  The only reason I reduce the threshold in the first
> place is just to shorten the amount of time it takes to observe the problem.

Thanks for your nice explanation. :)

Dongdong

> You can just change the numbers for the mdelay and use the default
> softlockup threshold values and still see the problem I reported.
>
> Jason.
>

2010-02-01 07:28:26

by Jason Wessel

[permalink] [raw]
Subject: [tip:core/urgent] softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume

Commit-ID: d6ad3e286d2c075a60b9f11075a2c55aeeeca2ad
Gitweb: http://git.kernel.org/tip/d6ad3e286d2c075a60b9f11075a2c55aeeeca2ad
Author: Jason Wessel <[email protected]>
AuthorDate: Wed, 27 Jan 2010 16:25:22 -0600
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 1 Feb 2010 08:22:32 +0100

softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets
the time from hardware such as the TSC on x86. In this
configuration kgdb will report a softlockup warning message on
resuming or detaching from a debug session.

Sequence of events in the problem case:

1) "cpu sched clock" and "hardware time" are at 100 sec prior
to a call to kgdb_handle_exception()

2) Debugger waits in kgdb_handle_exception() for 80 sec and on
exit the following is called ... touch_softlockup_watchdog() -->
__raw_get_cpu_var(touch_timestamp) = 0;

3) "cpu sched clock" = 100s (it was not updated, because the
interrupt was disabled in kgdb) but the "hardware time" = 180 sec

4) The first timer interrupt after resuming from
kgdb_handle_exception updates the watchdog from the "cpu sched clock"

update_process_times() { ... run_local_timers() -->
softlockup_tick() --> check (touch_timestamp == 0) (it is "YES"
here, we have set "touch_timestamp = 0" at kgdb) -->
__touch_softlockup_watchdog() ***(A)--> reset "touch_timestamp"
to "get_timestamp()" (Here, the "touch_timestamp" will still be
set to 100s.) ...

scheduler_tick() ***(B)--> sched_clock_tick() (update "cpu sched
clock" to "hardware time" = 180s) ... }

5) The second timer interrupt handler appears to have a large
jump and trips the softlockup warning.

update_process_times() { ... run_local_timers() -->
softlockup_tick() --> "cpu sched clock" - "touch_timestamp" =
180s-100s > 60s --> printk "soft lockup error messages" ... }

note: ***(A) reset "touch_timestamp" to
"get_timestamp(this_cpu)"

Why is "touch_timestamp" 100 sec, instead of 180 sec?

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of
get_timestamp() is:

get_timestamp(this_cpu)
-->cpu_clock(this_cpu)
-->sched_clock_cpu(this_cpu)
-->__update_sched_clock(sched_clock_data, now)

The __update_sched_clock() function uses the GTOD tick value to
create a window to normalize the "now" values. So if "now"
value is too big for sched_clock_data, it will be ignored.

The fix is to invoke sched_clock_tick() to update "cpu sched
clock" in order to recover from this state. This is done by
introducing the function touch_softlockup_watchdog_sync(). This
allows kgdb to request that the sched clock is updated when the
watchdog thread runs the first time after a resume from kgdb.

[[email protected]: Use per cpu instead of an array]
Signed-off-by: Jason Wessel <[email protected]>
Signed-off-by: Dongdong Deng <[email protected]>
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 4 ++++
kernel/kgdb.c | 6 +++---
kernel/softlockup.c | 15 +++++++++++++++
3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6f7bba9..8923215 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -310,6 +310,7 @@ extern void sched_show_task(struct task_struct *p);
#ifdef CONFIG_DETECT_SOFTLOCKUP
extern void softlockup_tick(void);
extern void touch_softlockup_watchdog(void);
+extern void touch_softlockup_watchdog_sync(void);
extern void touch_all_softlockup_watchdogs(void);
extern int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
void __user *buffer,
@@ -323,6 +324,9 @@ static inline void softlockup_tick(void)
static inline void touch_softlockup_watchdog(void)
{
}
+static inline void touch_softlockup_watchdog_sync(void)
+{
+}
static inline void touch_all_softlockup_watchdogs(void)
{
}
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 2eb517e..87f2cc5 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -596,7 +596,7 @@ static void kgdb_wait(struct pt_regs *regs)

/* Signal the primary CPU that we are done: */
atomic_set(&cpu_in_kgdb[cpu], 0);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);
}
@@ -1450,7 +1450,7 @@ acquirelock:
(kgdb_info[cpu].task &&
kgdb_info[cpu].task->pid != kgdb_sstep_pid) && --sstep_tries) {
atomic_set(&kgdb_active, -1);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);

@@ -1550,7 +1550,7 @@ kgdb_restore:
}
/* Free kgdb_active */
atomic_set(&kgdb_active, -1);
- touch_softlockup_watchdog();
+ touch_softlockup_watchdog_sync();
clocksource_touch_watchdog();
local_irq_restore(flags);

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index d225790..0d4c789 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -25,6 +25,7 @@ static DEFINE_SPINLOCK(print_lock);
static DEFINE_PER_CPU(unsigned long, softlockup_touch_ts); /* touch timestamp */
static DEFINE_PER_CPU(unsigned long, softlockup_print_ts); /* print timestamp */
static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
+static DEFINE_PER_CPU(bool, softlock_touch_sync);

static int __read_mostly did_panic;
int __read_mostly softlockup_thresh = 60;
@@ -79,6 +80,12 @@ void touch_softlockup_watchdog(void)
}
EXPORT_SYMBOL(touch_softlockup_watchdog);

+void touch_softlockup_watchdog_sync(void)
+{
+ __raw_get_cpu_var(softlock_touch_sync) = true;
+ __raw_get_cpu_var(softlockup_touch_ts) = 0;
+}
+
void touch_all_softlockup_watchdogs(void)
{
int cpu;
@@ -118,6 +125,14 @@ void softlockup_tick(void)
}

if (touch_ts == 0) {
+ if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
+ /*
+ * If the time stamp was touched atomically
+ * make sure the scheduler tick is up to date.
+ */
+ per_cpu(softlock_touch_sync, this_cpu) = false;
+ sched_clock_tick();
+ }
__touch_softlockup_watchdog();
return;
}