kgdb needs IPI to be sent and handled before perf
or anything else NMI, otherwise kgdb hangs with bootup
self-tests (found on P4 HT SMP machine). Raise its priority
so that we're called first in a notifier chain.
Reported-by: Don Zickus <[email protected]>
Tested-by: Lin Ming <[email protected]>
CC: Jason Wessel <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
---
Don, Jason, take a look please.
arch/x86/kernel/kgdb.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6.git/arch/x86/kernel/kgdb.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
+++ linux-2.6.git/arch/x86/kernel/kgdb.c
@@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
.notifier_call = kgdb_notify,
/*
- * Lowest-prio notifier priority, we want to be notified last:
+ * We might need to send an IPI and
+ * do cpu roundup before anything else
+ * in notifier chain so high priority
+ * is needed.
*/
- .priority = NMI_LOCAL_LOW_PRIOR,
+ .priority = NMI_LOCAL_HIGH_PRIOR,
};
/**
--
Cyrill
On Wed, Mar 23, 2011 at 11:32:33PM +0300, Cyrill Gorcunov wrote:
> kgdb needs IPI to be sent and handled before perf
> or anything else NMI, otherwise kgdb hangs with bootup
> self-tests (found on P4 HT SMP machine). Raise its priority
> so that we're called first in a notifier chain.
This is only because P4 perf swallows all the nmis. If that is the case
you are arguing to make the perf nmi at the bottom of the priority list,
which is probably not where it should be due to its volume.
I am stuck debugging P4 problems again for RHEL-6 and I noticed a small
change that is needed (didn't help my problem though) but it looked like
an oversight that might help your case.
Cheers,
Don
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3769ac8..d945314 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -777,6 +787,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
* the counter has reached zero value and continued counting before
* real NMI signal was received:
*/
+ rdmsrl(hwc->event_base, v);
if (!(v & ARCH_P4_UNFLAGGED_BIT))
return 1;
On 03/24/2011 12:16 AM, Don Zickus wrote:
> On Wed, Mar 23, 2011 at 11:32:33PM +0300, Cyrill Gorcunov wrote:
>> kgdb needs IPI to be sent and handled before perf
>> or anything else NMI, otherwise kgdb hangs with bootup
>> self-tests (found on P4 HT SMP machine). Raise its priority
>> so that we're called first in a notifier chain.
>
> This is only because P4 perf swallows all the nmis. If that is the case
> you are arguing to make the perf nmi at the bottom of the priority list,
> which is probably not where it should be due to its volume.
The problem is that there IPI wait cycle inside kgdb and we are to be sure
to handle it early. And perf eventually can consume kgdb NMI which would
lead to infinite wait loop so I don't see any easier way to deal with it.
>
> I am stuck debugging P4 problems again for RHEL-6 and I noticed a small
> change that is needed (didn't help my problem though) but it looked like
> an oversight that might help your case.
>
> Cheers,
> Don
>
>
> diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
> index 3769ac8..d945314 100644
> --- a/arch/x86/kernel/cpu/perf_event_p4.c
> +++ b/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -777,6 +787,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
> * the counter has reached zero value and continued counting before
> * real NMI signal was received:
> */
> + rdmsrl(hwc->event_base, v);
> if (!(v & ARCH_P4_UNFLAGGED_BIT))
> return 1;
>
Good catch! Ack! It seems to be escaped in first place (I fear I forgot to
refresh patch before send it). Mind to send the full patch to Ingo?
--
Cyrill
On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <[email protected]> wrote:
> kgdb needs IPI to be sent and handled before perf
> or anything else NMI, otherwise kgdb hangs with bootup
> self-tests (found on P4 HT SMP machine). Raise its priority
> so that we're called first in a notifier chain.
>
> Reported-by: Don Zickus <[email protected]>
> Tested-by: Lin Ming <[email protected]>
> CC: Jason Wessel <[email protected]>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> ---
>
> Don, Jason, take a look please.
>
> arch/x86/kernel/kgdb.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.git/arch/x86/kernel/kgdb.c
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
> +++ linux-2.6.git/arch/x86/kernel/kgdb.c
> @@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
> .notifier_call = kgdb_notify,
>
> /*
> - * Lowest-prio notifier priority, we want to be notified last:
> + * We might need to send an IPI and
> + * do cpu roundup before anything else
> + * in notifier chain so high priority
> + * is needed.
> */
> - .priority = NMI_LOCAL_LOW_PRIOR,
> + .priority = NMI_LOCAL_HIGH_PRIOR,
> };
CC: kgdb maillist.
I quote Jason's early email to explain why debugger tends to
set the low level of die_notifier.
"The original thinking was that if you are using a low level debugger
that you would want to stop on such a event or breakpoint because there
is nothing else handling it and your system is about to print an oops
message."
For keeping above purpose, I have a "ugly" proposal that splitting
kgdb die handling to two parts.
1: The first part with HIGH priority and just handle NMI event,
if the NMI event is not belong kgdb, it return NOTIFY_DONE and pass to others.
2: The second part with low priority to handling others.
Due to above handling logic could let kgdb source get complexly, I
couldn't make sure
it is a suitable method here, if it is OK, I can send a formal patch. :-)
$git diff
-----------
arch/x86/kernel/kgdb.c | 67 +++++++++++++++++++++++++++++++++++------------
1 files changed, 50 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index dba0b36..afd18db 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -516,23 +516,6 @@ static int __kgdb_notify(struct die_args *args,
unsigned long cmd)
struct pt_regs *regs = args->regs;
switch (cmd) {
- case DIE_NMI:
- if (atomic_read(&kgdb_active) != -1) {
- /* KGDB CPU roundup */
- kgdb_nmicallback(raw_smp_processor_id(), regs);
- was_in_debug_nmi[raw_smp_processor_id()] = 1;
- touch_nmi_watchdog();
- return NOTIFY_STOP;
- }
- return NOTIFY_DONE;
-
- case DIE_NMIUNKNOWN:
- if (was_in_debug_nmi[raw_smp_processor_id()]) {
- was_in_debug_nmi[raw_smp_processor_id()] = 0;
- return NOTIFY_STOP;
- }
- return NOTIFY_DONE;
-
case DIE_DEBUG:
if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
if (user_mode(regs))
@@ -588,6 +571,52 @@ kgdb_notify(struct notifier_block *self, unsigned
long cmd, void *ptr)
return ret;
}
+static int
+kgdb_nmi_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
+{
+ unsigned long flags;
+ struct pt_regs *regs = ((struct die_args*)ptr)->regs;
+ int ret = NOTIFY_DONE;
+
+ local_irq_save(flags);
+
+ switch (cmd) {
+ case DIE_NMI:
+ if (atomic_read(&kgdb_active) != -1) {
+ /* KGDB CPU roundup */
+ kgdb_nmicallback(raw_smp_processor_id(), regs);
+ was_in_debug_nmi[raw_smp_processor_id()] = 1;
+ touch_nmi_watchdog();
+ ret = NOTIFY_STOP;
+ }
+ break;
+
+ case DIE_NMIUNKNOWN:
+ if (was_in_debug_nmi[raw_smp_processor_id()]) {
+ was_in_debug_nmi[raw_smp_processor_id()] = 0;
+ ret = NOTIFY_STOP;
+ }
+ break;
+ default:
+ break;
+ }
+
+ local_irq_restore(flags);
+ return ret;
+}
+
+static struct notifier_block kgdb_nmi_notifier = {
+ .notifier_call = kgdb_nmi_notify,
+
+ /*
+ * We might need to send an IPI and
+ * do cpu roundup before anything else
+ * in notifier chain so high priority
+ * is needed.
+ */
+ .priority = NMI_LOCAL_HIGH_PRIOR,
+};
+
static struct notifier_block kgdb_notifier = {
.notifier_call = kgdb_notify,
@@ -605,6 +634,9 @@ static struct notifier_block kgdb_notifier = {
*/
int kgdb_arch_init(void)
{
+ int err = register_die_notifier(&kgdb_nmi_notifier);
+ if (err)
+ return err;
return register_die_notifier(&kgdb_notifier);
}
@@ -673,6 +705,7 @@ void kgdb_arch_exit(void)
breakinfo[i].pev = NULL;
}
}
+ unregister_die_notifier(&kgdb_nmi_notifier);
unregister_die_notifier(&kgdb_notifier);
}
--
1.6.0.4
If Jason is ok with such splitting -- I dont mind either ;)
/sorry for toppost/
On Thursday, March 24, 2011, Dongdong Deng <[email protected]> wrote:
> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <[email protected]> wrote:
>> kgdb needs IPI to be sent and handled before perf
>> or anything else NMI, otherwise kgdb hangs with bootup
>> self-tests (found on P4 HT SMP machine). Raise its priority
>> so that we're called first in a notifier chain.
>>
>> Reported-by: Don Zickus <[email protected]>
>> Tested-by: Lin Ming <[email protected]>
>> CC: Jason Wessel <[email protected]>
>> Signed-off-by: Cyrill Gorcunov <[email protected]>
>> ---
>>
>> Don, Jason, take a look please.
>>
>> ?arch/x86/kernel/kgdb.c | ? ?7 +++++--
>> ?1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6.git/arch/x86/kernel/kgdb.c
>> =====================================================================
>> --- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
>> +++ linux-2.6.git/arch/x86/kernel/kgdb.c
>> @@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
>> ? ? ? ?.notifier_call ?= kgdb_notify,
>>
>> ? ? ? ?/*
>> - ? ? ? ?* Lowest-prio notifier priority, we want to be notified last:
>> + ? ? ? ?* We might need to send an IPI and
>> + ? ? ? ?* do cpu roundup before anything else
>> + ? ? ? ?* in notifier chain so high priority
>> + ? ? ? ?* is needed.
>> ? ? ? ? */
>> - ? ? ? .priority ? ? ? = NMI_LOCAL_LOW_PRIOR,
>> + ? ? ? .priority ? ? ? = NMI_LOCAL_HIGH_PRIOR,
>> ?};
>
> CC: kgdb maillist.
>
> I quote Jason's early email to explain why debugger tends to
> set the low level of die_notifier.
>
> "The original thinking was that if you are using a low level debugger
> that you would want to stop on such a event or breakpoint because there
> is nothing else handling it and your system is about to print an oops
> message."
>
> For keeping above purpose, I have a "ugly" proposal that splitting
> kgdb die handling to two parts.
>
> 1: The first part with HIGH priority and just handle NMI event,
> ?if the NMI event is not belong kgdb, it return NOTIFY_DONE and pass to others.
>
> 2: The second part with low priority to handling others.
>
> Due to above handling logic could let kgdb source get complexly, I
> couldn't make sure
> it is a suitable method here, if it is OK, I can send a formal patch. :-)
>
>
> $git diff
> -----------
> arch/x86/kernel/kgdb.c | ? 67 +++++++++++++++++++++++++++++++++++------------
> ?1 files changed, 50 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index dba0b36..afd18db 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -516,23 +516,6 @@ static int __kgdb_notify(struct die_args *args,
> unsigned long cmd)
> ? ? ? ?struct pt_regs *regs = args->regs;
>
> ? ? ? ?switch (cmd) {
> - ? ? ? case DIE_NMI:
> - ? ? ? ? ? ? ? if (atomic_read(&kgdb_active) != -1) {
> - ? ? ? ? ? ? ? ? ? ? ? /* KGDB CPU roundup */
> - ? ? ? ? ? ? ? ? ? ? ? kgdb_nmicallback(raw_smp_processor_id(), regs);
> - ? ? ? ? ? ? ? ? ? ? ? was_in_debug_nmi[raw_smp_processor_id()] = 1;
> - ? ? ? ? ? ? ? ? ? ? ? touch_nmi_watchdog();
> - ? ? ? ? ? ? ? ? ? ? ? return NOTIFY_STOP;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? return NOTIFY_DONE;
> -
> - ? ? ? case DIE_NMIUNKNOWN:
> - ? ? ? ? ? ? ? if (was_in_debug_nmi[raw_smp_processor_id()]) {
> - ? ? ? ? ? ? ? ? ? ? ? was_in_debug_nmi[raw_smp_processor_id()] = 0;
> - ? ? ? ? ? ? ? ? ? ? ? return NOTIFY_STOP;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? return NOTIFY_DONE;
> -
> ? ? ? ?case DIE_DEBUG:
> ? ? ? ? ? ? ? ?if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
> ? ? ? ? ? ? ? ? ? ? ? ?if (user_mode(regs))
> @@ -588,6 +571,52 @@ kgdb_notify(struct notifier_block *self, unsigned
> long cmd, void *ptr)
> ? ? ? ?return ret;
> ?}
>
> +static int
> +kgdb_nmi_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
> +{
> + ? ? ? unsigned long flags;
> + ? ? ? struct pt_regs *regs = ((struct die_args*)ptr)->regs;
> + ? ? ? int ret = NOTIFY_DONE;
> +
> + ? ? ? local_irq_save(flags);
> +
> + ? ? ? switch (cmd) {
> + ? ? ? case DIE_NMI:
> + ? ? ? ? ? ? ? if (atomic_read(&kgdb_active) != -1) {
> + ? ? ? ? ? ? ? ? ? ? ? /* KGDB CPU roundup */
> + ? ? ? ? ? ? ? ? ? ? ? kgdb_nmicallback(raw_smp_processor_id(), regs);
> + ? ? ? ? ? ? ? ? ? ? ? was_in_debug_nmi[raw_smp_processor_id()] = 1;
> + ? ? ? ? ? ? ? ? ? ? ? touch_nmi_watchdog();
> + ? ? ? ? ? ? ? ? ? ? ? ret = NOTIFY_STOP;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? break;
> +
> + ? ? ? case DIE_NMIUNKNOWN:
> + ? ? ? ? ? ? ? if (was_in_debug_nmi[raw_smp_processor_id()]) {
> + ? ? ? ? ? ? ? ? ? ? ? was_in_debug_nmi[raw_smp_processor_id()] = 0;
> + ? ? ? ? ? ? ? ? ? ? ? ret = NOTIFY_STOP;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? break;
> + ? ? ? default:
> + ? ? ? ? ? ? ? break;
> + ? ? ? }
> +
> + ? ? ? local_irq_restore(flags);
> + ? ? ? return ret;
> +}
> +
> +static struct notifier_block kgdb_nmi_notifier = {
> + ? ? ? .notifier_call ?= kgdb_nmi_notify,
> +
> + ? ? ? /*
> + ? ? ? ?* We might need to send an IPI and
> + ? ? ? ?* do cpu roundup before anything else
> + ? ? ? ?* in notifier chain so high priority
> + ? ? ? ?* is needed.
> + ? ? ? ?*/
> + ? ? ? .priority ? ? ? = NMI_LOCAL_HIGH_PRIOR,
> +};
> +
> ?static struct notifier_block kgdb_notifier = {
> ? ? ? ?.notifier_call ?= kgdb_notify,
>
> @@ -605,6 +634,9 @@ static struct notifier_block kgdb_notifier = {
> ?*/
> ?int kgdb_arch_init(void)
> ?{
> + ? ? ? int err = register_die_notifier(&kgdb_nmi_notifier);
> + ? ? ? if (err)
> + ? ? ? ? ? ? ? return err;
> ? ? ? ?return register_die_notifier(&kgdb_notifier);
> ?}
>
> @@ -673,6 +705,7 @@ void kgdb_arch_exit(void)
> ? ? ? ? ? ? ? ? ? ? ? ?breakinfo[i].pev = NULL;
> ? ? ? ? ? ? ? ?}
> ? ? ? ?}
> + ? ? ? unregister_die_notifier(&kgdb_nmi_notifier);
> ? ? ? ?unregister_die_notifier(&kgdb_notifier);
> ?}
>
> --
> 1.6.0.4
>
On 03/24/2011 12:24 AM, Cyrill Gorcunov wrote:
> If Jason is ok with such splitting -- I dont mind either ;)
>
> On Thursday, March 24, 2011, Dongdong Deng <[email protected]> wrote:
>> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <[email protected]> wrote:
>>> kgdb needs IPI to be sent and handled before perf
>>> or anything else NMI, otherwise kgdb hangs with bootup
>>> self-tests (found on P4 HT SMP machine). Raise its priority
>>> so that we're called first in a notifier chain.
>>>
I talked with Cyrill outside the mailing list since he pinged me and I will summarize here.
My initial thought about the patch Deng Dongdong posted was that it was really ugly to have kgdb registered in the notifier chain twice. I would be willing to live with this for now if we agree that when jump labels are merged to the kernel that we can make use of that instead.
The jump labels would allow us to invoke the debugger directly when the debugger is active much like we do when CONFIG_KGDB_LOW_LEVEL_TRAP is set. In fact the code that is ifdef'ed with CONFIG_KGDB_LOW_LEVEL_TRAP can make use of the same jump label as the NMI entry and no longer be #ifdef'ed when jump labels come to pass.
The very discussion of the patch raised the question of "why not always have the debugger be first?" The answer for that lies in that some code needs to run before the debugger to keep the system running assuming you are planning on restarting it after entering the debugger. The generic die notifier is used for lots of circumstances and the priority the debugger cares about only matter for a select few exception types.
The kmmio, mce-inject, and crash_nmi_nb (from reboot.c) are good examples of in tree code that should run with a higher priority than the debugger because the debugger doesn't know what to do with these code paths, so it sits last in line hoping someone else will deal with the exception else enter the debugger. For the trap paths the debugger needs to be first in line to deal with the case where where a breakpoint is in a notifier to avoid non-recoverable recursive faults. For NMI it appears we need to run before the perf code or the perf code will eat an nmi event intended for kgdb and result in a dead locked system.
The net result. I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix? I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
Cheers,
Jason.
On 03/31/2011 09:40 PM, Jason Wessel wrote:
...
> The net result. I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
>
> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix?
> I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
>
> Cheers,
> Jason.
Hi Jason, I think Deng Dongdong's patch is a good candidate for a while. Though Don mention another issue
on his Celerone machine so I need to think about the reason for unknown NMIs he observes. I try to get access
to p4 machine tomorrow and test some ideas. Will keep you in touch! (initially I thought to write a long post
about possible hang scenarios but then found they all are crap and I better to check details on real p4 machine
before claiming anything :).
--
Cyrill
On Fri, Apr 1, 2011 at 1:40 AM, Jason Wessel <[email protected]> wrote:
> On 03/24/2011 12:24 AM, Cyrill Gorcunov wrote:
>> If Jason is ok with such splitting -- I dont mind either ;)
>>
>> On Thursday, March 24, 2011, Dongdong Deng <[email protected]> wrote:
>>> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <[email protected]> wrote:
>>>> kgdb needs IPI to be sent and handled before perf
>>>> or anything else NMI, otherwise kgdb hangs with bootup
>>>> self-tests (found on P4 HT SMP machine). Raise its priority
>>>> so that we're called first in a notifier chain.
>>>>
>
> I talked with Cyrill outside the mailing list since he pinged me and I will summarize here.
>
> My initial thought about the patch Deng Dongdong posted was that it was really ugly to have kgdb registered in the notifier chain twice. I would be willing to live with this for now if we agree that when jump labels are merged to the kernel that we can make use of that instead.
>
> The jump labels would allow us to invoke the debugger directly when the debugger is active much like we do when CONFIG_KGDB_LOW_LEVEL_TRAP is set. In fact the code that is ifdef'ed with CONFIG_KGDB_LOW_LEVEL_TRAP can make use of the same jump label as the NMI entry and no longer be #ifdef'ed when jump labels come to pass.
Thanks to teach kgdb state with jump labels. :-)
>
> The very discussion of the patch raised the question of "why not always have the debugger be first?" The answer for that lies in that some code needs to run before the debugger to keep the system running assuming you are planning on restarting it after entering the debugger. The generic die notifier is used for lots of circumstances and the priority the debugger cares about only matter for a select few exception types.
>
> The kmmio, mce-inject, and crash_nmi_nb (from reboot.c) are good examples of in tree code that should run with a higher priority than the debugger because the debugger doesn't know what to do with these code paths, so it sits last in line hoping someone else will deal with the exception else enter the debugger. For the trap paths the debugger needs to be first in line to deal with the case where where a breakpoint is in a notifier to avoid non-recoverable recursive faults. For NMI it appears we need to run before the perf code or the perf code will eat an nmi event intended for kgdb and result in a dead locked system.
>
> The net result. I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
>
> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix? I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
Hi Jason,
Do I need to send a patch to you? :-)
BR,
Dongdong
On 04/01/2011 01:26 PM, Dongdong Deng wrote:
...
>>
>> The net result. I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
>>
>> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix? I'll try to get a
>> version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
>
> Hi Jason,
>
> Do I need to send a patch to you? :-)
>
> BR,
> Dongdong
I've started latest -tip on some ancient p4 machine today and got unknown nmi issue even without kgdb compiled :(
Didn't reveal all the details yet (seems I'll be able to continue investigation at monday only in best case).
But still your patch looks correct to me.
--
Cyrill