2012-05-23 01:59:54

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH] printk: ignore recursion_bug flag when MCE in progress

From: ShuoX Liu <[email protected]>

When MCE happens in printk, we ignore recursion_bug to make sure
some MCE logs printed out. Re-use mce_entry variable.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
I found mce_entry was introduced by commit 553f265f, but it's not
used now. Why not removed?
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 2 --
include/linux/kernel.h | 1 +
kernel/printk.c | 4 +++-
4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 441520e..aeda4cc 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);

-extern atomic_t mce_entry;
-
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 11c9166..6073354 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -61,8 +61,6 @@ int mce_disabled __read_mostly;

#define SPINUNIT 100 /* 100ns */

-atomic_t mce_entry;
-
DEFINE_PER_CPU(unsigned, mce_exception_count);

/*
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 645231c..24af685 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
extern void bust_spinlocks(int yes);
extern void wake_up_klogd(void);
extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
+extern atomic_t mce_entry;
extern int panic_timeout;
extern int panic_on_oops;
extern int panic_on_unrecovered_nmi;
diff --git a/kernel/printk.c b/kernel/printk.c
index 473afdb..2bae087 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -79,6 +79,7 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+atomic_t mce_entry;
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!oops_in_progress && !atomic_read(&mce_entry)
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1


2012-05-23 10:01:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] printk: ignore recursion_bug flag when MCE in progress

+ Tony

On Wed, May 23, 2012 at 09:58:34AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> When MCE happens in printk, we ignore recursion_bug to make sure
> some MCE logs printed out. Re-use mce_entry variable.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> I found mce_entry was introduced by commit 553f265f, but it's not
> used now. Why not removed?
> ---
> arch/x86/include/asm/mce.h | 2 --
> arch/x86/kernel/cpu/mcheck/mce.c | 2 --
> include/linux/kernel.h | 1 +
> kernel/printk.c | 4 +++-
> 4 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 441520e..aeda4cc 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
> DECLARE_PER_CPU(unsigned, mce_exception_count);
> DECLARE_PER_CPU(unsigned, mce_poll_count);
>
> -extern atomic_t mce_entry;
> -
> typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
> DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 11c9166..6073354 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
>
> #define SPINUNIT 100 /* 100ns */
>
> -atomic_t mce_entry;
> -
> DEFINE_PER_CPU(unsigned, mce_exception_count);
>
> /*
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 645231c..24af685 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
> extern void bust_spinlocks(int yes);
> extern void wake_up_klogd(void);
> extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
> +extern atomic_t mce_entry;
> extern int panic_timeout;
> extern int panic_on_oops;
> extern int panic_on_unrecovered_nmi;
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 473afdb..2bae087 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -79,6 +79,7 @@ int console_printk[4] = {
> int oops_in_progress;
> EXPORT_SYMBOL(oops_in_progress);
>
> +atomic_t mce_entry;
> /*
> * console_sem protects the console_drivers list, and also
> * provides serialisation for access to the entire console
> @@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> * recursion and return - but flag the recursion so that
> * it can be printed at the next appropriate moment:
> */
> - if (!oops_in_progress && !lockdep_recursing(current)) {
> + if (!oops_in_progress && !atomic_read(&mce_entry)

This is leaking x86-specific (MCE) stuff in generic kernel code. I think
it would be more appropriate to add a in_hw_error() helper or similar
and define it on each arch. I can very well imagine other architectures
would like to print hw error info too...

Hmmm.

--
Regards/Gruss,
Boris.

2012-05-24 00:37:27

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] printk: ignore recursion_bug flag when MCE in progress

On Wed, 2012-05-23 at 12:01 +0200, Borislav Petkov wrote:
> + Tony
>
> On Wed, May 23, 2012 at 09:58:34AM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <[email protected]>
> >
> > When MCE happens in printk, we ignore recursion_bug to make sure
> > some MCE logs printed out. Re-use mce_entry variable.
> >
> > Signed-off-by: Yanmin Zhang <[email protected]>
> > Signed-off-by: ShuoX Liu <[email protected]>
> > ---
> > I found mce_entry was introduced by commit 553f265f, but it's not
> > used now. Why not removed?
> > ---
> > arch/x86/include/asm/mce.h | 2 --
> > arch/x86/kernel/cpu/mcheck/mce.c | 2 --
> > include/linux/kernel.h | 1 +
> > kernel/printk.c | 4 +++-
> > 4 files changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 441520e..aeda4cc 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
> > DECLARE_PER_CPU(unsigned, mce_exception_count);
> > DECLARE_PER_CPU(unsigned, mce_poll_count);
> >
> > -extern atomic_t mce_entry;
> > -
> > typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
> > DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 11c9166..6073354 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
> >
> > #define SPINUNIT 100 /* 100ns */
> >
> > -atomic_t mce_entry;
> > -
> > DEFINE_PER_CPU(unsigned, mce_exception_count);
> >
> > /*
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 645231c..24af685 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
> > extern void bust_spinlocks(int yes);
> > extern void wake_up_klogd(void);
> > extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
> > +extern atomic_t mce_entry;
> > extern int panic_timeout;
> > extern int panic_on_oops;
> > extern int panic_on_unrecovered_nmi;
> > diff --git a/kernel/printk.c b/kernel/printk.c
> > index 473afdb..2bae087 100644
> > --- a/kernel/printk.c
> > +++ b/kernel/printk.c
> > @@ -79,6 +79,7 @@ int console_printk[4] = {
> > int oops_in_progress;
> > EXPORT_SYMBOL(oops_in_progress);
> >
> > +atomic_t mce_entry;
> > /*
> > * console_sem protects the console_drivers list, and also
> > * provides serialisation for access to the entire console
> > @@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> > * recursion and return - but flag the recursion so that
> > * it can be printed at the next appropriate moment:
> > */
> > - if (!oops_in_progress && !lockdep_recursing(current)) {
> > + if (!oops_in_progress && !atomic_read(&mce_entry)
>
> This is leaking x86-specific (MCE) stuff in generic kernel code. I think
> it would be more appropriate to add a in_hw_error() helper or similar
> and define it on each arch. I can very well imagine other architectures
> would like to print hw error info too...
Good idea. We would do so to make it more generic.

Thanks,
Yanmin

2012-05-24 06:01:05

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process

From: ShuoX Liu <[email protected]>

When MCE happens in printk, we ignore recursion_bug to make sure MCE logs printed out.

According to Boris' suggestion, we add some helper functions.
1) hw_error_enter: Call it when specific arch begins to process a hardware error.
2) hw_error_exit: Call it when specific arch finishes the processing of a hardware error.
3) in_hw_error():indicates whether HW error handling is in processing.

Each arch could call the helpers in their arch-dependent HW error handlers.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 6 ++----
include/linux/cpu.h | 17 +++++++++++++++++
kernel/cpu.c | 3 +++
kernel/printk.c | 3 ++-
5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 441520e..aeda4cc 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);

-extern atomic_t mce_entry;
-
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..aaf41d2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -61,8 +61,6 @@ int mce_disabled __read_mostly;

#define SPINUNIT 100 /* 100ns */

-atomic_t mce_entry;
-
DEFINE_PER_CPU(unsigned, mce_exception_count);

/*
@@ -1015,7 +1013,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
char *msg = "Unknown";

- atomic_inc(&mce_entry);
+ hw_error_enter();

this_cpu_inc(mce_exception_count);

@@ -1143,7 +1141,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
- atomic_dec(&mce_entry);
+ hw_error_exit();
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7230bb5..beb56d0 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -210,4 +210,21 @@ static inline int disable_nonboot_cpus(void) { return 0; }
static inline void enable_nonboot_cpus(void) {}
#endif /* !CONFIG_PM_SLEEP_SMP */

+/* HW error handle status helpers */
+extern atomic_t hw_error;
+static inline void hw_error_enter(void)
+{
+ atomic_inc(&hw_error);
+}
+
+static inline void hw_error_exit(void)
+{
+ atomic_dec(&hw_error);
+}
+
+static inline int in_hw_error(void)
+{
+ return atomic_read(&hw_error);
+}
+
#endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 0e6353c..54b9c1f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -679,3 +679,6 @@ void init_cpu_online(const struct cpumask *src)
{
cpumask_copy(to_cpumask(cpu_online_bits), src);
}
+
+/* hardware error status */
+atomic_t hw_error __read_mostly = ATOMIC_INIT(0);
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..6f0f216 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1295,7 +1295,8 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!oops_in_progress && !in_hw_error()
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-05-24 06:11:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process

On Thu, May 24, 2012 at 01:59:38PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> When MCE happens in printk, we ignore recursion_bug to make sure MCE logs printed out.
>
> According to Boris' suggestion, we add some helper functions.
> 1) hw_error_enter: Call it when specific arch begins to process a hardware error.
> 2) hw_error_exit: Call it when specific arch finishes the processing of a hardware error.
> 3) in_hw_error():indicates whether HW error handling is in processing.
>
> Each arch could call the helpers in their arch-dependent HW error handlers.

Yep, looks better, thanks. Design question though:

>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> arch/x86/include/asm/mce.h | 2 --
> arch/x86/kernel/cpu/mcheck/mce.c | 6 ++----
> include/linux/cpu.h | 17 +++++++++++++++++
> kernel/cpu.c | 3 +++
> kernel/printk.c | 3 ++-
> 5 files changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 441520e..aeda4cc 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
> DECLARE_PER_CPU(unsigned, mce_exception_count);
> DECLARE_PER_CPU(unsigned, mce_poll_count);
>
> -extern atomic_t mce_entry;
> -
> typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
> DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..aaf41d2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
>
> #define SPINUNIT 100 /* 100ns */
>
> -atomic_t mce_entry;
> -
> DEFINE_PER_CPU(unsigned, mce_exception_count);
>
> /*
> @@ -1015,7 +1013,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> char *msg = "Unknown";
>
> - atomic_inc(&mce_entry);
> + hw_error_enter();
>
> this_cpu_inc(mce_exception_count);
>
> @@ -1143,7 +1141,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> mce_report_event(regs);
> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> out:
> - atomic_dec(&mce_entry);
> + hw_error_exit();
> sync_core();
> }
> EXPORT_SYMBOL_GPL(do_machine_check);
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 7230bb5..beb56d0 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -210,4 +210,21 @@ static inline int disable_nonboot_cpus(void) { return 0; }
> static inline void enable_nonboot_cpus(void) {}
> #endif /* !CONFIG_PM_SLEEP_SMP */
>
> +/* HW error handle status helpers */
> +extern atomic_t hw_error;
> +static inline void hw_error_enter(void)
> +{
> + atomic_inc(&hw_error);
> +}
> +
> +static inline void hw_error_exit(void)
> +{
> + atomic_dec(&hw_error);
> +}
> +
> +static inline int in_hw_error(void)
> +{
> + return atomic_read(&hw_error);
> +}

Shouldn't those be generic empty functions and each arch implement their
own with the stuff they want to do on the respective architecture when
they get a hardware error?

Andrew, Ingo?

Thanks.

--
Regards/Gruss,
Boris.

2012-05-24 22:56:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process

On Thu, 24 May 2012 08:11:45 +0200
Borislav Petkov <[email protected]> wrote:

> > +/* HW error handle status helpers */
> > +extern atomic_t hw_error;
> > +static inline void hw_error_enter(void)
> > +{
> > + atomic_inc(&hw_error);
> > +}
> > +
> > +static inline void hw_error_exit(void)
> > +{
> > + atomic_dec(&hw_error);
> > +}
> > +
> > +static inline int in_hw_error(void)
> > +{
> > + return atomic_read(&hw_error);
> > +}
>
> Shouldn't those be generic empty functions and each arch implement their
> own with the stuff they want to do on the respective architecture when
> they get a hardware error?

This code needs documentation.

Specifically, it should clearly explain (and hence define) what a
"hardware error" *is*, and for what purpose this code exists.

Because as it stands, this interface is hopelessly vague. Once one
sees that it is *specifically* used for handling mce within a printk,
it all makes sense.

And with that understanding comes the realisation that the interface is
poorly named. It will not be used for any purpose other than adjusting
printk() behavior so it should mention printk() in its name and in its
comments and probably it should all be moved into printk.h.

Futhermore, this code is not really related to MCE or hardware or
anything else. It is simply a way in which callers can suppress
printk()'s recursion check. Callers are free to use it for reasons
other than "hardware errors".

And once all that is done, and this interface becomes part of printk()
then no, there is no need to add per-arch hooks. An arch can call into
printk_recursion_check_disable() and printk_recursion_chack_enable() -
nice and simple.


IOW, the title of this patch should be

[patch 1/2] printk: add interface for disabling recursion check
[patch 2/2] x86 mce: use new printk recursion disabling interface

2012-05-25 00:29:20

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process

On Thu, 2012-05-24 at 15:56 -0700, Andrew Morton wrote:
> On Thu, 24 May 2012 08:11:45 +0200
> Borislav Petkov <[email protected]> wrote:
>
> > > +/* HW error handle status helpers */
> > > +extern atomic_t hw_error;
> > > +static inline void hw_error_enter(void)
> > > +{
> > > + atomic_inc(&hw_error);
> > > +}
> > > +
> > > +static inline void hw_error_exit(void)
> > > +{
> > > + atomic_dec(&hw_error);
> > > +}
> > > +
> > > +static inline int in_hw_error(void)
> > > +{
> > > + return atomic_read(&hw_error);
> > > +}
> >
> > Shouldn't those be generic empty functions and each arch implement their
> > own with the stuff they want to do on the respective architecture when
> > they get a hardware error?
>
> This code needs documentation.
>
> Specifically, it should clearly explain (and hence define) what a
> "hardware error" *is*, and for what purpose this code exists.
>
> Because as it stands, this interface is hopelessly vague. Once one
> sees that it is *specifically* used for handling mce within a printk,
> it all makes sense.
>
> And with that understanding comes the realisation that the interface is
> poorly named. It will not be used for any purpose other than adjusting
> printk() behavior so it should mention printk() in its name and in its
> comments and probably it should all be moved into printk.h.
>
> Futhermore, this code is not really related to MCE or hardware or
> anything else. It is simply a way in which callers can suppress
> printk()'s recursion check. Callers are free to use it for reasons
> other than "hardware errors".
>
> And once all that is done, and this interface becomes part of printk()
> then no, there is no need to add per-arch hooks. An arch can call into
> printk_recursion_check_disable() and printk_recursion_chack_enable() -
> nice and simple.
>
>
> IOW, the title of this patch should be
>
> [patch 1/2] printk: add interface for disabling recursion check
> [patch 2/2] x86 mce: use new printk recursion disabling interface
Andrew,

Thanks for the detailed comment. It's more reasonable to bind it to printk.
We would follow it to create new patches.

Yanmin

2012-05-25 07:21:19

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to print some
important information. So we add these interfaces in printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 16 +++++++++++++++-
2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..61067fe 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,18 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1307,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-05-25 07:22:42

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..365c35d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
char *msg = "Unknown";

+ printk_recursion_check_disable();
atomic_inc(&mce_entry);

this_cpu_inc(mce_exception_count);
@@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
atomic_dec(&mce_entry);
+ printk_recursion_check_enable();
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
--
1.7.1

2012-05-25 07:41:20

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> Disable printk recursion to make sure MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..365c35d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> char *msg = "Unknown";
>
> + printk_recursion_check_disable();
> atomic_inc(&mce_entry);
>
> this_cpu_inc(mce_exception_count);
> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> out:
> atomic_dec(&mce_entry);
> + printk_recursion_check_enable();

Looks like those should be at the beginning and the end of print_mce() -
do_machine_check() could exit without printing an MCE and disabling the
recursion check then is superfluous, methinks.

Thanks.

--
Regards/Gruss,
Boris.

2012-05-25 08:01:54

by Liu ShuoX

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

On 2012年05月25日 15:41, Borislav Petkov wrote:

> On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
>> From: ShuoX Liu <[email protected]>
>>
>> Disable printk recursion to make sure MCE logs printed out.
>>
>> Signed-off-by: Yanmin Zhang <[email protected]>
>> Signed-off-by: ShuoX Liu <[email protected]>
>> ---
>> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 2afcbd2..365c35d 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>> char *msg = "Unknown";
>>
>> + printk_recursion_check_disable();
>> atomic_inc(&mce_entry);
>>
>> this_cpu_inc(mce_exception_count);
>> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>> out:
>> atomic_dec(&mce_entry);
>> + printk_recursion_check_enable();
>
> Looks like those should be at the beginning and the end of print_mce() -
> do_machine_check() could exit without printing an MCE and disabling the
> recursion check then is superfluous, methinks.

Thanks for comment. It seems you are right. I will check the code next
Monday.

>
> Thanks.
>

2012-05-25 16:09:36

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/2] printk: add interface for disabling recursion check

+void printk_recursion_check_enable(void)
+{
+ atomic_dec(&recursion_check_disabled);
+}


Is it worth a BUG_ON() in here to check that recursion_check_disabled
is >=1 before blindly decrementing it? Or is this interface so simple
that nobody would ever get this wrong?

-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-05-28 00:30:05

by Yanmin Zhang

[permalink] [raw]
Subject: RE: [PATCH 1/2] printk: add interface for disabling recursion check

On Fri, 2012-05-25 at 16:09 +0000, Luck, Tony wrote:
> +void printk_recursion_check_enable(void)
> +{
> + atomic_dec(&recursion_check_disabled);
> +}
>
>
> Is it worth a BUG_ON() in here to check that recursion_check_disabled
> is >=1 before blindly decrementing it? Or is this interface so simple
> that nobody would ever get this wrong?
Tony,

The interface is clear and simple. But a WARN_ON checking is better
to have. We would add WARN_ON.

Thanks,
Yanmin

2012-05-28 02:09:42

by Liu ShuoX

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

On 2012年05月25日 15:41, Borislav Petkov wrote:

> On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
>> From: ShuoX Liu <[email protected]>
>>
>> Disable printk recursion to make sure MCE logs printed out.
>>
>> Signed-off-by: Yanmin Zhang <[email protected]>
>> Signed-off-by: ShuoX Liu <[email protected]>
>> ---
>> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 2afcbd2..365c35d 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>> char *msg = "Unknown";
>>
>> + printk_recursion_check_disable();
>> atomic_inc(&mce_entry);
>>
>> this_cpu_inc(mce_exception_count);
>> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>> out:
>> atomic_dec(&mce_entry);
>> + printk_recursion_check_enable();
>
> Looks like those should be at the beginning and the end of print_mce() -
> do_machine_check() could exit without printing an MCE and disabling the
> recursion check then is superfluous, methinks.

Boris,
I checked code and found some other functions in do_machine_check() also
would printk something. Such as add_taint(). So i think we'd better
place the recursion check at the beginning and the end of
do_machine_check(). Also more printks later(maybe) added will benefit
from this. Do you agree?

>
> Thanks.
>

2012-05-28 02:56:50

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v3 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-05-28 02:57:46

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..365c35d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
char *msg = "Unknown";

+ printk_recursion_check_disable();
atomic_inc(&mce_entry);

this_cpu_inc(mce_exception_count);
@@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
atomic_dec(&mce_entry);
+ printk_recursion_check_enable();
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
--
1.7.1

2012-05-30 09:08:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

On Mon, May 28, 2012 at 10:07:59AM +0800, ShuoX Liu wrote:
> Boris,
> I checked code and found some other functions in do_machine_check() also
> would printk something. Such as add_taint(). So i think we'd better
> place the recursion check at the beginning and the end of
> do_machine_check(). Also more printks later(maybe) added will benefit
> from this. Do you agree?

I'm not sure we want to disable printk recursion for add_taint() - it
doesn't spit out any useful information wrt MCE so we could ignore it.

Btw, I forgot to ask: this printk recursion disabling, do you have a
real usecase where you don't get the MCE info in dmesg and with your
patch it works or is this purely hypothetical?

Thanks.

--
Regards/Gruss,
Boris.

2012-05-31 00:29:38

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface

On Wed, 2012-05-30 at 11:08 +0200, Borislav Petkov wrote:
> On Mon, May 28, 2012 at 10:07:59AM +0800, ShuoX Liu wrote:
> > Boris,
> > I checked code and found some other functions in do_machine_check() also
> > would printk something. Such as add_taint(). So i think we'd better
> > place the recursion check at the beginning and the end of
> > do_machine_check(). Also more printks later(maybe) added will benefit
> > from this. Do you agree?
>
> I'm not sure we want to disable printk recursion for add_taint() - it
> doesn't spit out any useful information wrt MCE so we could ignore it.
add_taint might be not a good case here. We could move the recursion check
flag setting around mce_panic.


>
> Btw, I forgot to ask: this printk recursion disabling, do you have a
> real usecase where you don't get the MCE info in dmesg and with your
> patch it works or is this purely hypothetical?
We hit it when running a MTBF testing on a Android atom mobile.

Thanks,
Yanmin

2012-06-04 03:07:08

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v4 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-06-04 03:09:26

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
We hit it when running a MTBF testing on a Android atom mobile.
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..906e838 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
{
int i, apei_err = 0;

+ printk_recursion_check_disable();
if (!fake_panic) {
/*
* Make sure only one CPU runs in machine check panic
@@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
panic(msg);
} else
pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+ printk_recursion_check_enable();
}

/* Support code for software error injection */
--
1.7.1

2012-06-04 17:12:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface

On Mon, May 28, 2012 at 10:56:04AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> Disable printk recursion to make sure MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..365c35d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> char *msg = "Unknown";
>
> + printk_recursion_check_disable();
> atomic_inc(&mce_entry);
>
> this_cpu_inc(mce_exception_count);
> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> out:
> atomic_dec(&mce_entry);
> + printk_recursion_check_enable();
> sync_core();

This is still adding the recursion check things in do_machine_check
instead of in print_mce.

Also, this commit message above needs more explanation why we want to
disable the recursion check on an MCE, maybe an example...

Thanks.

--
Regards/Gruss,
Boris.

2012-06-05 00:31:37

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface

On Mon, 2012-06-04 at 19:12 +0200, Borislav Petkov wrote:
> On Mon, May 28, 2012 at 10:56:04AM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <[email protected]>
> >
> > Disable printk recursion to make sure MCE logs printed out.
> >
> > Signed-off-by: Yanmin Zhang <[email protected]>
> > Signed-off-by: ShuoX Liu <[email protected]>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..365c35d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> > DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> > char *msg = "Unknown";
> >
> > + printk_recursion_check_disable();
> > atomic_inc(&mce_entry);
> >
> > this_cpu_inc(mce_exception_count);
> > @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> > mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> > out:
> > atomic_dec(&mce_entry);
> > + printk_recursion_check_enable();
> > sync_core();
>
> This is still adding the recursion check things in do_machine_check
> instead of in print_mce.
Boris,

You are seeing an old patch. The new patches V4 are:
https://lkml.org/lkml/2012/6/3/176
https://lkml.org/lkml/2012/6/3/177


The new patches change mce_panic instead of do_machine_check.

>
> Also, this commit message above needs more explanation why we want to
> disable the recursion check on an MCE, maybe an example...
V4 doesn't have the comment. We would add it in V5.

Thanks.

2012-06-05 08:14:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface

On Tue, Jun 05, 2012 at 08:32:40AM +0800, Yanmin Zhang wrote:
> You are seeing an old patch. The new patches V4 are:
> https://lkml.org/lkml/2012/6/3/176
> https://lkml.org/lkml/2012/6/3/177
>
> The new patches change mce_panic instead of do_machine_check.

Ok.

> > Also, this commit message above needs more explanation why we want to
> > disable the recursion check on an MCE, maybe an example...
> V4 doesn't have the comment. We would add it in V5.

Cool, thanks.

--
Regards/Gruss,
Boris.

2012-06-05 09:56:08

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v5 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-06-05 09:57:42

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in mce_panic, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..906e838 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
{
int i, apei_err = 0;

+ printk_recursion_check_disable();
if (!fake_panic) {
/*
* Make sure only one CPU runs in machine check panic
@@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
panic(msg);
} else
pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+ printk_recursion_check_enable();
}

/* Support code for software error injection */
--
1.7.1

2012-06-05 15:15:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface

On Tue, Jun 05, 2012 at 05:55:22PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
>
> We hit it when running MTBF testing on Android ATOM mobiles.
>
> Here in mce_panic, we choose to disable printk recursion to make sure
> MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..906e838 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> {
> int i, apei_err = 0;
>
> + printk_recursion_check_disable();
> if (!fake_panic) {
> /*
> * Make sure only one CPU runs in machine check panic
> @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> panic(msg);
> } else
> pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> + printk_recursion_check_enable();

Ok, let me ask this again: why not disable the printk recursion check in
the function that actually _prints_ the MCE, i.e. print_mce() instead of
here in mce_panic() which does a whole bunch of other stuff and it can
also return without printing any MCE to dmesg?

Are you interested in seeing the printk's from mce_panic? Why?

Thanks.

--
Regards/Gruss,
Boris.

2012-06-06 00:35:01

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface

On Tue, 2012-06-05 at 17:15 +0200, Borislav Petkov wrote:
> On Tue, Jun 05, 2012 at 05:55:22PM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <[email protected]>
> >
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> >
> > We hit it when running MTBF testing on Android ATOM mobiles.
> >
> > Here in mce_panic, we choose to disable printk recursion to make sure
> > MCE logs printed out.
> >
> > Signed-off-by: Yanmin Zhang <[email protected]>
> > Signed-off-by: ShuoX Liu <[email protected]>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..906e838 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> > {
> > int i, apei_err = 0;
> >
> > + printk_recursion_check_disable();
> > if (!fake_panic) {
> > /*
> > * Make sure only one CPU runs in machine check panic
> > @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> > panic(msg);
> > } else
> > pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> > + printk_recursion_check_enable();
>
> Ok, let me ask this again: why not disable the printk recursion check in
> the function that actually _prints_ the MCE, i.e. print_mce() instead of
> here in mce_panic() which does a whole bunch of other stuff and it can
> also return without printing any MCE to dmesg?
Sorry for forgetting the return checking.

> Are you interested in seeing the printk's from mce_panic? Why?
How about moving the disabling checking in print_mce? It seems not clean if we disable
the printk recursion checking just around every printk statement.

2012-06-06 08:33:50

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v6 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-06-06 08:36:47

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in mce_panic, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
v6: move the disabling checking in print_mce

---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..6056e94 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
{
int ret = 0;

+ printk_recursion_check_disable();
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
* (if the CPU has an implementation for that)
*/
ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- if (ret == NOTIFY_STOP)
+ if (ret == NOTIFY_STOP) {
+ printk_recursion_check_enable();
return;
+ }

pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
+ printk_recursion_check_enable();
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
--
1.7.1

2012-06-06 15:22:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface

On Wed, Jun 06, 2012 at 04:34:21PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
>
> We hit it when running MTBF testing on Android ATOM mobiles.
>
> Here in mce_panic, we choose to disable printk recursion to make sure
> MCE logs printed out.

Just a minor nitpick: this should say "print_mce" or you can simply
remove the whole sentence - commit message is fine without it too.

Aside of that, those look ok to me.

Thanks.

--
Regards/Gruss,
Boris.

2012-06-07 02:12:21

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface

On Wed, 2012-06-06 at 17:22 +0200, Borislav Petkov wrote:
> On Wed, Jun 06, 2012 at 04:34:21PM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <[email protected]>
> >
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> >
> > We hit it when running MTBF testing on Android ATOM mobiles.
> >
> > Here in mce_panic, we choose to disable printk recursion to make sure
> > MCE logs printed out.
>
> Just a minor nitpick: this should say "print_mce" or you can simply
> remove the whole sentence - commit message is fine without it too.
Thanks for your very careful/detailed comments. We would make it perfect.

2012-06-07 02:59:55

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v7 1/2] printk: add interface for disabling recursion check

From: ShuoX Liu <[email protected]>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
include/linux/printk.h | 3 +++
kernel/printk.c | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
struct va_format {
const char *fmt;
va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+ atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+ WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+ atomic_dec(&recursion_check_disabled);
+}
+
/*
* console_sem protects the console_drivers list, and also
* provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
- if (!oops_in_progress && !lockdep_recursing(current)) {
+ if (!atomic_read(&recursion_check_disabled)
+ && !oops_in_progress
+ && !lockdep_recursing(current)) {
recursion_bug = 1;
goto out_restore_irqs;
}
--
1.7.1

2012-06-07 03:02:32

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface

From: ShuoX Liu <[email protected]>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in print_mce, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: ShuoX Liu <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..6056e94 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
{
int ret = 0;

+ printk_recursion_check_disable();
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
* (if the CPU has an implementation for that)
*/
ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- if (ret == NOTIFY_STOP)
+ if (ret == NOTIFY_STOP) {
+ printk_recursion_check_enable();
return;
+ }

pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
+ printk_recursion_check_enable();
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
--
1.7.1

2012-06-07 13:19:19

by bing deng

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] printk: add interface for disabling recursion check

On Thu, Jun 7, 2012 at 2:57 AM, ShuoX Liu <[email protected]> wrote:
> From: ShuoX Liu <[email protected]>
>
> With some special scenario, such as Machine Check Exception happened
> in printk, we want to bypass printk recursion check to printk some
> important information. So we add these interfaces of printk.
>
> 1) printk_recursion_check_disable() for disabling recursion check
> 2) printk_recursion_check_enable() for enabling recursion check
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> ?include/linux/printk.h | ? ?3 +++
> ?kernel/printk.c ? ? ? ?| ? 17 ++++++++++++++++-
> ?2 files changed, 19 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 1bec2f7..da48ec7 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -42,6 +42,9 @@ static inline void console_verbose(void)
> ? ? ? ? ? ? ? ?console_loglevel = 15;
> ?}
>
> +void printk_recursion_check_disable(void);
> +void printk_recursion_check_enable(void);
> +
> ?struct va_format {
> ? ? ? ?const char *fmt;
> ? ? ? ?va_list *va;
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 32462d2..0580f67 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -78,6 +78,19 @@ int console_printk[4] = {
> ?int oops_in_progress;
> ?EXPORT_SYMBOL(oops_in_progress);
>
> +static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
> +
> +void printk_recursion_check_disable(void)
> +{
> + ? ? ? atomic_inc(&recursion_check_disabled);
> +}
> +
> +void printk_recursion_check_enable(void)
> +{
> + ? ? ? WARN_ON(atomic_read(&recursion_check_disabled) < 1);
> + ? ? ? atomic_dec(&recursion_check_disabled);
> +}
> +
> ?/*
> ?* console_sem protects the console_drivers list, and also
> ?* provides serialisation for access to the entire console
> @@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
> ? ? ? ? ? ? ? ? * recursion and return - but flag the recursion so that
> ? ? ? ? ? ? ? ? * it can be printed at the next appropriate moment:
> ? ? ? ? ? ? ? ? */
> - ? ? ? ? ? ? ? if (!oops_in_progress && !lockdep_recursing(current)) {
> + ? ? ? ? ? ? ? if (!atomic_read(&recursion_check_disabled)
> + ? ? ? ? ? ? ? ? ? ? ? && !oops_in_progress
> + ? ? ? ? ? ? ? ? ? ? ? && !lockdep_recursing(current)) {
> ? ? ? ? ? ? ? ? ? ? ? ?recursion_bug = 1;
> ? ? ? ? ? ? ? ? ? ? ? ?goto out_restore_irqs;
> ? ? ? ? ? ? ? ?}
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/

Hi Shuo,

Should the two function be export by EXPORT_SYMBOL? The the other
module can use it.

--
Best Regards,
Bing

2012-06-07 13:38:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] printk: add interface for disabling recursion check

On Thu, Jun 07, 2012 at 01:19:16PM +0000, bing deng wrote:
> Should the two function be export by EXPORT_SYMBOL? The the other
> module can use it.

Not needed right now - functions are only used in built-in code and not
in modules.

--
Regards/Gruss,
Boris.

2012-06-08 12:30:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] printk: add interface for disabling recursion check

On Thu, Jun 07, 2012 at 10:57:26AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> With some special scenario, such as Machine Check Exception happened
> in printk, we want to bypass printk recursion check to printk some
> important information. So we add these interfaces of printk.
>
> 1) printk_recursion_check_disable() for disabling recursion check
> 2) printk_recursion_check_enable() for enabling recursion check
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>

Acked-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

2012-06-08 12:34:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface

On Thu, Jun 07, 2012 at 11:00:03AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <[email protected]>
>
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
>
> We hit it when running MTBF testing on Android ATOM mobiles.
>
> Here in print_mce, we choose to disable printk recursion to make sure
> MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>

Ok,

those patches have to go in hand in hand and since the first one touches
generic code, maybe Andrew is the right guy for the job of picking them
up, provided there are no other objections against them.

Acked-by: Borislav Petkov <[email protected]>

Thanks.

--
Regards/Gruss,
Boris.

2012-06-22 23:41:46

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface

On Mon, 04 Jun 2012 11:07:11 +0800
ShuoX Liu <[email protected]> wrote:

> From: ShuoX Liu <[email protected]>
>
> Disable printk recursion to make sure MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> We hit it when running a MTBF testing on a Android atom mobile.
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..906e838 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> {
> int i, apei_err = 0;
>
> + printk_recursion_check_disable();
> if (!fake_panic) {
> /*
> * Make sure only one CPU runs in machine check panic
> @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> panic(msg);
> } else
> pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> + printk_recursion_check_enable();
> }
>
> /* Support code for software error injection */

A couple of things here.

a) mce_panic() has a "return" statement deep inside. So we return
from mce_panic() with the recursion check disabled. whoops.

b) adding a nice comment is nice.

--- a/arch/x86/kernel/cpu/mcheck/mce.c~x86-mce-use-new-printk-recursion-disabling-interface-fix
+++ a/arch/x86/kernel/cpu/mcheck/mce.c
@@ -303,11 +303,10 @@ static void wait_for_panic(void)
panic("Panicing machine check CPU died");
}

-static void mce_panic(char *msg, struct mce *final, char *exp)
+static void __mce_panic(char *msg, struct mce *final, char *exp)
{
int i, apei_err = 0;

- printk_recursion_check_disable();
if (!fake_panic) {
/*
* Make sure only one CPU runs in machine check panic
@@ -362,6 +361,17 @@ static void mce_panic(char *msg, struct
panic(msg);
} else
pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+}
+
+/*
+ * If an MCE happens to occur during the execution of a printk(), we want the
+ * MCE information to be displayed. But printk()'s recursion checking prevents
+ * that. So temporarily disable it.
+ */
+static void mce_panic(char *msg, struct mce *final, char *exp)
+{
+ printk_recursion_check_disable();
+ __mce_panic(msg, final, exp);
printk_recursion_check_enable();
}

_

2012-06-25 09:17:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface


* ShuoX Liu <[email protected]> wrote:

> From: ShuoX Liu <[email protected]>
>
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
>
> We hit it when running MTBF testing on Android ATOM mobiles.
>
> Here in print_mce, we choose to disable printk recursion to make sure
> MCE logs printed out.
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: ShuoX Liu <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..6056e94 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
> {
> int ret = 0;
>
> + printk_recursion_check_disable();
> pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
> m->extcpu, m->mcgstatus, m->bank, m->status);
>
> @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
> * (if the CPU has an implementation for that)
> */
> ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> - if (ret == NOTIFY_STOP)
> + if (ret == NOTIFY_STOP) {
> + printk_recursion_check_enable();
> return;
> + }
>
> pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> + printk_recursion_check_enable();

Ok, this looks useful and it solves a real problem, but I'd
prefer a better interface: instead of exposing the guts of
printk to drivers in an unsafe manner (and allowing them to keep
printk in an unsafe state indefinitely), shouldn't we instead
introduce printk_emergency() (and variants) that just disable
the recursion check, do the printk and then enable them?

That way drivers cannot possibly leave the recursion check
disabled permanently and it would also be much more obvious
*which* actual printk sites are affected by this exception.

In theory this could be achieved via a new, super-high-prio
printk level: KERN_CRASH or so, which the core printk code could
check - without introducing a new side-facility.

Thanks,

Ingo

2012-06-25 13:14:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface

On Mon, 2012-06-25 at 11:17 +0200, Ingo Molnar wrote:
> * ShuoX Liu <[email protected]> wrote:
>
> > From: ShuoX Liu <[email protected]>
> >
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> >
> > We hit it when running MTBF testing on Android ATOM mobiles.
> >
> > Here in print_mce, we choose to disable printk recursion to make sure
> > MCE logs printed out.
> >
> > Signed-off-by: Yanmin Zhang <[email protected]>
> > Signed-off-by: ShuoX Liu <[email protected]>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce.c | 6 +++++-
> > 1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..6056e94 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
> > {
> > int ret = 0;
> >
> > + printk_recursion_check_disable();
> > pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
> > m->extcpu, m->mcgstatus, m->bank, m->status);
> >
> > @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
> > * (if the CPU has an implementation for that)
> > */
> > ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> > - if (ret == NOTIFY_STOP)
> > + if (ret == NOTIFY_STOP) {
> > + printk_recursion_check_enable();
> > return;
> > + }
> >
> > pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> > + printk_recursion_check_enable();
>
> Ok, this looks useful and it solves a real problem, but I'd
> prefer a better interface: instead of exposing the guts of
> printk to drivers in an unsafe manner (and allowing them to keep
> printk in an unsafe state indefinitely), shouldn't we instead
> introduce printk_emergency() (and variants) that just disable
> the recursion check, do the printk and then enable them?

I really don't see why you'd do anything like the above. For one the
oops_in_progress thing is exported, so you could simply use that, you're
going to panic the machine here anyway, right?

Furthermore the whole recursion stuff is broken anyway, see:
http://marc.info/?l=linux-kernel&m=132446646923333

That also introduces recursive_printk() which you could (ab)use if
needed.

That said, its not entirely clear to me what context you're in when
you're calling this print_mce(), it looks like its only used from
mce_panic().

That already does bust_spinlocks() which already increments the
oops_in_progress thing, so WTF are you doing anyway?

2012-06-26 20:43:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface

On Mon, Jun 25, 2012 at 03:14:21PM +0200, Peter Zijlstra wrote:
> That already does bust_spinlocks() which already increments the
> oops_in_progress thing, so WTF are you doing anyway?

Hmm, that's interesting. So Shuox said they hit this in some sort of
"MTBF testing on Android ATOM mobiles":

http://marc.info/?l=linux-kernel&m=133903834107577&w=2

And I'm guessing they want this probably because they set fake_panic so
that we don't bust_spinlocks() in mce_panic().

Shoux, Yanmin, what's the dealio here?

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-06-26 20:46:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface

On Fri, Jun 22, 2012 at 04:41:43PM -0700, Andrew Morton wrote:
> A couple of things here.
>
> a) mce_panic() has a "return" statement deep inside. So we return
> from mce_panic() with the recursion check disabled. whoops.

Yeah, you're staring at v4 and we fixed this later (currently discussing
v7, please follow this thread :))

> b) adding a nice comment is nice.

Agreed, if the actual use case pans out and we really need this
interface...

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551