2017-04-11 22:46:35

by Verma, Vishal L

Subject: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
[..]
Call Trace:
dump_stack+0x86/0xc3
___might_sleep+0x178/0x240
__might_sleep+0x4a/0x80
mutex_lock_nested+0x43/0x3f0
? __lock_acquire+0xcbc/0x1290
nfit_handle_mce+0x33/0x180 [nfit]
notifier_call_chain+0x4a/0x70
atomic_notifier_call_chain+0x6e/0x110
? atomic_notifier_call_chain+0x5/0x110
mce_gen_pool_process+0x41/0x70

Commit 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save
error records") changed the MCE notifier callbacks to run in process
context, which allows us to use the 'blocking' notifier type, where we
can take mutexes etc. in the call chain functions.

Reported-by: Ross Zwisler <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
3 files changed, 6 insertions(+), 6 deletions(-)

While this patch almost solves the problem, I think it is not quite right.
The x86_mce_decoder_chain is also called from print_mce() for fatal machine
checks, and that is, afaict, still in atomic context. One thing Tony
suggested was splitting the notifier chain into two distinct chains: one
for regular logging and recoverable actions, which allows blocking, and
the other for the panic path.

diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
index 1e5a50c..217cd44 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-genpool.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
@@ -85,7 +85,7 @@ void mce_gen_pool_process(struct work_struct *__unused)
 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
 		mce = &node->mce;
-		atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 903043e..19592ba 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -13,7 +13,7 @@ enum severity_level {
 	MCE_PANIC_SEVERITY,
 };

-extern struct atomic_notifier_head x86_mce_decoder_chain;
+extern struct blocking_notifier_head x86_mce_decoder_chain;

 #define ATTR_LEN 16
 #define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..b39ca29 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -121,7 +121,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  * CPU/chipset specific EDAC code can register a notifier call here to print
  * MCE errors in a human-readable form.
  */
-ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);

 /* Do initial initialization of a struct mce */
 void mce_setup(struct mce *m)
@@ -218,7 +218,7 @@ void mce_register_decode_chain(struct notifier_block *nb)

 	WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

-	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
+	blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
 }
 EXPORT_SYMBOL_GPL(mce_register_decode_chain);

@@ -226,7 +226,7 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 {
 	atomic_dec(&num_notifiers);

-	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
+	blocking_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);

@@ -327,7 +327,7 @@ static void print_mce(struct mce *m)
 	 * Print out human-readable details about the MCE error,
 	 * (if the CPU has an implementation for that)
 	 */
-	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+	ret = blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
 	if (ret == NOTIFY_STOP)
 		return;

--
2.9.3


2017-04-12 09:15:08

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Tue, Apr 11, 2017 at 04:44:57PM -0600, Vishal Verma wrote:
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
> in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
> [..]
> Call Trace:
> dump_stack+0x86/0xc3
> ___might_sleep+0x178/0x240
> __might_sleep+0x4a/0x80
> mutex_lock_nested+0x43/0x3f0
> ? __lock_acquire+0xcbc/0x1290
> nfit_handle_mce+0x33/0x180 [nfit]
> notifier_call_chain+0x4a/0x70
> atomic_notifier_call_chain+0x6e/0x110
> ? atomic_notifier_call_chain+0x5/0x110
> mce_gen_pool_process+0x41/0x70
>
> Commit 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save
> error records") changed the MCE notifier callbacks to run in process
> context, which allows us to use the 'blocking' notifier type, where we
> can take mutexes etc. in the call chain functions.
>
> Reported-by: Ross Zwisler <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Dan Williams <[email protected]>
> Signed-off-by: Vishal Verma <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
> arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
> arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
> 3 files changed, 6 insertions(+), 6 deletions(-)
>
> While this patch almost solves the problem, I think it is not quite right.
> The x86_mce_decoder_chain is also called from print_mce for fatal machine
> checks, and that is, afaict, still from an atomic context. One thing Tony
> suggested was splitting the notifier chain into two distinct chains, one
> for regular logging and recoverable actions that allows blocking, the
> other from the panic path.

Well, if Mohammad won't come to the mountain...

So the NFIT handler has:

	/* We only care about memory errors */
	if (!(mce->status & MCACOD))
		return NOTIFY_DONE;

what severity are we talking here? Errors which can be reported on the
panic path, i.e., in atomic context or only AO/AR ones which don't raise
an #MC exception?

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 20:00:26

by Verma, Vishal L

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On 04/12, Borislav Petkov wrote:
> On Tue, Apr 11, 2017 at 04:44:57PM -0600, Vishal Verma wrote:
> > The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> > takes a mutex to add the location of a memory error to a list. But since
> > the notifier call chain for machine checks (x86_mce_decoder_chain) is
> > atomic, we get a lockdep splat like:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
> > in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
> > [..]
> > Call Trace:
> > dump_stack+0x86/0xc3
> > ___might_sleep+0x178/0x240
> > __might_sleep+0x4a/0x80
> > mutex_lock_nested+0x43/0x3f0
> > ? __lock_acquire+0xcbc/0x1290
> > nfit_handle_mce+0x33/0x180 [nfit]
> > notifier_call_chain+0x4a/0x70
> > atomic_notifier_call_chain+0x6e/0x110
> > ? atomic_notifier_call_chain+0x5/0x110
> > mce_gen_pool_process+0x41/0x70
> >
> > Commit 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save
> > error records") changed the MCE notifier callbacks to run in process
> > context, which allows us to use the 'blocking' notifier type, where we
> > can take mutexes etc. in the call chain functions.
> >
> > Reported-by: Ross Zwisler <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tony Luck <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Signed-off-by: Vishal Verma <[email protected]>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
> > arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
> > arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
> > 3 files changed, 6 insertions(+), 6 deletions(-)
> >
> > While this patch almost solves the problem, I think it is not quite right.
> > The x86_mce_decoder_chain is also called from print_mce for fatal machine
> > checks, and that is, afaict, still from an atomic context. One thing Tony
> > suggested was splitting the notifier chain into two distinct chains, one
> > for regular logging and recoverable actions that allows blocking, the
> > other from the panic path.
>
> Well, if Mohammad won't come to the mountain...
>
> So the NFIT handler has:
>
> /* We only care about memory errors */
> if (!(mce->status & MCACOD))
> return NOTIFY_DONE;
>
> what severity are we talking here? Errors which can be reported on the
> panic path, i.e., in atomic context or only AO/AR ones which don't raise
> an #MC exception?

I don't think we can do anything about the panic path errors. The NFIT
handler takes the recoverable machine checks and essentially adds the
error location to a list.

>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --

2017-04-12 20:22:55

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 01:59:03PM -0600, Vishal Verma wrote:
> I don't think we can do anything about the panic path errors.

Then the fix should be a lot easier:

---
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index 3ba1c3472cf9..44c092ec2ac9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,6 +25,9 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct acpi_nfit_desc *acpi_desc;
 	struct nfit_spa *nfit_spa;

+	if (in_atomic())
+		return NOTIFY_DONE;
+
 	/* We only care about memory errors */
 	if (!(mce->status & MCACOD))
 		return NOTIFY_DONE;


--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 20:27:10

by Verma, Vishal L

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, 2017-04-12 at 22:22 +0200, Borislav Petkov wrote:
> On Wed, Apr 12, 2017 at 01:59:03PM -0600, Vishal Verma wrote:
> > I don't think we can do anything about the panic path errors.
>
> Then the fix should be a lot easier:
>
> ---
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index 3ba1c3472cf9..44c092ec2ac9 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -25,6 +25,9 @@ static int nfit_handle_mce(struct notifier_block
> *nb, unsigned long val,
>   struct acpi_nfit_desc *acpi_desc;
>   struct nfit_spa *nfit_spa;
>  
> + if (in_atomic())
> + return NOTIFY_DONE;

But isn't the atomic notifier call chain always called in atomic
context?

> +
>   /* We only care about memory errors */
>   if (!(mce->status & MCACOD))
>   return NOTIFY_DONE;
>
>
> -- 
> Regards/Gruss,
>     Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)

2017-04-12 20:52:32

by Luck, Tony

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> > ? /* We only care about memory errors */
> > ? if (!(mce->status & MCACOD))
> > ? return NOTIFY_DONE;

N.B. that isn't a valid test that this is a memory error. You need


	if ((m->status & 0xef80) != BIT(7))
		return NOTIFY_DONE;

See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

-Tony

2017-04-12 20:55:36

by Dan Williams

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony <[email protected]> wrote:
> On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
>> > /* We only care about memory errors */
>> > if (!(mce->status & MCACOD))
>> > return NOTIFY_DONE;
>
> N.B. that isn't a valid test that this is a memory error. You need
>
>
> if ((m->status & 0xef80) != BIT(7))
>         return NOTIFY_DONE;
>
> See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

But Vishal's point is that even if we get this check correct the
notifier still requires no sleeping operations. So we would need to
move recoverable notifications to a separate blocking notifier chain.

2017-04-12 21:12:32

by Thomas Gleixner

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'


On Wed, 12 Apr 2017, Dan Williams wrote:

> On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony <[email protected]> wrote:
> > On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >> > /* We only care about memory errors */
> >> > if (!(mce->status & MCACOD))
> >> > return NOTIFY_DONE;
> >
> > N.B. that isn't a valid test that this is a memory error. You need
> >
> >
> > if ((m->status & 0xef80) != BIT(7))
> >         return NOTIFY_DONE;
> >
> > See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes
>
> But Vishal's point is that even if we get this check correct the
> notifier still requires no sleeping operations. So we would need to
> move recoverable notifications to a separate blocking notifier chain.

There is another solution:

Convert the notifier to a blocking notifier and in the panic case, ignore
the locking and invoke the notifier chain directly. That needs some minimal
surgery in the notifier code to allow that, but that's certainly less ugly
than splitting stuff up into two chains.

Thanks,

tglx



2017-04-12 21:13:27

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> But isn't the atomic notifier call chain always called in atomic
> context?

No, it isn't. We're calling it in normal process context in
mce_gen_pool_process() too.

So this early exit will avoid any sleeping in atomic context. And since
there's nothing you can do about the errors reported in atomic context,
we can actually use that fact.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 21:19:35

by Luck, Tony

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> There is another solution:
>
> Convert the notifier to a blocking notifier and in the panic case, ignore
> the locking and invoke the notifier chain directly. That needs some minimal
> surgery in the notifier code to allow that, but that's certainly less ugly
> than splitting stuff up into two chains.

But I wonder whether we actually want two chains. We've been adding a bunch
of general run-time logging and recovery stuff to this chain. So now we have
things there that aren't needed or useful in the panic case. E.g.
srao_decode_notifier() (which tries to offline a page that reported an
uncorrected error out of the execution path) and Boris's new CEC code.

-Tony

2017-04-12 21:48:13

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 02:19:32PM -0700, Luck, Tony wrote:
> On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> > There is another solution:
> >
> > Convert the notifier to a blocking notifier and in the panic case, ignore
> > the locking and invoke the notifier chain directly. That needs some minimal
> > surgery in the notifier code to allow that, but that's certainly less ugly
> > than splitting stuff up into two chains.
>
> But I wonder whether we actually want two chains. We've been adding a bunch
> of general run-time logging and recovery stuff to this chain. So now we have
> things there that aren't needed or useful in the panic case. E.g.
> srao_decode_notifier() (which tries to offline a page that reported an
> uncorrected error out of the execution path) and Boris's new CEC code.

I guess we'll have to. The CEC thing does mutex_lock() too and the
atomic notifier disables preemption:

__atomic_notifier_call_chain()
  rcu_read_lock()
    __rcu_read_lock()
      if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
        preempt_disable();

so we need to think about something better to handle events down the
whole chain. Maybe route events from the atomic path to the blocking
path where the sleeping notifier callbacks can sleep as much as they
want to...

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 21:50:58

by Thomas Gleixner

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, 12 Apr 2017, Borislav Petkov wrote:

> On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > But isn't the atomic notifier call chain always called in atomic
> > context?
>
> No, it isn't. We're calling it in normal process context in
> mce_gen_pool_process() too.
>
> So this early exit will avoid any sleeping in atomic context. And since
> there's nothing you can do about the errors reported in atomic context,
> we can actually use that fact.

No, you can't.

CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you won't
ever enter the handler.

The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
obviously not disable preemption, but it should still trigger the
might_sleep() check when a blocking function is called from within a rcu
read side critical section.

Thanks,

tglx





2017-04-12 22:17:04

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 11:47:49PM +0200, Borislav Petkov wrote:
> so we need to think about something better to handle events down the
> whole chain. Maybe route events from the atomic path to the blocking
> path where the sleeping notifier callbacks can sleep as much as they
> want to...

... and it is midnight here so I could be talking crap but we probably
should really split the reporting "chain" into two:

1. atomic context which doesn't do any notifier wankery. We simply call
non-sleeping functions one after the other. And those should be fast and
not taking any locks.

For example, I'm thinking you want to decode the error and that's it. So
we can replace the notifier call in print_mce() with our atomic context
function.

Then:

2. move the notifier completely in process context, convert it to a
blocking one and connect there all the noodling stuff like EDAC, NFIT,
mcelog, EXTLOG, ... I.e., all the logging machinery.

Problem solved.

And here's where you come in and say, "Boris, you're talking crap.." :-)

Lemme sleep on it.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 22:26:36

by Luck, Tony

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, Apr 13, 2017 at 12:16:39AM +0200, Borislav Petkov wrote:
> ... and it is midnight here so I could be talking crap but we probably
> should really split the reporting "chain" into two:

This shouldn't be too painful. Users ask to be put on the chain via
the wrapper:

void mce_register_decode_chain(struct notifier_block *nb)
{
	atomic_inc(&num_notifiers);

	WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
}

We can futz with that and have them specify which chain (or both)
that they want to be added to.

-Tony

2017-04-12 22:29:42

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> We can futz with that and have them specify which chain (or both)
> that they want to be added to.

Well, I didn't want the atomic chain to be a notifier because we can
keep it simple and non-blocking. Only the process context one will be.

So the question is, do we even have a use case for outside consumers
hanging on the atomic chain? Because if not, we're good to go.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-12 22:42:41

by Paul E. McKenney

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> On Wed, 12 Apr 2017, Borislav Petkov wrote:
>
> > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > But isn't the atomic notifier call chain always called in atomic
> > > context?
> >
> > No, it isn't. We're calling it in normal process context in
> > mce_gen_pool_process() too.
> >
> > So this early exit will avoid any sleeping in atomic context. And since
> > there's nothing you can do about the errors reported in atomic context,
> > we can actually use that fact.
>
> No, you can't.
>
> CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you won't
> ever enter the handler.
>
> The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> obviously not disable preemption, but it should still trigger the
> might_sleep() check when a blocking function is called from within a rcu
> read side critical section.

Maybe something like the (untested) patch below. Please note that this
would need some help to work correctly in -rt. It applies only against the
-rcu tip, but in that case you can just get it directly from -rcu.

Thanx, Paul

------------------------------------------------------------------------

commit 122acec803471468d8a453d08219ca2fc94f5556
Author: Paul E. McKenney <[email protected]>
Date: Wed Apr 12 15:29:14 2017 -0700

rcu: Complain if blocking in preemptible RCU read-side critical section

Although preemptible RCU allows its read-side critical sections to be
preempted, general blocking is forbidden. The reason for this is that
excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
voluntarily blocked task doesn't care how high you boost its priority.
Because preemptible RCU is a global mechanism, one ill-behaved reader
hurts everyone. Hence the prohibition against general blocking in
RCU-preempt read-side critical sections. Preemption yes, blocking no.

This commit enforces this prohibition.

There is a special exception for the -rt patchset (which they kindly
volunteered to implement): It is OK to block (as opposed to merely being
preempted) within an RCU-preempt read-side critical section, but only if
the blocking is subject to priority inheritance. This exception permits
CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.

Why doesn't this exception also apply to mainline's rt_mutex? Because
of the possibility that someone does general blocking while holding
an rt_mutex. Yes, the priority boosting will affect the rt_mutex,
but it won't help with the task doing general blocking while holding
that rt_mutex.

Reported-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d013bd4767a7..abc09d368b3a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -465,7 +465,7 @@ void rcu_note_context_switch(bool preempt)
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	trace_rcu_utilization(TPS("Start context switch"));
 	rcu_sched_qs();
-	rcu_preempt_note_context_switch();
+	rcu_preempt_note_context_switch(preempt);
 	/* Load rcu_urgent_qs before other flags. */
 	if (!smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)))
 		goto out;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0e598ab08fea..781fe684f230 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -476,7 +476,7 @@ DECLARE_PER_CPU(char, rcu_cpu_has_work);

 /* Forward declarations for rcutree_plugin.h */
 static void rcu_bootup_announce(void);
-static void rcu_preempt_note_context_switch(void);
+static void rcu_preempt_note_context_switch(bool preempt);
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6d8f7f82259c..67a90158f32e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -286,12 +286,13 @@ static void rcu_preempt_qs(void)
  *
  * Caller must disable interrupts.
  */
-static void rcu_preempt_note_context_switch(void)
+static void rcu_preempt_note_context_switch(bool preempt)
 {
 	struct task_struct *t = current;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;

+	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
 	if (t->rcu_read_lock_nesting > 0 &&
 	    !t->rcu_read_unlock_special.b.blocked) {

@@ -738,7 +739,7 @@ static void __init rcu_bootup_announce(void)
  * Because preemptible RCU does not exist, we never have to check for
  * CPUs being in quiescent states.
  */
-static void rcu_preempt_note_context_switch(void)
+static void rcu_preempt_note_context_switch(bool preempt)
 {
 }


2017-04-12 23:46:08

by Paul E. McKenney

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 03:42:32PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> > On Wed, 12 Apr 2017, Borislav Petkov wrote:
> >
> > > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > > But isn't the atomic notifier call chain always called in atomic
> > > > context?
> > >
> > > No, it isn't. We're calling it in normal process context in
> > > mce_gen_pool_process() too.
> > >
> > > So this early exit will avoid any sleeping in atomic context. And since
> > > there's nothing you can do about the errors reported in atomic context,
> > > we can actually use that fact.
> >
> > No, you can't.
> >
> > CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> > within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you won't
> > ever enter the handler.
> >
> > The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> > obviously not disable preemption, but it should still trigger the
> > might_sleep() check when a blocking function is called from within a rcu
> > read side critical section.
>
> Maybe something like the (untested) patch below. Please note that this
> would need some help to work correctly in -rt. This applies only against
> -rcu tip, but in that case you can just get it directly from -rcu.

So I injected a schedule_timeout_interruptible() into rcutorture's RCU
read-side critical section, and the patch complained as expected. But it
also got a "scheduling while atomic" and a "DEBUG_LOCKS_WARN_ON(val >
preempt_count())" and a warning at "kernel/time/timer.c:1275", which
is this:

	if (count != preempt_count()) {
		WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
			  fn, count, preempt_count());
		/*
		 * Restore the preempt count. That gives us a decent
		 * chance to survive and extract information. If the
		 * callback kept a lock held, bad luck, but not worse
		 * than the BUG() we had.
		 */
		preempt_count_set(count);
	}

So you are saying that you are seeing blocking in RCU-preempt read-side
critical sections being ignored?

Here is the Kconfig fragment used by this test:

CONFIG_SMP=y
CONFIG_NR_CPUS=8
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
#CHECK#CONFIG_PREEMPT_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_RCU_TRACE=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_FANOUT=3
CONFIG_RCU_FANOUT_LEAF=3
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_BOOST=n
CONFIG_RCU_EXPERT=y
CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y

I will run other scenarios overnight.

Thanx, Paul

2017-04-13 11:32:50

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, Apr 13, 2017 at 12:29:25AM +0200, Borislav Petkov wrote:
> On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> > We can futz with that and have them specify which chain (or both)
> > that they want to be added to.
>
> Well, I didn't want the atomic chain to be a notifier because we can
> keep it simple and non-blocking. Only the process context one will be.
>
> So the question is, do we even have a use case for outside consumers
> hanging on the atomic chain? Because if not, we're good to go.

Ok, new day, new patch.

Below is what we could do: we don't call the notifier at all on the
atomic path but only print the MCEs. We do log them and if the machine
survives, we process them accordingly. This is only a fix for upstream
so that the current issue at hand is addressed.

For later, we'd need to split the paths in:

critical_print_mce()

or somesuch which immediately dumps the MCE to dmesg, and

mce_log()

which does the slow path of logging MCEs and calling the blocking
notifier.

Now, I'd want to have decoding of the MCE on the critical path too so
I have to think about how to do that nicely. Maybe move the decoding
bits which are the same between Intel and AMD in mce.c and have some
vendor-specific, fast calls. We'll see. Btw, this is something Ingo has
been mentioning for a while.

Anyway, here's just the urgent fix for now.

Thanks.

---
From: Vishal Verma <[email protected]>
Date: Tue, 11 Apr 2017 16:44:57 -0600
Subject: [PATCH] x86/mce: Make the MCE notifier a blocking one

The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
[..]
Call Trace:
dump_stack
___might_sleep
__might_sleep
mutex_lock_nested
? __lock_acquire
nfit_handle_mce
notifier_call_chain
atomic_notifier_call_chain
? atomic_notifier_call_chain
mce_gen_pool_process

Convert the notifier to a blocking one which gets to run only in process
context.

Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure it
goes out. We still log it for process context later.

Reported-by: Ross Zwisler <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: linux-edac <[email protected]>
Cc: x86-ml <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++--------------
3 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
index 1e5a50c11d3c..217cd4449bc9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-genpool.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
@@ -85,7 +85,7 @@ void mce_gen_pool_process(struct work_struct *__unused)
head = llist_reverse_order(head);
llist_for_each_entry_safe(node, tmp, head, llnode) {
mce = &node->mce;
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+ blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
}
}
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 903043e6a62b..19592ba1a320 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -13,7 +13,7 @@ enum severity_level {
MCE_PANIC_SEVERITY,
};

-extern struct atomic_notifier_head x86_mce_decoder_chain;
+extern struct blocking_notifier_head x86_mce_decoder_chain;

#define ATTR_LEN 16
#define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5accfbdee3f0..8e470735b16b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -123,7 +123,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
* CPU/chipset specific EDAC code can register a notifier call here to print
* MCE errors in a human-readable form.
*/
-ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);

/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
@@ -220,7 +220,7 @@ void mce_register_decode_chain(struct notifier_block *nb)

WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

- atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
+ blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
}
EXPORT_SYMBOL_GPL(mce_register_decode_chain);

@@ -228,7 +228,7 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
{
atomic_dec(&num_notifiers);

- atomic_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
+ blocking_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
}
EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);

@@ -321,18 +321,8 @@ static void __print_mce(struct mce *m)

static void print_mce(struct mce *m)
{
- int ret = 0;
-
__print_mce(m);
-
- /*
- * Print out human-readable details about the MCE error,
- * (if the CPU has an implementation for that)
- */
- ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- if (ret == NOTIFY_STOP)
- return;
-
+ mce_log(m);
pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
}

--
2.11.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2017-04-13 12:12:39

by Borislav Petkov

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, Apr 13, 2017 at 01:31:59PM +0200, Borislav Petkov wrote:
> @@ -321,18 +321,8 @@ static void __print_mce(struct mce *m)
>
> static void print_mce(struct mce *m)
> {
> - int ret = 0;
> -
> __print_mce(m);
> -
> - /*
> - * Print out human-readable details about the MCE error,
> - * (if the CPU has an implementation for that)
> - */
> - ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> - if (ret == NOTIFY_STOP)
> - return;
> -
> + mce_log(m);

Actually, we don't need that call here because do_machine_check()
already does it.

--
Regards/Gruss,
Boris.

--

2017-04-13 14:34:16

by Paul E. McKenney

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Wed, Apr 12, 2017 at 04:45:59PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 12, 2017 at 03:42:32PM -0700, Paul E. McKenney wrote:
> > On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> > > On Wed, 12 Apr 2017, Borislav Petkov wrote:
> > >
> > > > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > > > But isn't the atomic notifier call chain always called in atomic
> > > > > context?
> > > >
> > > > No, it isn't. We're calling it in normal process context in
> > > > mce_gen_pool_process() too.
> > > >
> > > > So this early exit will avoid any sleeping in atomic context. And since
> > > > there's nothing you can do about the errors reported in atomic context,
> > > > we can actually use that fact.
> > >
> > > No, you can't.
> > >
> > > CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> > > within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you won't
> > > ever enter the handler.
> > >
> > > The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> > > obviously not disable preemption, but it should still trigger the
> > > might_sleep() check when a blocking function is called from within a rcu
> > > read side critical section.
> >
> > Maybe something like the (untested) patch below. Please note that this
> > would need some help to work correctly in -rt. This applies only against
> > -rcu tip, but in that case you can just get it directly from -rcu.
>
> So I injected a schedule_timeout_interruptible() into rcutorture's RCU
> read-side critical section, and the patch complained as expected. But it
> also got a "scheduling while atomic" and a "DEBUG_LOCKS_WARN_ON(val >
> preempt_count())" and a warning at "kernel/time/timer.c:1275", which
> is this:
>
> if (count != preempt_count()) {
> WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
> fn, count, preempt_count());
> /*
> * Restore the preempt count. That gives us a decent
> * chance to survive and extract information. If the
> * callback kept a lock held, bad luck, but not worse
> * than the BUG() we had.
> */
> preempt_count_set(count);
> }
>
> So you are saying that you are seeing blocking in RCU-preempt read-side
> critical sections being ignored?
>
> Here is the Kconfig fragment used by this test:
>
> CONFIG_SMP=y
> CONFIG_NR_CPUS=8
> CONFIG_PREEMPT_NONE=n
> CONFIG_PREEMPT_VOLUNTARY=n
> CONFIG_PREEMPT=y
> #CHECK#CONFIG_PREEMPT_RCU=y
> CONFIG_HZ_PERIODIC=n
> CONFIG_NO_HZ_IDLE=y
> CONFIG_NO_HZ_FULL=n
> CONFIG_RCU_FAST_NO_HZ=n
> CONFIG_RCU_TRACE=n
> CONFIG_HOTPLUG_CPU=n
> CONFIG_SUSPEND=n
> CONFIG_HIBERNATION=n
> CONFIG_RCU_FANOUT=3
> CONFIG_RCU_FANOUT_LEAF=3
> CONFIG_RCU_NOCB_CPU=n
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=n
> CONFIG_RCU_BOOST=n
> CONFIG_RCU_EXPERT=y
> CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
> CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
> CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
> CONFIG_DEBUG_OBJECTS=y
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
>
> I will run other scenarios overnight.

Well, that was an extremely poor choice of scenario to test. Yes, it
has CONFIG_PREEMPT_RCU=y, but it tests RCU-bh because the TREE01.boot
file contains "rcutorture.torture_type=rcu_bh maxcpus=8". Hence all the
"scheduling while atomic" and other complaints. :-/

Without the patch, RCU-preempt does not complain, as reported, as
shown below. Other RCU flavors always give "scheduling while atomic".
With the patch, all flavors always complain, including RCU-preempt.

TREE01 ------- 18589 grace periods (103.272 per second)
BUG: 2028 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE01
BUG: FAILURE, 1982 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE01/console.log
WARNING: Summary: Warnings: 2 Bugs: 2033 Call Traces: 2034
----> Lots of:
"scheduling while atomic"
"DEBUG_LOCKS_WARN_ON(val > preempt_count())"
----> Preemptible, but testing RCU-bh, thus blocking makes too-short GP.
Failures are thus expected behavior.
CONFIG_HOTPLUG_CPU=y, CONFIG_MAXSMP=y, CONFIG_CPUMASK_OFFSTACK=y,
CONFIG_RCU_NOCB_CPU_ZERO=y.

TREE02 ------- 3270 grace periods (18.1667 per second)
----> Preemptible, CONFIG_HOTPLUG_CPU=n, CONFIG_DEBUG_LOCK_ALLOC=y,
CONFIG_PROVE_LOCKING=n. No complaints.

TREE03 ------- 2963 grace periods (16.4611 per second)
CPU count limited from 16 to 8 (Yeah, yeah, wimpy laptop.)
----> Preemptible, CONFIG_HOTPLUG_CPU=y, CONFIG_RCU_BOOST=y,
CONFIG_RCU_KTHREAD_PRIO=2. Again, no complaints.

TREE04 ------- 272 grace periods (1.51111 per second)
BUG: 33 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE04
BUG: FAILURE, 72 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE04/console.log
WARNING: Summary: Warnings: 1 Bugs: 30 Call Traces: 31 Stalls: 1
!!! PID 29727 hung at 360 vs. 180 seconds
----> Nonpreemptible, so failures are expected behavior, especially given that
this is testing RCU-bh.
"scheduling while atomic"

TREE05 ------- 4520 grace periods (25.1111 per second)
BUG: 1770 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE05
BUG: FAILURE, 1358 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE05/console.log
WARNING: Summary: Warnings: 1 Bugs: 2 Call Traces: 37 Stalls: 14 Starves: 2
!!! PID 13911 hung at 370 vs. 180 seconds
----> Nonpreemptible, so failures are expected behavior. Also testing RCU-sched.
"scheduling while atomic"

TREE06 ------- 4012 grace periods (22.2889 per second)
BUG: 2314 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE06
BUG: FAILURE, 1783 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE06/console.log
WARNING: Summary: Warnings: 3 Bugs: 1 Call Traces: 4
----> Nonpreemptible, so failures are expected behavior.
CONFIG_PROVE_RCU=y, hence:
"Illegal context switch in RCU read-side critical section!"
Also:
"scheduling while atomic"

TREE07 ------- 1853 grace periods (10.2944 per second)
BUG: 278 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE07
BUG: FAILURE, 621 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE07/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8
----> Nonpreemptible, so failures are expected behavior.
But no complaints other than from rcutorture.

TREE08 ------- 2725 grace periods (15.1389 per second)
BUG: 2265 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE08
BUG: FAILURE, 1152 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE08/console.log
WARNING: Summary: Warnings: 3 Bugs: 4290 Call Traces: 4292
----> Lots of:
"scheduling while atomic"
"DEBUG_LOCKS_WARN_ON(val > preempt_count())"
----> Preemptible, but testing RCU-bh, so failures are expected behavior.
CONFIG_HOTPLUG_CPU=n, CONFIG_PROVE_LOCKING=n, CONFIG_RCU_EQS_DEBUG=y.

TREE09 ------- 12148 grace periods (67.4889 per second)
----> Preemptible. Also !SMP. No complaints.

------------------------------------------------------------------------

Now with the patch, we always get a splat no matter what:

TREE01 ------- 12418 grace periods (68.9889 per second)
BUG: 2768 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE01
BUG: FAILURE, 2692 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE01/console.log
WARNING: Summary: Warnings: 2 Bugs: 2786 Call Traces: 2787

TREE02 ------- 3221 grace periods (17.8944 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE02/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

TREE03 ------- 3129 grace periods (17.3833 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE03/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8

TREE04 ------- 497 grace periods (2.76111 per second)
BUG: 38 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE04
BUG: FAILURE, 99 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE04/console.log
WARNING: Summary: Warnings: 1 Bugs: 34 Call Traces: 35 Stalls: 1
!!! PID 12749 hung at 360 vs. 180 seconds

TREE05 ------- 5793 grace periods (32.1833 per second)
BUG: 2138 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE05
BUG: FAILURE, 1672 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE05/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

TREE06 ------- 4072 grace periods (22.6222 per second)
BUG: 2339 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE06
BUG: FAILURE, 1762 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE06/console.log
WARNING: Summary: Warnings: 2 Call Traces: 2

TREE07 ------- 1863 grace periods (10.35 per second)
BUG: 288 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE07
BUG: FAILURE, 590 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE07/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8

TREE08 ------- 2628 grace periods (14.6 per second)
BUG: 2203 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE08
BUG: FAILURE, 1086 instances

WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE08/console.log
WARNING: Summary: Warnings: 2 Bugs: 4265 Call Traces: 4266

TREE09 ------- 12109 grace periods (67.2722 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE09/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

------------------------------------------------------------------------

So I have the patch queued for 4.13.

Thanx, Paul

2017-04-18 16:28:26

by Luck, Tony

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, Apr 13, 2017 at 02:12:16PM +0200, Borislav Petkov wrote:
> On Thu, Apr 13, 2017 at 01:31:59PM +0200, Borislav Petkov wrote:
> > @@ -321,18 +321,8 @@ static void __print_mce(struct mce *m)
> >
> > static void print_mce(struct mce *m)
> > {
> > - int ret = 0;
> > -
> > __print_mce(m);
> > -
> > - /*
> > - * Print out human-readable details about the MCE error,
> > - * (if the CPU has an implementation for that)
> > - */
> > - ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> > - if (ret == NOTIFY_STOP)
> > - return;
> > -
> > + mce_log(m);
>
> Actually, we don't need that call here because do_machine_check()
> already does it.

Yes. Don't add mce_log(m) here. We've already done it.

With this change:

Acked-by: Tony Luck <[email protected]>

-Tony

Subject: [tip:ras/urgent] x86/mce: Make the MCE notifier a blocking one

Commit-ID: 0dc9c639e6553e39c13b2c0d54c8a1b098cb95e2
Gitweb: http://git.kernel.org/tip/0dc9c639e6553e39c13b2c0d54c8a1b098cb95e2
Author: Vishal Verma <[email protected]>
AuthorDate: Tue, 18 Apr 2017 20:42:35 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 18 Apr 2017 22:23:48 +0200

x86/mce: Make the MCE notifier a blocking one

The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
[..]
Call Trace:
dump_stack
___might_sleep
__might_sleep
mutex_lock_nested
? __lock_acquire
nfit_handle_mce
notifier_call_chain
atomic_notifier_call_chain
? atomic_notifier_call_chain
mce_gen_pool_process

Convert the notifier to a blocking one which gets to run only in process
context.

Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure
they go out and get logged at least.

Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Ross Zwisler <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
Acked-by: Tony Luck <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: linux-edac <[email protected]>
Cc: x86-ml <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 17 +++--------------
3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
index 1e5a50c..217cd44 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-genpool.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
@@ -85,7 +85,7 @@ void mce_gen_pool_process(struct work_struct *__unused)
head = llist_reverse_order(head);
llist_for_each_entry_safe(node, tmp, head, llnode) {
mce = &node->mce;
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+ blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
}
}
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 903043e..19592ba 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -13,7 +13,7 @@ enum severity_level {
MCE_PANIC_SEVERITY,
};

-extern struct atomic_notifier_head x86_mce_decoder_chain;
+extern struct blocking_notifier_head x86_mce_decoder_chain;

#define ATTR_LEN 16
#define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5accfbd..af44ebe 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -123,7 +123,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
* CPU/chipset specific EDAC code can register a notifier call here to print
* MCE errors in a human-readable form.
*/
-ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);

/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
@@ -220,7 +220,7 @@ void mce_register_decode_chain(struct notifier_block *nb)

WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

- atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
+ blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
}
EXPORT_SYMBOL_GPL(mce_register_decode_chain);

@@ -228,7 +228,7 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
{
atomic_dec(&num_notifiers);

- atomic_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
+ blocking_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
}
EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);

@@ -321,18 +321,7 @@ static void __print_mce(struct mce *m)

static void print_mce(struct mce *m)
{
- int ret = 0;
-
__print_mce(m);
-
- /*
- * Print out human-readable details about the MCE error,
- * (if the CPU has an implementation for that)
- */
- ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- if (ret == NOTIFY_STOP)
- return;
-
pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
}


2017-04-21 21:39:52

by Verma, Vishal L

Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, 2017-04-13 at 13:31 +0200, Borislav Petkov wrote:
> On Thu, Apr 13, 2017 at 12:29:25AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> > > We can futz with that and have them specify which chain (or both)
> > > that they want to be added to.
> >
> > Well, I didn't want the atomic chain to be a notifier because we can
> > keep it simple and non-blocking. Only the process context one will
> > be.
> >
> > So the question is, do we even have a use case for outside consumers
> > hanging on the atomic chain? Because if not, we're good to go.
>
> Ok, new day, new patch.
>
> Below is what we could do: we don't call the notifier at all on the
> atomic path but only print the MCEs. We do log them and if the machine
> survives, we process them accordingly. This is only a fix for upstream
> so that the current issue at hand is addressed.
>
> For later, we'd need to split the paths in:
>
> critical_print_mce()
>
> or somesuch which immediately dumps the MCE to dmesg, and
>
> mce_log()
>
> which does the slow path of logging MCEs and calling the blocking
> notifier.
>
> Now, I'd want to have decoding of the MCE on the critical path too so
> I have to think about how to do that nicely. Maybe move the decoding
> bits which are the same between Intel and AMD in mce.c and have some
> vendor-specific, fast calls. We'll see. Btw, this is something Ingo
> has
> been mentioning for a while.
>
> Anyway, here's just the urgent fix for now.
>
> Thanks.
>
> ---
> From: Vishal Verma <[email protected]>
> Date: Tue, 11 Apr 2017 16:44:57 -0600
> Subject: [PATCH] x86/mce: Make the MCE notifier a blocking one
>
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But
> since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
>
>   BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:620
>   in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
>   [..]
>   Call Trace:
>    dump_stack
>    ___might_sleep
>    __might_sleep
>    mutex_lock_nested
>    ? __lock_acquire
>    nfit_handle_mce
>    notifier_call_chain
>    atomic_notifier_call_chain
>    ? atomic_notifier_call_chain
>    mce_gen_pool_process
>
> Convert the notifier to a blocking one which gets to run only in
> process
> context.
>
> Boris: remove the notifier call in atomic context in print_mce(). For
> now, let's print the MCE on the atomic path so that we can make sure
> it
> goes out. We still log it for process context later.
>
> Reported-by: Ross Zwisler <[email protected]>
> Signed-off-by: Vishal Verma <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: linux-edac <[email protected]>
> Cc: x86-ml <[email protected]>
> Cc: <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
>  arch/x86/kernel/cpu/mcheck/mce-genpool.c  |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c          | 18 ++++--------------
>  3 files changed, 6 insertions(+), 16 deletions(-)
>

I noticed this patch was picked up in tip, in ras/urgent, but didn't see
a pull request for 4.11 - was this the intention? Or will it just be
added for 4.12?

-Vishal