LinuxLists.cc - [PATCH] x86, amd, mce: Prevent potential cpu-online oops

2013-04-04 15:54:19

Subject: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On platforms where all Northbridges may not be visible (due to routing, eg on
NumaConnect systems), prevent oopsing due to stale pointer access when
offlining cores.

Signed-off-by: Steffen Persvold <[email protected]>
Signed-off-by: Daniel J Blueman <[email protected]>

---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 1ac581f..53a58c2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -578,8 +578,11 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
if (shared_bank[bank]) {
nb = node_to_amd_nb(amd_get_nb_id(cpu));

+ if (WARN_ON_ONCE(!nb))
+ goto out;
+
/* threshold descriptor already initialized on this node? */
- if (nb && nb->bank4) {
+ if (nb->bank4) {
/* yes, use it */
b = nb->bank4;
err = kobject_add(b->kobj, &dev->kobj, name);
@@ -613,10 +616,8 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
atomic_set(&b->cpus, 1);

/* nb is already initialized, see above */
- if (nb) {
- WARN_ON(nb->bank4);
- nb->bank4 = b;
- }
+ WARN_ON(nb->bank4);
+ nb->bank4 = b;
}

err = allocate_threshold_blocks(cpu, bank, 0,
--
1.7.4.1

2013-04-04 16:05:30

by Tony Luck

[permalink] [raw]

Subject: RE: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

+ if (WARN_ON_ONCE(!nb))
+ goto out;
+

WARN_ON_ONCE() will drop a stack trace to the console - is that going to be useful?

If you want a message perhaps:

if (!nb) {
printk_once("something interesting about not having access to north bridge\n")
goto out;
}

-Tony

2013-04-04 16:13:46

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On Thu, Apr 04, 2013 at 11:52:00PM +0800, Daniel J Blueman wrote:
> On platforms where all Northbridges may not be visible (due to routing, eg on
> NumaConnect systems), prevent oopsing due to stale pointer access when
> offlining cores.
>
> Signed-off-by: Steffen Persvold <[email protected]>
> Signed-off-by: Daniel J Blueman <[email protected]>

Huh, what's up?

This one is almost reverting 21c5e50e15b1a which you wrote in the first
place. What's happening? What stale pointer access, where? We have the
if (nb ..) guards there.

This commit message needs a *lot* more explanation about what's going
on and why we're reverting 21c5e50e15b1a. And why the special handling
for shared banks? I presume you offline some of the cores and there's a
dangling pointer but again, there are the nb validity guards...

/me is genuinely confused.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-04 19:07:37

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
> It made more sense (to me) to skip the creation of MC4 all together
> if you can't find the matching northbridge since you can't reliably
> do the dec_and_test() reference counting on the shared bank when you
> don't have the common NB struct for all the shared cores.
>
> Or am I just smoking the wrong stuff ?

No, actually *this* explanation should've been in the commit message.
You numascale people do crazy things with the hardware :) so explaining
yourself more verbosely is an absolute must if anyone is to understand
why you're changing the code.

So please write a detailed commit message why you need this change,
don't be afraid to talk about the big picture.

Also, I'm guessing this is urgent stuff and it needs to go into 3.9?
Yes, no? If yes, this patch should probably be tagged for stable.

Also, please redo this patch against tip:x86/ras which already has
patches touching mce_amd.c.

Oh, and lastly, needless to say, it needs to be tested on a "normal",
i.e. !numascale AMD multinode box, in case you haven't done so yet. :-)

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-04 19:13:12

by Steffen Persvold

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On 4/4/2013 6:13 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 11:52:00PM +0800, Daniel J Blueman wrote:
>> On platforms where all Northbridges may not be visible (due to routing, eg on
>> NumaConnect systems), prevent oopsing due to stale pointer access when
>> offlining cores.
>>
>> Signed-off-by: Steffen Persvold <[email protected]>
>> Signed-off-by: Daniel J Blueman <[email protected]>
>
> Huh, what's up?
>
> This one is almost reverting 21c5e50e15b1a which you wrote in the first
> place. What's happening? What stale pointer access, where? We have the
> if (nb ..) guards there.
>
> This commit message needs a *lot* more explanation about what's going
> on and why we're reverting 21c5e50e15b1a. And why the special handling
> for shared banks? I presume you offline some of the cores and there's a
> dangling pointer but again, there are the nb validity guards...
>
> /me is genuinely confused.
>

You get oopses when offlining cores when there's no NB struct for the shared MC4 bank. In threshold_remove_bank(), there's no "if (!nb)" guard :

if (shared_bank[bank]) {
if (!atomic_dec_and_test(&b->cpus)) {
__threshold_remove_blocks(b);
per_cpu(threshold_banks, cpu)[bank] = NULL;
return;
} else {
/*
* the last CPU on this node using the shared bank is
* going away, remove that bank now.
*/
nb = node_to_amd_nb(amd_get_nb_id(cpu));
nb->bank4 = NULL;
}
}

nb->bank4 = NULL will oops, since nb is NULL.

It made more sense (to me) to skip the creation of MC4 all together if you can't find the matching northbridge since you can't reliably do the dec_and_test() reference counting on the shared bank when you don't have the common NB struct for all the shared cores.

Or am I just smoking the wrong stuff ?

Cheers,
Steffen

2013-04-04 20:01:22

by Steffen Persvold

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On 4/4/2013 9:07 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
>> It made more sense (to me) to skip the creation of MC4 all together
>> if you can't find the matching northbridge since you can't reliably
>> do the dec_and_test() reference counting on the shared bank when you
>> don't have the common NB struct for all the shared cores.
>>
>> Or am I just smoking the wrong stuff ?
>
> No, actually *this* explanation should've been in the commit message.
> You numascale people do crazy things with the hardware :) so explaining
> yourself more verbosely is an absolute must if anyone is to understand
> why you're changing the code.

Ok :)

>
> So please write a detailed commit message why you need this change,
> don't be afraid to talk about the big picture.

Will do.

>
> Also, I'm guessing this is urgent stuff and it needs to go into 3.9?
> Yes, no? If yes, this patch should probably be tagged for stable.

Yes. We found the issue on -stable at first (3.8.2 iirc) because it
doesn't have the multi-domain support we needed (which is added in 3.9).

>
> Also, please redo this patch against tip:x86/ras which already has
> patches touching mce_amd.c.

Ok.

>
> Oh, and lastly, needless to say, it needs to be tested on a "normal",
> i.e. !numascale AMD multinode box, in case you haven't done so yet. :-)
>

It has been tested on "normal" platforms and NumaConnect platforms
(Fam10h and Fam15h AMD processors, SCM and MCM versions).

Cheers,
Steffen

2013-04-09 09:25:27

by Steffen Persvold

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On 4/4/2013 9:07 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
>> It made more sense (to me) to skip the creation of MC4 all together
>> if you can't find the matching northbridge since you can't reliably
>> do the dec_and_test() reference counting on the shared bank when you
>> don't have the common NB struct for all the shared cores.
>>
>> Or am I just smoking the wrong stuff ?
>
> No, actually *this* explanation should've been in the commit message.
> You numascale people do crazy things with the hardware :) so explaining
> yourself more verbosely is an absolute must if anyone is to understand
> why you're changing the code.
>

Boris,

A question came up. Why have this "shared" bank concept for the kobjects
at all ? What's the advantage ? Before our patch, when running on our
architecture but without pci domains for "slave" servers, everything was
working fine except the de-allocation oops due to the NULL pointer when
offlining cores.

Why not let all cores just create their individual kobject and skip this
"shared" nb->bank4 concept ? Any disadvantage to that (apart from the
obvious storage bloat?).

Cheers,
Steffen

2013-04-09 09:39:02

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On Tue, Apr 09, 2013 at 11:25:16AM +0200, Steffen Persvold wrote:
> Why not let all cores just create their individual kobject and skip
> this "shared" nb->bank4 concept ? Any disadvantage to that (apart from
> the obvious storage bloat?).

Well, bank4 is shared across cores on the northbridge in *hardware*.
So it is only logical to represent the hardware layout correctly in
software.

Also, if you want to configure any settings over one core's sysfs nodes,
you want those to be visible across all cores automagically:

# echo 12 > machinecheck2/northbridge/dram/threshold_limit
# grep . -Er /sys/devices/system/machinecheck | grep -E "dram.*threshold_limit"
/sys/devices/system/machinecheck/machinecheck0/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck1/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck2/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck3/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck4/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck5/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck6/northbridge/dram/threshold_limit:12
/sys/devices/system/machinecheck/machinecheck7/northbridge/dram/threshold_limit:12

HTH.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-09 09:45:49

by Steffen Persvold

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On 4/9/2013 11:38 AM, Borislav Petkov wrote:
> On Tue, Apr 09, 2013 at 11:25:16AM +0200, Steffen Persvold wrote:
>> Why not let all cores just create their individual kobject and skip
>> this "shared" nb->bank4 concept ? Any disadvantage to that (apart from
>> the obvious storage bloat?).
>
> Well, bank4 is shared across cores on the northbridge in *hardware*.

Well, yes I was aware of that :)

> So it is only logical to represent the hardware layout correctly in
> software.
>
> Also, if you want to configure any settings over one core's sysfs nodes,
> you want those to be visible across all cores automagically:

Hmm, yes of course. This of course breaks on our slave servers when the
shared mechanism doesn't work properly (i.e NB not visible). Then all
cores gets individual kobjects and there can be discrepancies between
what the hardware is programmed to and what is reflected in /sys on some
cores..

Ok, we go with our first approach to not create MC4 at all if NB isn't
visible.

We'll redo the patch against the tip:x86/ras branch.

Cheers,
Steffen

2013-04-09 10:26:24

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On Tue, Apr 09, 2013 at 11:45:44AM +0200, Steffen Persvold wrote:
> Hmm, yes of course. This of course breaks on our slave servers when
> the shared mechanism doesn't work properly (i.e NB not visible). Then
> all cores gets individual kobjects and there can be discrepancies
> between what the hardware is programmed to and what is reflected in
> /sys on some cores..

Hold on, are you saying you have cores with an invisible NB? How does
that even work? Or is it only invisible to sw?

Which would explain why you want the bank4 thing to be per-core.
However, the ugliness with this approach is that you need to note which
cores share the bank and do the proper updates on all sharing cores upon
modifications from sysfs.

Oh well, if you can come up with a clean patchset doing that and without
introducing too much bloat, we can talk about it... :-)

> Ok, we go with our first approach to not create MC4 at all if NB isn't
> visible.

... but this would be the simpler fix. Unless it is still not fixing
everything for you guys.

> We'll redo the patch against the tip:x86/ras branch.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-09 11:34:04

by Steffen Persvold

[permalink] [raw]

Subject: Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

On 4/9/2013 12:24 PM, Borislav Petkov wrote:
> On Tue, Apr 09, 2013 at 11:45:44AM +0200, Steffen Persvold wrote:
>> Hmm, yes of course. This of course breaks on our slave servers when
>> the shared mechanism doesn't work properly (i.e NB not visible). Then
>> all cores gets individual kobjects and there can be discrepancies
>> between what the hardware is programmed to and what is reflected in
>> /sys on some cores..
>
> Hold on, are you saying you have cores with an invisible NB? How does
> that even work? Or is it only invisible to sw?

only invisible to the kernel because the multi-pci-domains isn't working
pre 3.9 on our architecture.

cheers,
Steffen