2012-08-29 07:43:41

by Naveen N. Rao

[permalink] [raw]
Subject: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

All the MCE attributes currently exported via sysfs appear under
/sys/devices/system/machinecheck/machinecheck<n>/. Pretty much all of these
are global in nature and not specific to a processor. We have around 7
attributes duplicated across each processor and on multi-core multi-socket
machines, this amounts to quite a large number. So, move these out under
/sys/devices/system/machinecheck/ where they rightly belong.

Note: I'm not sure if it's ok to change sysfs entries and this does break
userspace tools that depend on the current path for some of these attributes.
So, they will need to be updated to use the new path. However, if we ever get
to a point where cpu0 can be offlined, these tools will need to be updated
anyway (as they mostly hardcode machinecheck0 currently)

Signed-off-by: Naveen N. Rao <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 46 ++++++++++++++++++--------------------
1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c311122..c8eeaa6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2205,17 +2205,26 @@ static struct dev_ext_attribute dev_attr_cmci_disabled = {
&mce_cmci_disabled
};

-static struct device_attribute *mce_device_attrs[] = {
- &dev_attr_tolerant.attr,
- &dev_attr_check_interval.attr,
- &dev_attr_trigger,
- &dev_attr_monarch_timeout.attr,
- &dev_attr_dont_log_ce.attr,
- &dev_attr_ignore_ce.attr,
- &dev_attr_cmci_disabled.attr,
+static struct attribute *mce_device_attrs[] = {
+ &dev_attr_tolerant.attr.attr,
+ &dev_attr_check_interval.attr.attr,
+ &dev_attr_trigger.attr,
+ &dev_attr_monarch_timeout.attr.attr,
+ &dev_attr_dont_log_ce.attr.attr,
+ &dev_attr_ignore_ce.attr.attr,
+ &dev_attr_cmci_disabled.attr.attr,
NULL
};

+static struct attribute_group mce_device_attr_group = {
+ .attrs = mce_device_attrs,
+};
+
+static const struct attribute_group *mce_device_attr_groups[] = {
+ &mce_device_attr_group,
+ NULL,
+};
+
static cpumask_var_t mce_device_initialized;

static void mce_device_release(struct device *dev)
@@ -2228,7 +2237,7 @@ static __cpuinit int mce_device_create(unsigned int cpu)
{
struct device *dev;
int err;
- int i, j;
+ int i;

if (!mce_available(&boot_cpu_data))
return -EIO;
@@ -2244,26 +2253,18 @@ static __cpuinit int mce_device_create(unsigned int cpu)
if (err)
return err;

- for (i = 0; mce_device_attrs[i]; i++) {
- err = device_create_file(dev, mce_device_attrs[i]);
+ for (i = 0; i < banks; i++) {
+ err = device_create_file(dev, &mce_banks[i].attr);
if (err)
goto error;
}
- for (j = 0; j < banks; j++) {
- err = device_create_file(dev, &mce_banks[j].attr);
- if (err)
- goto error2;
- }
cpumask_set_cpu(cpu, mce_device_initialized);
per_cpu(mce_device, cpu) = dev;

return 0;
-error2:
- while (--j >= 0)
- device_remove_file(dev, &mce_banks[j].attr);
error:
while (--i >= 0)
- device_remove_file(dev, mce_device_attrs[i]);
+ device_remove_file(dev, &mce_banks[i].attr);

device_unregister(dev);

@@ -2278,9 +2279,6 @@ static __cpuinit void mce_device_remove(unsigned int cpu)
if (!cpumask_test_cpu(cpu, mce_device_initialized))
return;

- for (i = 0; mce_device_attrs[i]; i++)
- device_remove_file(dev, mce_device_attrs[i]);
-
for (i = 0; i < banks; i++)
device_remove_file(dev, &mce_banks[i].attr);

@@ -2397,7 +2395,7 @@ static __init int mcheck_init_device(void)

mce_init_banks();

- err = subsys_system_register(&mce_subsys, NULL);
+ err = subsys_system_register(&mce_subsys, mce_device_attr_groups);
if (err)
return err;


2012-08-29 10:13:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

On Wed, Aug 29, 2012 at 01:11:55PM +0530, Naveen N. Rao wrote:
> All the MCE attributes currently exported via sysfs appear under
> /sys/devices/system/machinecheck/machinecheck<n>/. Pretty much all of these
> are global in nature and not specific to a processor. We have around 7
> attributes duplicated across each processor and on multi-core multi-socket
> machines, this amounts to quite a large number. So, move these out under
> /sys/devices/system/machinecheck/ where they rightly belong.
>
> Note: I'm not sure if it's ok to change sysfs entries and this does break
> userspace tools that depend on the current path for some of these attributes.
> So, they will need to be updated to use the new path. However, if we ever get
> to a point where cpu0 can be offlined, these tools will need to be updated
> anyway (as they mostly hardcode machinecheck0 currently)

How do you know that for all tools out there?

I know, I know, moving them to /sys/.../machinecheck/ is the right thing
to do but they're exposed to userspace and we're breaking it with this
patch. And we don't break userspace so I'd guess we're stuck with the
current situation.

Sorry.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-08-29 10:28:11

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

On 08/29/2012 03:43 PM, Borislav Petkov wrote:
> On Wed, Aug 29, 2012 at 01:11:55PM +0530, Naveen N. Rao wrote:
>> All the MCE attributes currently exported via sysfs appear under
>> /sys/devices/system/machinecheck/machinecheck<n>/. Pretty much all of these
>> are global in nature and not specific to a processor. We have around 7
>> attributes duplicated across each processor and on multi-core multi-socket
>> machines, this amounts to quite a large number. So, move these out under
>> /sys/devices/system/machinecheck/ where they rightly belong.
>>
>> Note: I'm not sure if it's ok to change sysfs entries and this does break
>> userspace tools that depend on the current path for some of these attributes.
>> So, they will need to be updated to use the new path. However, if we ever get
>> to a point where cpu0 can be offlined, these tools will need to be updated
>> anyway (as they mostly hardcode machinecheck0 currently)
>
> How do you know that for all tools out there?

I don't - I've seen mcelog and related tools and they look in machinecheck0.

>
> I know, I know, moving them to /sys/.../machinecheck/ is the right thing
> to do but they're exposed to userspace and we're breaking it with this
> patch. And we don't break userspace so I'd guess we're stuck with the
> current situation.

Hmmm.. Can't we just deprecate these? ;)
Perhaps we can consider adding newer tunables in the right place.


Thanks,
Naveen

>
> Sorry.
>

2012-08-29 10:40:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

On Wed, Aug 29, 2012 at 03:56:04PM +0530, Naveen N. Rao wrote:
> Hmmm.. Can't we just deprecate these? ;) Perhaps we can consider
> adding newer tunables in the right place.

In case you haven't noticed yet: I'm all on your side.

But let me ask you this: these attributes grow to a large number with
a large number of cores but why is this a problem? We have a bunch of
redundant attributes in sysfs, so what?

See what I mean?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-08-29 13:12:43

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

On 08/29/2012 04:10 PM, Borislav Petkov wrote:
> On Wed, Aug 29, 2012 at 03:56:04PM +0530, Naveen N. Rao wrote:
>> Hmmm.. Can't we just deprecate these? ;) Perhaps we can consider
>> adding newer tunables in the right place.
>
> In case you haven't noticed yet: I'm all on your side.

Yup, I know :)

I had my doubts when I sent this patch (hence the RFC tag) and I was
only wondering above if it'll be a good idea to limit such tunables
going forward. We could force all _new_ MCE tunables to be global,
except where they actually apply on a per-processor basis.

>
> But let me ask you this: these attributes grow to a large number with
> a large number of cores but why is this a problem? We have a bunch of
> redundant attributes in sysfs, so what?
>
> See what I mean?
>

Well, it's ugly and does not make much sense, as I'm sure you noticed.
On a 10-core, 8-socket machine with HT, we'll end up with nearly a
thousand such entries!

I don't know how much resource this takes up (if any) and like you said,
this may just be a "so what?", but I wanted to bring this up and see if
we could/want to do anything about this. I'm certainly fine if we want
to ignore this.


Thanks,
Naveen

2012-08-29 14:43:17

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

> Note: I'm not sure if it's ok to change sysfs entries and this does break
> userspace tools that depend on the current path for some of these attributes.
> So, they will need to be updated to use the new path. However, if we ever get
> to a point where cpu0 can be offlined, these tools will need to be updated
> anyway (as they mostly hardcode machinecheck0 currently)

Linus' clarified his "never break user space" edict at the kernel summit
on Monday. Paraphrasing:

If nobody notices, or nobody complains, then we can make changes. But
if anyone does complain, then the patch gets reverted.

So if you want to do this, the right approach would be to change the
utilities that use this to look in the new location for these sysfs files
first, and fall back to looking in the old per-cpu place.

Next (or in parallel) have the kernel provide both interfaces.

Wait a long[1] time so that most people have updated utilities.

Delete the per-cpu interfaces from the kernel.

Delete the per-cpu references from the utilities.

-Tony

[1] Long enough that there are no complaints. At least a year, probably two or more.

2012-08-30 09:49:37

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

On 08/29/2012 08:13 PM, Luck, Tony wrote:
>> Note: I'm not sure if it's ok to change sysfs entries and this does break
>> userspace tools that depend on the current path for some of these attributes.
>> So, they will need to be updated to use the new path. However, if we ever get
>> to a point where cpu0 can be offlined, these tools will need to be updated
>> anyway (as they mostly hardcode machinecheck0 currently)
>
> Linus' clarified his "never break user space" edict at the kernel summit
> on Monday. Paraphrasing:
>
> If nobody notices, or nobody complains, then we can make changes. But
> if anyone does complain, then the patch gets reverted.
>
> So if you want to do this, the right approach would be to change the
> utilities that use this to look in the new location for these sysfs files
> first, and fall back to looking in the old per-cpu place.
>
> Next (or in parallel) have the kernel provide both interfaces.
>
> Wait a long[1] time so that most people have updated utilities.
>
> Delete the per-cpu interfaces from the kernel.
>
> Delete the per-cpu references from the utilities.
>
> -Tony
>
> [1] Long enough that there are no complaints. At least a year, probably two or more.
>

Makes sense. Thanks for the explanation. I will send a new patch for this.


Thanks,
Naveen