2005-11-14 22:14:21

by Doug Thompson

Subject: [RFC] EDAC and the sysfs


I am trying to design the sysfs interface tree for the
new set of EDAC modules that are waiting for this
interface, before being put into the kernel.

Currently the original EDAC (bluesmoke) has its own
/proc directory (/proc/mc) with files and a directory
(0,1,2,...) for each memory controller on the system.
This will be removed and the new information interface
will be placed in the sysfs.

One proposal is to place the information in
/sys/devices/system in the following directories:

For EDAC general memory ECC controls and information
files:

/sys/devices/system/edac/mc/

For PCI Parity Error detection controls and
information files:

/sys/devices/system/edac/pci


In addition /sys/devices/system/edac/mc/ would have
directories:

mc0/
mc1/
...

for each memory controller's specific controls and
information.
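
As a rough illustration of the layout proposed above, the following Python sketch lists the per-controller directories under a given root. The root path comes from the proposal; the function name `list_memory_controllers` is hypothetical, and nothing here reflects an implemented interface.

```python
import os
import re

# Root taken from the proposal above; it is a proposal, not an existing interface.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def list_memory_controllers(root=EDAC_MC_ROOT):
    """Return the mcN directory names found under root, sorted by N."""
    if not os.path.isdir(root):
        return []
    names = [
        name
        for name in os.listdir(root)
        if re.fullmatch(r"mc\d+", name)
        and os.path.isdir(os.path.join(root, name))
    ]
    # Sort numerically so that mc10 comes after mc2.
    return sorted(names, key=lambda name: int(name[2:]))


if __name__ == "__main__":
    for name in list_memory_controllers():
        print(name)
```

Passing the root as a parameter keeps the sketch usable against a scratch directory while the real location is still under discussion.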


Currently the similar error detection device
/sys/devices/system/machinecheck resides here as
well.


The alternative layout would be to use the /sys/class
directory when nested-classes become available:

/sys/class/edac/mc/...

and

/sys/class/edac/pci/...

But edac doesn't quite seem to fit here.

I have failed to date to really find a policy or set
of rules of use for the sysfs as to what goes where
for such items as EDAC. After searching the web,
articles and thinking about this for some time now, I
am requesting comments on the sysfs model for where
EDAC would fit best.

I currently favor the /sys/devices/system/edac
placement, but would welcome input.

thanks

doug thompson





"If you think Education is expensive, just try Ignorance"

"Don't tell people HOW to do things, tell them WHAT you
want and they will surprise you with their ingenuity."
Gen George Patton


2005-11-14 22:44:34

by Greg KH

Subject: Re: [RFC] EDAC and the sysfs

On Mon, Nov 14, 2005 at 02:14:19PM -0800, Doug Thompson wrote:
>
> I am trying to design the sysfs interface tree for the
> new set of EDAC modules that are waiting for this
> interface, before being put into the kernel.
>
> Currently the original EDAC (bluesmoke) has its own
> /proc directory (/proc/mc) with files and a directory
> (0,1,2,...)for each memory controller on the system.
> This will be removed and the new information interface
> will be placed in the sysfs.
>
> One proposal is to place the information in
> /sys/devices/system in the following directories:

Why not use /sys/firmware/ instead?

Or do you want to use the struct device stuff?

> For EDAC general memory ECC controls and information
> files:
>
> /sys/devices/systems/edac/mc/

What kind of controls and files?

>
> For PCI Parity Error detection controls and
> information files:
>
> /sys/devices/system/edac/pci

What kind of controls and files?


> In addition /sys/devices/system/edac/mc/ would have
> directories:
>
> mc0/
> mc1/
> ...
>
> for each memory controller's specific controls and
> information.

Again, what kind of controls and information?

> Currently the similiar error detection device
> /sys/devices/system/machinecheck resides here are
> well.
>
>
> The alternative layout would be to use the /sys/class
> directory when nested-classes become available:

They are in 2.6.15-rc1, but you _really_ don't want to use them, they
are a huge pain, and I will be getting rid of them, along with all
struct class_device stuff in the near future. See the archives for
details, or it's summarized here:
http://www.kroah.com/log/linux/driver_model_changes.html


>
> /sys/class/edac/mc/...
>
> and
>
> /sys/class/edac/pci/...
>
> But edac doesn't quite seem to fit here.

I agree.

> I have failed to date to really find a policy or set
> of rules of use for the sysfs as to what goes where
> for such items as EDAC. After searching the web,
> articles and thinking about this for some time now, I
> am requesting comments on the sysfs model for where
> EDAC would fit best.

What exactly does EDAC do (and what does it stand for anyway?)

thanks,

greg k-h

2005-11-15 00:30:37

by Dave Jones

Subject: Re: [RFC] EDAC and the sysfs

On Mon, Nov 14, 2005 at 02:31:05PM -0800, Greg Kroah-Hartman wrote:
> On Mon, Nov 14, 2005 at 02:14:19PM -0800, Doug Thompson wrote:
> >
> > I am trying to design the sysfs interface tree for the
> > new set of EDAC modules that are waiting for this
> > interface, before being put into the kernel.
> >
> > Currently the original EDAC (bluesmoke) has its own
> > /proc directory (/proc/mc) with files and a directory
> > (0,1,2,...)for each memory controller on the system.
> > This will be removed and the new information interface
> > will be placed in the sysfs.
> >
> > One proposal is to place the information in
> > /sys/devices/system in the following directories:
>
> Why not use /sys/firmware/ instead?

Probably the same reason we don't have the cpufreq (for example)
stuff under /sys/firmware: it's poking hardware,
not manipulating firmware.

/sys/devices/system makes a lot more sense, as that's
where the cpu level machine check stuff is (amongst other
similar things).

> > I have failed to date to really find a policy or set
> > of rules of use for the sysfs as to what goes where
> > for such items as EDAC. After searching the web,
> > articles and thinking about this for some time now, I
> > am requesting comments on the sysfs model for where
> > EDAC would fit best.
>
> What exactly does EDAC do (and what does it stand for anyway?)

Reports hardware events read from chipset specific registers.
Similar to /sys/devices/system/machinecheck/, but from
chipset instead of CPU. (That's grossly simplified, but
hopefully gets the idea across).

Dave

2005-11-15 00:47:06

by Doug Thompson

Subject: Re: [RFC] EDAC and the sysfs

What is EDAC and what does it stand for?

EDAC= Error Detection And Correction.

It currently is in the -mm2 tree, but with the older
files and controls location.

The primary purpose of 'edac' (formerly
bluesmoke.sourceforge.net) is to provide a DETECTOR
module for various errors detected by the hardware. The
main detector so far has been for ECC memory errors
reported by memory controllers. PCI Parity scanning was
recently added.

Uncorrected ECC Errors (UE) are detected by edac if the
machinecheck is not configured in the kernel; otherwise
the machinecheck fires first, since it occurs
synchronously with the error while edac polls. edac can
log UEs, and panic if the panic_on_ue control is set.
edac logs Corrected ECC Errors (CE) to the system console.

edac is a two component system. edac_mc is the core
and then one of several memory controller (mc) modules
is used as the mc driver. edac_k8 is the
opteron/athlon64 module. The mc driver can then
extract the mc specific information and abstract it
and send the information to the edac_mc core.

The current information file in /proc/mc/0 (for CPU
0's mc) is as follows:

Check PCI Parity: 1
Panic PCI Parity: 1
Panic UE: 1
Log UE: 1
Log CE: 1
Poll msec: 1000

MC Core: bluesmoke_mc Ver: 2.0.3 Nov 11 2005
MC Module: bluesmoke_k8 Ver: 2.0.2 Nov 11 2005
Memory Controller: Athlon64/Opteron
PCI Bus ID: 0000:00:18.2 (0000:00:18.2)
EDAC capability: None SECDED S4ECD4ED
Current EDAC capability: None SECDED S4ECD4ED
Supported Mem Types: Unbuffered-DDR Registered-DDR

0:H0_DIMM0|H0_DIMM1:Memory Size: 1024 MiB
0:H0_DIMM0|H0_DIMM1:Mem Type: Registered-DDR
0:H0_DIMM0|H0_DIMM1:Dev Type: x4
0:H0_DIMM0|H0_DIMM1:EDAC Mode: S4ECD4ED
0:H0_DIMM0|H0_DIMM1:UE: 0
0:H0_DIMM0|H0_DIMM1:CE: 0
0.0:H0_DIMM0:CE: 0
0.1:H0_DIMM1:CE: 0

1:H0_DIMM0|H0_DIMM1:Memory Size: 1024 MiB
1:H0_DIMM0|H0_DIMM1:Mem Type: Registered-DDR
1:H0_DIMM0|H0_DIMM1:Dev Type: x4
1:H0_DIMM0|H0_DIMM1:EDAC Mode: S4ECD4ED
1:H0_DIMM0|H0_DIMM1:UE: 0
1:H0_DIMM0|H0_DIMM1:CE: 0
1.0:H0_DIMM0:CE: 0
1.1:H0_DIMM1:CE: 0

Total Memory Size: 2048 MiB
Seconds since reset: 270160
UE No Info: 0
CE No Info: 0
Total UE: 0
Total CE: 0
Total PCI Parity: 0


<end paste>

Yes, this output is way too monolithic and needs to be
refactored. I aim to move this output to the new sysfs
destination, but in a different format via different
files.

The output shows UE and CE counts arranged by CSROW and
by Channel 0 or Channel 1. These are also tagged with
the motherboard silk screen labels (Arima's HDAMA mobo
is the example: H0_DIMM0, etc). This allows the
administrator to harvest this data and map it to the
node AND DIMM slot for replacement.
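
To illustrate how a harvesting tool might pull the per-csrow and per-channel fields out of the old monolithic text, here is a minimal Python sketch. It assumes the line shape seen in the paste above (`csrow[.channel]:label:key: value`); the function name `parse_csrow_lines` and the returned dictionary layout are hypothetical, and the global settings and totals are simply skipped.

```python
import re

# Matches lines of the form  csrow[.channel]:label:key: value
ROW_RE = re.compile(r"^(\d+)(?:\.(\d+))?:([^:]+):([^:]+):\s*(.*)$")


def parse_csrow_lines(text):
    """Collect per-csrow and per-channel fields from the old /proc/mc/N text."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue  # global settings and totals are ignored in this sketch
        csrow, channel, label, key, value = match.groups()
        row = rows.setdefault(int(csrow), {"fields": {}, "channels": {}})
        if channel is None:
            # Line describes the whole csrow, e.g. "0:H0_DIMM0|H0_DIMM1:UE: 0"
            row["label"] = label
            row["fields"][key] = value
        else:
            # Line describes one channel, e.g. "0.1:H0_DIMM1:CE: 0"
            chan = row["channels"].setdefault(int(channel), {"label": label})
            chan[key] = value
    return rows


if __name__ == "__main__":
    sample = (
        "0:H0_DIMM0|H0_DIMM1:Memory Size: 1024 MiB\n"
        "0:H0_DIMM0|H0_DIMM1:CE: 0\n"
        "0.0:H0_DIMM0:CE: 0\n"
        "0.1:H0_DIMM1:CE: 0\n"
    )
    print(parse_csrow_lines(sample))
```

The amount of pattern matching needed here is one argument for splitting the output into one value per file.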

The above also has PCI device parity scanning
information, which will be broken out as mentioned in
my original posting. We have found flaws in PCI riser
cards with this scanning process, which caused
unreported data corruption in high speed
interconnects. BUT not all devices conform to the PCI
spec on parity generation. (as usual)



--- Greg KH <[email protected]> wrote:

> On Mon, Nov 14, 2005 at 02:14:19PM -0800, Doug Thompson wrote:
> >
> > I am trying to design the sysfs interface tree for the
> > new set of EDAC modules that are waiting for this
> > interface, before being put into the kernel.
> >
> > Currently the original EDAC (bluesmoke) has its own
> > /proc directory (/proc/mc) with files and a directory
> > (0,1,2,...)for each memory controller on the system.
> > This will be removed and the new information interface
> > will be placed in the sysfs.
> >
> > One proposal is to place the information in
> > /sys/devices/system in the following directories:
>
> Why not use /sys/firmware/ instead?

I guess my initial explanation was not clear enough.
This doesn't fit, and I assume you see this now from
the above explanation.

>
> Or do you want to use the struct device stuff
>
> > For EDAC general memory ECC controls and information
> > files:
> >
> > /sys/devices/systems/edac/mc/
>
> What kind of controls and files?

Abstracted mc files, taken from the above older
monolithic output:

mc_core_version
mc_driver_version
memory_controller
device_bus_id (symlink)
edac_capability
current_edac_capability
supported_mem_types
seconds_since_counter_reset
total_memory_size
total_ue_noinfo_count
total_ce_noinfo_count
total_ue_count
total_ce_count

[the no_info counts are respective errors, but the
edac mc drivers could not determine more information
on it, hence a no_info count]

Controls:

panic_on_ue
log_ue
log_ce
poll_interval_msec



These controls are also set via module load options.
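
With one value per file, reading the information files and setting a control becomes simple file I/O. The sketch below assumes the attribute names proposed above would appear as plain files in a directory; `read_attrs` and `write_control` are hypothetical helper names, not part of any existing tool.

```python
import os


def read_attrs(directory):
    """Read every regular file in directory as one stripped string."""
    attrs = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Subdirectories (and symlinks to directories) are skipped.
        if os.path.isfile(path):
            with open(path) as f:
                attrs[name] = f.read().strip()
    return attrs


def write_control(directory, name, value):
    """Write a control value the way `echo value > file` would."""
    with open(os.path.join(directory, name), "w") as f:
        f.write(f"{value}\n")
```

For example, `write_control(mc_dir, "panic_on_ue", 1)` followed by `read_attrs(mc_dir)["panic_on_ue"]` would be expected to give back `"1"`, assuming the control echoes the value written to it.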


>
> >
> > For PCI Parity Error detection controls and
> > information files:
> >
> > /sys/devices/system/edac/pci
>
> That kind of controls and files?

Controls:

check_pci_parity
panic_on_pci_parity


info files:

total_pci_parity_count


>
>
> > In addition /sys/devices/system/edac/mc/ would have
> > directories:
> >
> > mc0/
> > mc1/
> > ...
> >
> > for each memory controller's specific controls and
> > information.
>
> Again, what kind of controls and information?

For each Chip-Select Row (csrow) there would be
information. I am still trying to determine if each
csrow would be in its own directory or all csrows just
flat in the mc0, mc1, ... directories.

Assuming each csrow is in its own directory (which is
the way I am leaning) below:

csrow0/
csrow1/
csrow2/
csrow3/
...

info files in the above directories:

memory_size
memory_type
device_type
edac_mode
ue_count
ce_count
ce_count_channel_0
ce_count_channel_1
dimm_label
dimm_label_channel_0
dimm_label_channel_1


controls:

none at this time

--------------
From this data, a reaper/harvester can determine the
CE rate, which is the main real value in EDAC at this
time, and notify the admin of preventative
maintenance work.
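
As a sketch of the CE rate calculation such a harvester might perform, the following Python function compares two samples of a ce_count value. The function name and the handling of a counter reset are assumptions made for illustration; the thread does not specify how a harvester should behave.

```python
def ce_rate_per_hour(count_then, count_now, seconds_elapsed):
    """Return corrected errors per hour between two ce_count samples."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    if count_now < count_then:
        # Assumption: a smaller value means the counters were reset,
        # so everything counted so far happened since the reset.
        count_then = 0
    return (count_now - count_then) * 3600.0 / seconds_elapsed
```

A harvester could sample each csrow's ce_count periodically and alert when the rate crosses a site-specific threshold.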

>
> > Currently the similiar error detection device
> > /sys/devices/system/machinecheck resides here are
> > well.
> >
> >
> > The alternative layout would be to use the /sys/class
> > directory when nested-classes become available:
>
> They are in 2.6.15-rc1, but you _really_ don't want to use them, they
> are a huge pain, and I will be getting rid of them, along with all
> struct class_device stuff in the near future. See the archives for
> details, or it's summarized here:

interesting. That info is what I was looking for.
BTW, thanks for starting the 'HOWTO do kernel
development'

>
>
> http://www.kroah.com/log/linux/driver_model_changes.html
>
>
> >
> > /sys/class/edac/mc/...
> >
> > and
> >
> > /sys/class/edac/pci/...
> >
> > But edac doesn't quite seem to fit here.
>
> I agree.
>
> > I have failed to date to really find a policy or set
> > of rules of use for the sysfs as to what goes where
> > for such items as EDAC. After searching the web,
> > articles and thinking about this for some time now, I
> > am requesting comments on the sysfs model for where
> > EDAC would fit best.
>
> What exactly does EDAC do (and what does it stand for anyway?)

see beginning of this post

>
> thanks,
>
> greg k-h
>


thank you greg, for the specific questions to clarify
my request.


doug t

2005-11-15 01:13:01

by Doug Thompson

Subject: Re: [RFC] EDAC and the sysfs



--- Doug Thompson <[email protected]> wrote:

> What is EDAC and what does it stand for?
>
> EDAC= Error Detection And Correction.

> > For PCI Parity Error detection controls and
> > information files:
> >
> > /sys/devices/system/edac/pci
>
> That kind of controls and files?

I left out new PCI whitelist/blacklist control files I
am also working on:

pci_parity_whitelist
pci_parity_blacklist


Since some unmentioned (and expensive) pci boards fail
to conform with the PCI spec when dealing with PCI
parity status reporting (or they just plain have
bugs), the pci scanning feature needs a whitelist or
a blacklist of "vendor_id:device_id" to specifically
scan or not scan.

If a whitelist is present, no blacklisting occurs. When
the blacklist is written to, the whitelist is erased and
devices on the blacklist are skipped during scanning.

format of info to write to these controls, in hex:

vendor_id:device_id[,vendor_id:device_id...]
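
The list format above can be parsed with a few lines of Python. This is only a sketch: `parse_pci_id_list` and `should_scan` are hypothetical names, the 16-bit range check is an added assumption, and the whitelist-takes-precedence logic is one reading of the description above rather than the actual driver behaviour.

```python
def parse_pci_id_list(text):
    """Parse 'vendor:device[,vendor:device...]' (hex) into (int, int) pairs."""
    ids = []
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue
        vendor, sep, device = item.partition(":")
        if not sep:
            raise ValueError(f"expected vendor_id:device_id, got {item!r}")
        vendor_id, device_id = int(vendor, 16), int(device, 16)
        # Assumption: PCI vendor and device IDs are 16-bit values.
        if not (0 <= vendor_id <= 0xFFFF and 0 <= device_id <= 0xFFFF):
            raise ValueError(f"ID out of 16-bit range: {item!r}")
        ids.append((vendor_id, device_id))
    return ids


def should_scan(device, whitelist, blacklist):
    """Decide whether a (vendor_id, device_id) pair is scanned."""
    if whitelist:
        # A non-empty whitelist takes precedence over any blacklist.
        return device in whitelist
    return device not in blacklist
```

For instance, `parse_pci_id_list("8086:1229,10de:0041")` yields the pairs `(0x8086, 0x1229)` and `(0x10de, 0x0041)`.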


I have timed both ECC scanning and PCI parity
scanning.

ECC scanning on a dual opteron is 170 TSC clocks.

PCI Parity for 24 devices is 65000 TSC clocks. Ouch!
When all devices are blacklisted, the iterator is 2700
TSC clocks.
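
A quick back-of-the-envelope calculation on the figures above, written as a Python sketch. It assumes the 2700-clock fully blacklisted iteration is a fixed overhead that can be subtracted from the full scan, which the measurements do not confirm.

```python
ECC_SCAN_CLOCKS = 170          # ECC scan on a dual Opteron
PCI_SCAN_CLOCKS = 65000        # PCI parity scan of 24 devices
PCI_DEVICES = 24
EMPTY_ITERATION_CLOCKS = 2700  # iteration with every device blacklisted

# Approximate cost of the parity check per device, excluding iteration overhead.
per_device = (PCI_SCAN_CLOCKS - EMPTY_ITERATION_CLOCKS) / PCI_DEVICES

# How much more expensive the full PCI scan is than the ECC scan.
ratio = PCI_SCAN_CLOCKS / ECC_SCAN_CLOCKS

print(f"about {per_device:.0f} TSC clocks per PCI device")
print(f"PCI scan is about {ratio:.0f}x the ECC scan")
```

Under that assumption, most of the PCI scan cost lies in the per-device parity check rather than in the device iterator itself.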

doug t

2005-11-15 17:39:46

by Greg KH

Subject: Re: [RFC] EDAC and the sysfs

On Mon, Nov 14, 2005 at 04:47:03PM -0800, Doug Thompson wrote:
> For each Chip-Select Row (csrow) there would be
> information. I am still trying to determine if each
> csrow would be in its own directory or all cwrows just
> flat in the mc0, mc1, ... directories.
>
> Assuming each csrow is in its own directory (which is
> the way I am leaning) below:
>
> csrow0/
> csrow1/
> csrow2/
> csrow3/
> ...
>
> info files in the above directories:
>
> memory_size
> memory_type
> device_type
> edac_mode
> ue_count
> ce_count
> ce_count_channel_0
> ce_count_channel_1
> dimm_label
> dimm_label_channel_0
> dimm_label_channel_1
>

Ok, thanks for the details, it makes more sense now. Your hierarchy
seems sane; have you implemented it to see if it works properly?

thanks,

greg k-h

2005-11-15 17:40:18

by Greg KH

Subject: Re: [RFC] EDAC and the sysfs

On Mon, Nov 14, 2005 at 07:30:26PM -0500, Dave Jones wrote:
> On Mon, Nov 14, 2005 at 02:31:05PM -0800, Greg Kroah-Hartman wrote:
> > On Mon, Nov 14, 2005 at 02:14:19PM -0800, Doug Thompson wrote:
> > >
> > > I am trying to design the sysfs interface tree for the
> > > new set of EDAC modules that are waiting for this
> > > interface, before being put into the kernel.
> > >
> > > Currently the original EDAC (bluesmoke) has its own
> > > /proc directory (/proc/mc) with files and a directory
> > > (0,1,2,...)for each memory controller on the system.
> > > This will be removed and the new information interface
> > > will be placed in the sysfs.
> > >
> > > One proposal is to place the information in
> > > /sys/devices/system in the following directories:
> >
> > Why not use /sys/firmware/ instead?
>
> Probably the same reason we don't have the cpufreq (for eg)
> stuff under /sys/firmware. Because it's poking hardware,
> not manipulating firmware.
>
> /sys/devices/system makes a lot more sense, as thats
> where the cpu level machine check stuff is (amongst other
> similar things).

Ok, that does make sense, thanks for explaining it.

greg k-h

2005-11-16 00:26:40

by Doug Thompson

Subject: Re: [RFC] EDAC and the sysfs

--- Greg KH <[email protected]> wrote:

> On Mon, Nov 14, 2005 at 04:47:03PM -0800, Doug Thompson wrote:
> > For each Chip-Select Row (csrow) there would be
> > information. I am still trying to determine if each
> > csrow would be in its own directory or all cwrows just
> > flat in the mc0, mc1, ... directories.
> >
> > Assuming each csrow is in its own directory (which is
> > the way I am leaning) below:
> >
> > csrow0/
> > csrow1/
> > csrow2/
> > csrow3/
> > ...
> >
> > info files in the above directories:
> >
> > memory_size
> > memory_type
> > device_type
> > edac_mode
> > ue_count
> > ce_count
> > ce_count_channel_0
> > ce_count_channel_1
> > dimm_label
> > dimm_label_channel_0
> > dimm_label_channel_1
> >
>
> Ok, thanks for the details, it makes more sense now. Your heirachy
> seems sane, have you implemented it to see if it works properly?

I began implementing first in /sys/class and that is
when I ran into the nested class issue. I then looked
at the /sys/devices/system interface point, sought
more information, and then asked the list for
comments.

I will now use the /sys/devices/system/edac as my root
for my files and controls.

Speaking of controls, edac has them currently in
/proc/sys/mc. I have proposed to have them in
/sys/devices/system/edac/mc and friends.

My question is: Should I remove entirely my old
/proc/sys/mc sysctl tree? Or still maintain the
aliases there (which seems weird)?

If I remove it, then /etc/sysctl.conf will no longer
allow for setting things up there.

Is there going to be similar functionality to
/etc/sysctl.conf for those items we place in sysfs, in
the future?

thanks

doug t

PS. These questions on sysfs seem a perfect food
stream for your 'HOWTO do kernel development'. Trying
to do new entries in sysfs has been a painstaking
adventure. After googling the web for info, it
definitely has been a bit thin on information on sysfs
at the level I am seeking.

>
> thanks,
>
> greg k-h
>

2005-11-17 07:25:26

by Greg KH

Subject: Re: [RFC] EDAC and the sysfs

On Tue, Nov 15, 2005 at 04:26:38PM -0800, Doug Thompson wrote:
> Speaking of controls, edac has them currently in
> /proc/sys/mc. I have proposed to have them in
> /sys/devices/system/edac/mc and friends.
>
> My question is: Should I remove entirely my old
> /proc/sys/mc sysctl tree? Or still maintain the
> aliases there (which seems weird)?

That would be weird, just drop the sysctl stuff.

> If I do that, then /etc/sysctl.conf will no longer
> allow for setting things up there.

True.

> Is there going to be a similiar functionality as
> /etc/sysctl.conf for those items we place in sysfs, in
> the future?

If you want to write one, sure :)

But you can just probably use a udev rule to initialize your things
properly, that's what all of the distros are now using.
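
For readers wondering what an /etc/sysctl.conf equivalent for sysfs could look like, here is a minimal Python sketch. The `path = value` file format, the function names, and the refusal to write outside the given root are all assumptions made for illustration; no such tool is described in this thread.

```python
import os


def parse_settings(text):
    """Parse 'relative/path = value' lines; '#' starts a comment."""
    settings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {raw!r}")
        settings.append((key.strip(), value.strip()))
    return settings


def apply_settings(settings, root="/sys"):
    """Write each value to the existing attribute file below root."""
    root = os.path.realpath(root)
    for rel_path, value in settings:
        path = os.path.realpath(os.path.join(root, rel_path))
        # Refuse absolute paths or '..' tricks that escape the root.
        if os.path.commonpath([root, path]) != root:
            raise ValueError(f"{rel_path!r} is outside {root!r}")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
        with open(path, "w") as f:
            f.write(value + "\n")
```

A configuration line such as `devices/system/edac/mc/panic_on_ue = 1` would then set the proposed control at boot, provided the attribute exists by the time the settings are applied.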

> PS. These questions on sysfs seem a perfect food
> stream for your 'HOWTO do kernel development'. Trying
> to do new entries in sysfs has been a painstaking
> adventure. After googling the web for info, it
> definitely has been a bit thin on information on sysfs
> at the level I am seeking.

sysfs and the driver model are woefully underdocumented. Right now, I'd
recommend the Linux Device Drivers, third edition, free online if you
don't want to buy it, for anyone doing any driver core stuff. It has a
whole chapter that is the most up-to-date and the best description I've
seen so far.

But even then, it is out of date, due to api changes, sorry.

thanks,

greg k-h

2005-11-17 17:20:57

by Doug Thompson

Subject: Re: [RFC] EDAC and the sysfs



--- Greg KH <[email protected]> wrote:

> On Tue, Nov 15, 2005 at 04:26:38PM -0800, Doug Thompson wrote:

> > My question is: Should I remove entirely my old
> > /proc/sys/mc sysctl tree? Or still maintain the
> > aliases there (which seems weird)?
>
> That would be wierd, just drop the sysctl stuff.
>
> > If I do that, then /etc/sysctl.conf will no longer
> > allow for setting things up there.
>
> True.
>
> > Is there going to be a similiar functionality as
> > /etc/sysctl.conf for those items we place in sysfs, in
> > the future?
>
> If you want to write one, sure :)

Good idea. Seems I have hit the edge of current
features. That is fine for now; it is another
item for a TODO list.

>
> But you can just probably use a udev rule to initialize your things
> properly, that's what all of the distros are now using.

Ok. That's another area for me to research. edac does
not have any /dev/ entries, just the files and
controls previously mentioned.

So, from your comment then, udev has some mechanism to
set controls in sysfs?


>
> > PS. These questions on sysfs seem a perfect food
> > stream for your 'HOWTO do kernel development'. Trying
> > to do new entries in sysfs has been a painstaking
> > adventure. After googling the web for info, it
> > definitely has been a bit thin on information on sysfs
> > at the level I am seeking.
>
> sysfs and the driver model are woefully underdocumented. Right now, I'd
> recommend the Linux Device Drivers, third edition, free online if you
> don't want to buy it, for anyone doing any driver core stuff. It has a
> whole chapter that is the most up-to-date and the best description I've
> seen so far.

Yes, I have LDD 3rd and it is good. I also have
Robert Love's book and it has some good stuff on the
device model and sysfs, but I assume since the whole
feature is fairly new its documentation and
understanding are still in the nursery.

I also came across Patrick Mochel's paper given at
Linux Symposium in June 2005. That helped.

From the src on the machinecheck currently in sysfs, I
see it implements the 'subsystem' feature of sysfs.
From that code I see the pattern I can use.

>
> But even then, it is out of date, due to api changes, sorry.
>
> thanks,
>
> greg k-h
>

at least there are some docs AND some people to ask
questions of.

thanks

doug t

2005-11-17 17:55:00

by Greg KH

Subject: Re: [RFC] EDAC and the sysfs

On Thu, Nov 17, 2005 at 09:20:53AM -0800, Doug Thompson wrote:
> > But you can just probably use a udev rule to
> > initialize your things
> > properly, that's what all of the distros are now
> > using.
>
> Ok. That's another area for me to research. edac does
> not have any /dev/ entries, just the files and
> controls previous mentioned.
>
> So, from your comment then, udev has some mechanism to
> set controls in sysfs?

udev gets called whenever you add a kobject to the system. You can then
do whatever you want in udev when this happens. As an example, on one
distro, when a bluetooth device is created by the kernel, a bluetooth
startup script is run by udev.

thanks,

greg k-h

2005-11-17 18:32:48

by Kay Sievers

Subject: Re: [RFC] EDAC and the sysfs

On Thu, Nov 17, 2005 at 09:18:56AM -0800, Greg KH wrote:
> On Thu, Nov 17, 2005 at 09:20:53AM -0800, Doug Thompson wrote:
> > > But you can just probably use a udev rule to
> > > initialize your things
> > > properly, that's what all of the distros are now
> > > using.
> >
> > Ok. That's another area for me to research. edac does
> > not have any /dev/ entries, just the files and
> > controls previous mentioned.
> >
> > So, from your comment then, udev has some mechanism to
> > set controls in sysfs?
>
> udev gets called whenever you add a kobject to the system. You can then
> do whatever you want in udev when this happens. As an example, on one
> distro, when a bluetooth device is created by the kernel, a bluetooth
> startup script is run by udev.

We do things like this:
ACTION=="add", SUBSYSTEM=="scsi", SYSFS{type}=="1", RUN+="/bin/sh -c 'echo 900 > /sys/$DEVPATH/timeout'"

There are only very few users now that set values in sysfs. If changing
values with udev rules becomes a common need, we may integrate that into
udev itself instead of calling a shell, but that works fine so far.
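
The shell one-liner in the rule above could equally be a small helper program started from RUN. The Python sketch below shows the same idea: it takes the device path from the DEVPATH environment variable that udev exports and writes a value to an attribute beneath it. The function name `write_sysfs_attr` and its parameters are hypothetical.

```python
import os


def write_sysfs_attr(attr, value, environ=None, sysfs_root="/sys"):
    """Write value to <sysfs_root><DEVPATH>/<attr> and return the path used."""
    env = os.environ if environ is None else environ
    devpath = env["DEVPATH"]  # exported by udev for the device being handled
    # DEVPATH starts with '/', so strip it before joining onto the root.
    path = os.path.join(sysfs_root, devpath.lstrip("/"), attr)
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

Called as `write_sysfs_attr("timeout", 900)` from a program run by the rule, this would have the same effect as the `echo 900 > /sys/$DEVPATH/timeout` command.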

Kay