2006-03-05 10:18:08

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code


> +/* Main MC kobject release() function */
> +static void edac_memctrl_master_release(struct kobject *kobj)
> +{
> + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> +}
> +



ehhh how on earth can this be right?


2006-03-05 10:30:28

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > +/* Main MC kobject release() function */
> > +static void edac_memctrl_master_release(struct kobject *kobj)
> > +{
> > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > +}
> > +
>
>
>
> ehhh how on earth can this be right?


oh and this stuff also violates the "one value per file" rule; can we
fix that urgently before it becomes part of the ABI in 2.6.16??


2006-03-05 15:55:06

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
>
> > +/* Main MC kobject release() function */
> > +static void edac_memctrl_master_release(struct kobject *kobj)
> > +{
> > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > +}
> > +
>
> ehhh how on earth can this be right?

Ugh. Good catch, it isn't right. Gotta love it when people try to
ignore the helpful messages the kernel gives you when you use an API
wrong :(

thanks,

greg k-h

2006-03-05 16:24:11

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Sun, 2006-03-05 at 07:55 -0800, Greg KH wrote:
> On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> >
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(


s/ignore/circumvent/



2006-03-06 18:15:18

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> oh and this stuff also violates the "one value per file" rule; can we
> fix that urgently before it becomes part of the ABI in 2.6.16??

Ok, I'll admit to being a bit clueless about this. I'm not familiar
with the "one value per file" rule; can someone please explain?

On Sunday 05 March 2006 07:55, Greg KH wrote:
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(

Hmm... I don't recall seeing any messages from the kernel. What
are you seeing?

2006-03-06 19:52:50

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 10:14:22AM -0800, Dave Peterson wrote:
> On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > oh and this stuff also violates the "one value per file" rule; can we
> > fix that urgently before it becomes part of the ABI in 2.6.16??
>
> Ok, I'll admit to being a bit clueless about this. I'm not familiar
> with the "one value per file" rule; can someone please explain?

sysfs consists of files that only have one value per file. Please do
not do otherwise, as it will become a maintance nightmare over time.
See the documentation file that Randy pointed you at for details.

> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Hmm... I don't recall seeing any messages from the kernel. What
> are you seeing?

If you do not have a release function, you will see the messages. Just
putting a printk() in a release function is not acceptable, you really
need to free the memory from it. If not, then your code is really
wrong...

thanks,

greg k-h

2006-03-06 18:21:10

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:

> On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > oh and this stuff also violates the "one value per file" rule; can we
> > fix that urgently before it becomes part of the ABI in 2.6.16??
>
> Ok, I'll admit to being a bit clueless about this. I'm not familiar
> with the "one value per file" rule; can someone please explain?

it's in Documentation/filesystems/sysfs.txt
Strongly preferred.

> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Hmm... I don't recall seeing any messages from the kernel. What
> are you seeing?



---
~Randy

2006-03-06 18:53:25

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Sunday 05 March 2006 07:55, Greg KH wrote:
> On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(

Is the concern here that EDAC is not waiting for the reference count
on the kobject to reach 0, therefore creating the possibility of the
module unloading while the kobject (declared statically within the
module) is still in use?

2006-03-06 19:53:44

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 10:52:57AM -0800, Dave Peterson wrote:
> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Is the concern here that EDAC is not waiting for the reference count
> on the kobject to reach 0, therefore creating the possibility of the
> module unloading while the kobject (declared statically within the
> module) is still in use?

Eeek, don't statically create a kobject :(

Anyway, yes, that is a problem, if it is static, then you need to know
it is safe to unload. Even if it is dynamic that is also true...

thanks,

greg k-h

2006-03-06 21:02:07

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 11:53, Greg KH wrote:
> > Is the concern here that EDAC is not waiting for the reference count
> > on the kobject to reach 0, therefore creating the possibility of the
> > module unloading while the kobject (declared statically within the
> > module) is still in use?
>
> Eeek, don't statically create a kobject :(
>
> Anyway, yes, that is a problem, if it is static, then you need to know
> it is safe to unload. Even if it is dynamic that is also true...

Ok, now I understand. At first I thought it was something specific
to the way the debugf1() call was implemented that people were
commenting on.

Regarding the above problem with the kobject reference count, this
was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
complete() in edac_memctrl_master_release() and then have the module
cleanup code wait for the completion. I think there were a few other
instances of this type of problem that I also fixed in the
above-mentioned patch.

Is it more desirable to dynamically allocate kobjects than to declare
them statically? If so, I'd be curious to know why dynamic
allocation is preferred over static allocation. If desired, I can
make a patch that fixes EDAC so that its kobjects are dynamically
allocated.

Dave

2006-03-06 21:07:46

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code


> Is it more desirable to dynamically allocate kobjects than to declare
> them statically?

Yes

> If so, I'd be curious to know why dynamic
> allocation is preferred over static allocation.

because the lifetime of the kobject is independent of the lifetime of
the memory of your static allocation.
Separate lifetimes -> separate memory is a very good design principle.

2006-03-06 21:32:08

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> Regarding the above problem with the kobject reference count, this
> was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> complete() in edac_memctrl_master_release() and then have the module
> cleanup code wait for the completion. I think there were a few other
> instances of this type of problem that I also fixed in the
> above-mentioned patch.

This is not a fix, this is a goddamn deadlock.
rmmod your_turd </sys/spew/from/your_turd
and there you go. rmmod can _NOT_ wait for sysfs references to go away.

2006-03-06 21:54:00

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > Regarding the above problem with the kobject reference count, this
> > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > complete() in edac_memctrl_master_release() and then have the module
> > cleanup code wait for the completion. I think there were a few other
> > instances of this type of problem that I also fixed in the
> > above-mentioned patch.
>
> This is not a fix, this is a goddamn deadlock.
> rmmod your_turd </sys/spew/from/your_turd
> and there you go. rmmod can _NOT_ wait for sysfs references to go away.

To be fair, the only part of the kernel that supports the above process,
is the network stack. And they implemented a special kind of lock to
handle just this kind of thing.

That is not something that I want the rest of the kernel to have to use.
If your code blocks when doing the above thing, that's fine with me.

Note, you better have the module owner reference right for the above to
not oops the kernel, deadlock is fine. There is no rule that we _have_
to allow rmmod to always succeed.

thanks,

greg k-h

2006-03-06 22:24:04

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> To be fair, the only part of the kernel that supports the above process,
> is the network stack. And they implemented a special kind of lock to
> handle just this kind of thing.
>
> That is not something that I want the rest of the kernel to have to use.
> If your code blocks when doing the above thing, that's fine with me.

One word: fail. With -EBUSY.

> Note, you better have the module owner reference right for the above to
> not oops the kernel, deadlock is fine.

Never is.

> There is no rule that we _have_
> to allow rmmod to always succeed.

Quite so, which means we can have it fail saying that module removal has
failed. Deadlock is not the same thing.

2006-03-06 22:56:10

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 10:24:00PM +0000, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
> > That is not something that I want the rest of the kernel to have to use.
> > If your code blocks when doing the above thing, that's fine with me.
>
> One word: fail. With -EBUSY.
>
> > Note, you better have the module owner reference right for the above to
> > not oops the kernel, deadlock is fine.
>
> Never is.

My apologies, you are right, for some reason I thought rmmod would just
wait for the reference count to go away. I just tested this on a lot of
different things in sysfs and it works properly and rmmod will return an
error saying the module is in use at this time.

> > There is no rule that we _have_
> > to allow rmmod to always succeed.
>
> Quite so, which means we can have it fail saying that module removal has
> failed. Deadlock is not the same thing.

Agreed.

thanks,

greg k-h

2006-03-07 01:46:05

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 16:53, Greg KH wrote:
> On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > Regarding the above problem with the kobject reference count, this
> > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > complete() in edac_memctrl_master_release() and then have the module
> > > cleanup code wait for the completion. I think there were a few other
> > > instances of this type of problem that I also fixed in the
> > > above-mentioned patch.
> >
> > This is not a fix, this is a goddamn deadlock.
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> To be fair, the only part of the kernel that supports the above process,
> is the network stack. And they implemented a special kind of lock to
> handle just this kind of thing.
>

Not so:

[root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
ERROR: Module psmouse is in use
[root@core ~]# rmmod psmouse
[root@core ~]# modprobe psmouse
[root@core ~]#

It would be nice if more subsystem could handle this, preferably without
"Waiting for blah to become free" messages (as in W1).

--
Dmitry

2006-03-07 01:57:57

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Mon, Mar 06, 2006 at 08:45:59PM -0500, Dmitry Torokhov wrote:
> On Monday 06 March 2006 16:53, Greg KH wrote:
> > On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > > Regarding the above problem with the kobject reference count, this
> > > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > > complete() in edac_memctrl_master_release() and then have the module
> > > > cleanup code wait for the completion. I think there were a few other
> > > > instances of this type of problem that I also fixed in the
> > > > above-mentioned patch.
> > >
> > > This is not a fix, this is a goddamn deadlock.
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
>
> Not so:
>
> [root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
> ERROR: Module psmouse is in use
> [root@core ~]# rmmod psmouse
> [root@core ~]# modprobe psmouse
> [root@core ~]#
>
> It would be nice if more subsystem could handle this, preferably without
> "Waiting for blah to become free" messages (as in W1).

See my previous apology about this :)

Anyway, the network stack does have a special lock for unloading modules
while they are still "in use", but as long as Al was referring to your
above sequence, I have no disagreement.

thanks,

greg k-h

2006-03-07 02:10:34

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 20:57, Greg KH wrote:
> On Mon, Mar 06, 2006 at 08:45:59PM -0500, Dmitry Torokhov wrote:
> > On Monday 06 March 2006 16:53, Greg KH wrote:
> > > On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > > > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > > > Regarding the above problem with the kobject reference count, this
> > > > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > > > complete() in edac_memctrl_master_release() and then have the module
> > > > > cleanup code wait for the completion. I think there were a few other
> > > > > instances of this type of problem that I also fixed in the
> > > > > above-mentioned patch.
> > > >
> > > > This is not a fix, this is a goddamn deadlock.
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> >
> > Not so:
> >
> > [root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
> > ERROR: Module psmouse is in use
> > [root@core ~]# rmmod psmouse
> > [root@core ~]# modprobe psmouse
> > [root@core ~]#
> >
> > It would be nice if more subsystem could handle this, preferably without
> > "Waiting for blah to become free" messages (as in W1).
>
> See my previous apology about this :)
>
> Anyway, the network stack does have a special lock for unloading modules
> while they are still "in use", but as long as Al was referring to your
> above sequence, I have no disagreement.
>

I am sorry, I butt in as I read LKML ;) I noticed Al's and your replies
only after I wrote mine...

--
Dmitry

2006-03-07 10:43:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

Al Viro <[email protected]> wrote:
>
> On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
> > That is not something that I want the rest of the kernel to have to use.
> > If your code blocks when doing the above thing, that's fine with me.
>
> One word: fail. With -EBUSY.

It seems quite simple to make wait_for_zero_refcount() interruptible?
Something like...



--- devel/kernel/module.c~modules-make-wait_for_zero_refcount-interruptible 2006-03-07 02:36:46.000000000 -0800
+++ devel-akpm/kernel/module.c 2006-03-07 02:39:18.000000000 -0800
@@ -578,8 +578,8 @@ static void wait_for_zero_refcount(struc
mutex_unlock(&module_mutex);
for (;;) {
DEBUGP("Looking at refcount...\n");
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (module_refcount(mod) == 0)
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (module_refcount(mod) == 0 || signal_pending(current))
break;
schedule();
}
@@ -645,8 +645,18 @@ sys_delete_module(const char __user *nam
goto out;

/* Never wait if forced. */
- if (!forced && module_refcount(mod) != 0)
+ if (!forced && module_refcount(mod) != 0) {
wait_for_zero_refcount(mod);
+ if (module_refcount(mod)) {
+ /*
+ * Signalled: back out
+ */
+ mod->state = MODULE_STATE_LIVE;
+ mod->waiter = NULL;
+ ret = -EINTR;
+ goto out;
+ }
+ }

/* Final destruction now noone is using it. */
if (mod->exit != NULL) {
_

2006-03-07 11:09:00

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, Mar 07, 2006 at 02:41:13AM -0800, Andrew Morton wrote:
> Al Viro <[email protected]> wrote:
> >
> > On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> > > That is not something that I want the rest of the kernel to have to use.
> > > If your code blocks when doing the above thing, that's fine with me.
> >
> > One word: fail. With -EBUSY.
>
> It seems quite simple to make wait_for_zero_refcount() interruptible?
> Something like...

That's something we need to do anyway, but here it's not a matter of removal
blocking on module refcount:

| Regarding the above problem with the kobject reference count, this
| was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
| in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
| complete() in edac_memctrl_master_release() and then have the module
| cleanup code wait for the completion. I think there were a few other
| instances of this type of problem that I also fixed in the
| above-mentioned patch.

and that is clearly broken. Moreover, unlike having rmmod -w interruptible
(which is obviously a very good idea), here we would be in the middle of
cleanup sequence and it would be too late to back off if interrupted.

2006-03-07 16:48:01

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 13:32, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > Regarding the above problem with the kobject reference count, this
> > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > complete() in edac_memctrl_master_release() and then have the module
> > cleanup code wait for the completion. I think there were a few other
> > instances of this type of problem that I also fixed in the
> > above-mentioned patch.
>
> This is not a fix, this is a goddamn deadlock.
> rmmod your_turd </sys/spew/from/your_turd
> and there you go. rmmod can _NOT_ wait for sysfs references to go away.

Ok, how does this sound:

- Modify EDAC so it uses kmalloc() to create the kobject.
- Eliminate edac_memctrl_master_release(). Instead, use kfree() as
the release method for the kobject. Here, it's important to use a
function -outside- of EDAC as the release method since the core
EDAC module may have been unloaded by the time the release method
is called.
- Make similar modifications to the other places in EDAC where
kobjects are used.

At least this will keep the module unload operation from blocking
in the module cleanup function due to a nonzero kobject reference
count. I'm going to be away from my keyboard for most of the rest of
today. However, if there is general agreement that this is a
reasonable way to proceed, I'll make a patch that implements this
tomorrow.

Dave

2006-03-07 17:04:12

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 10:22, Randy.Dunlap wrote:
> On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:
> > On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > > +/* Main MC kobject release() function */
> > > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > > +{
> > > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > > +}
> > > > > +
> > > >
> > > > ehhh how on earth can this be right?
> > >
> > > oh and this stuff also violates the "one value per file" rule; can we
> > > fix that urgently before it becomes part of the ABI in 2.6.16??
> >
> > Ok, I'll admit to being a bit clueless about this. I'm not familiar
> > with the "one value per file" rule; can someone please explain?
>
> it's in Documentation/filesystems/sysfs.txt
> Strongly preferred.

Ok, I assume the comment refers to the following:

Attributes should be ASCII text files, preferably with only one value
per file. It is noted that it may not be efficient to contain only
value per file, so it is socially acceptable to express an array of
values of the same type.

I was initially a bit confused because I thought the comment
specifically pertained to the piece of code shown above. I need to
take a closer look at the EDAC sysfs code - I'm not as familiar with
some of its details as I should be. Thanks for pointing out the
issue.

Dave

2006-03-07 17:04:19

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 13:32, Al Viro wrote:
> > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > Regarding the above problem with the kobject reference count, this
> > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > complete() in edac_memctrl_master_release() and then have the module
> > > cleanup code wait for the completion. I think there were a few other
> > > instances of this type of problem that I also fixed in the
> > > above-mentioned patch.
> >
> > This is not a fix, this is a goddamn deadlock.
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> Ok, how does this sound:
>
> - Modify EDAC so it uses kmalloc() to create the kobject.
> - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> the release method for the kobject. Here, it's important to use a
> function -outside- of EDAC as the release method since the core
> EDAC module may have been unloaded by the time the release method
> is called.

No, if this happens then you are using the kobject incorrectly. How
could it be held if your module is unloaded? Don't you have the module
reference counting logic correct?

> - Make similar modifications to the other places in EDAC where
> kobjects are used.

Yes.

> At least this will keep the module unload operation from blocking
> in the module cleanup function due to a nonzero kobject reference
> count. I'm going to be away from my keyboard for most of the rest of
> today. However, if there is general agreement that this is a
> reasonable way to proceed, I'll make a patch that implements this
> tomorrow.

It's a good start :)

thanks,

greg k-h

2006-03-07 17:06:50

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tuesday 07 March 2006 09:04, Greg KH wrote:
> > - Modify EDAC so it uses kmalloc() to create the kobject.
> > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > the release method for the kobject. Here, it's important to use a
> > function -outside- of EDAC as the release method since the core
> > EDAC module may have been unloaded by the time the release method
> > is called.
>
> No, if this happens then you are using the kobject incorrectly. How
> could it be held if your module is unloaded? Don't you have the module
> reference counting logic correct?

Oops... my mistake.

2006-03-07 17:21:08

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, Mar 07, 2006 at 09:03:19AM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 10:22, Randy.Dunlap wrote:
> > On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:
> > > On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > > > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > > > +/* Main MC kobject release() function */
> > > > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > > > +{
> > > > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > > > +}
> > > > > > +
> > > > >
> > > > > ehhh how on earth can this be right?
> > > >
> > > > oh and this stuff also violates the "one value per file" rule; can we
> > > > fix that urgently before it becomes part of the ABI in 2.6.16??
> > >
> > > Ok, I'll admit to being a bit clueless about this. I'm not familiar
> > > with the "one value per file" rule; can someone please explain?
> >
> > it's in Documentation/filesystems/sysfs.txt
> > Strongly preferred.
>
> Ok, I assume the comment refers to the following:
>
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only
> value per file, so it is socially acceptable to express an array of
> values of the same type.
>
> I was initially a bit confused because I thought the comment
> specifically pertained to the piece of code shown above. I need to
> take a closer look at the EDAC sysfs code - I'm not as familiar with
> some of its details as I should be. Thanks for pointing out the
> issue.

How else should we word the above text so that people realize that it
pertains to them? You aren't the first person to ignore it, so there is
a real problem here :(

thanks,

greg k-h

2006-03-07 19:03:44

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code


> I was initially a bit confused because I thought the comment
> specifically pertained to the piece of code shown above. I need to
> take a closer look at the EDAC sysfs code - I'm not as familiar with
> some of its details as I should be. Thanks for pointing out the
> issue.

afaics it is a list of pci devices. these should just be symlinks to the
sysfs resource of these pci devices instead, not a flat table file.


2006-03-07 19:06:04

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, Mar 07, 2006 at 08:03:38PM +0100, Arjan van de Ven wrote:
>
> > I was initially a bit confused because I thought the comment
> > specifically pertained to the piece of code shown above. I need to
> > take a closer look at the EDAC sysfs code - I'm not as familiar with
> > some of its details as I should be. Thanks for pointing out the
> > issue.
>
> afaics it is a list of pci devices. these should just be symlinks to the
> sysfs resource of these pci devices instead, not a flat table file.

Ugh, all this talk is making me wonder what in the world this code is
doing. Time to go look at it...

And people complain that all interfaces in userspace should be instantly
marked "stable" because we all know exactly what we are doing :)

greg k-h

2006-03-08 00:30:05

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tuesday 07 March 2006 09:20, Greg KH wrote:
> > Ok, I assume the comment refers to the following:
> >
> > Attributes should be ASCII text files, preferably with only one value
> > per file. It is noted that it may not be efficient to contain only
> > value per file, so it is socially acceptable to express an array of
> > values of the same type.
> >
> > I was initially a bit confused because I thought the comment
> > specifically pertained to the piece of code shown above. I need to
> > take a closer look at the EDAC sysfs code - I'm not as familiar with
> > some of its details as I should be. Thanks for pointing out the
> > issue.
>
> How else should we word the above text so that people realize that it
> pertains to them? You aren't the first person to ignore it, so there is
> a real problem here :(

I think it's written clearly as it is. I didn't write the
sysfs-specific parts of the EDAC code. However I'll take the blame
for not having reviewed it more carefully before it went upstream.
I guess my attention has been divided among too many things lately.

2006-03-08 01:03:45

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tuesday 07 March 2006 12:04, Greg KH wrote:
> On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> > Ok, how does this sound:
> >
> > - Modify EDAC so it uses kmalloc() to create the kobject.
> > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > the release method for the kobject. Here, it's important to use a
> > function -outside- of EDAC as the release method since the core
> > EDAC module may have been unloaded by the time the release method
> > is called.
>
> No, if this happens then you are using the kobject incorrectly. How
> could it be held if your module is unloaded? Don't you have the module
> reference counting logic correct?
>

It is pretty hard to implement kobject handling correctly. Consider the
following:

rmmod device_driver < /sys/devices/pci0000:00/...../power/state

for a driver that creates/destroys device objects.

Opening 'state' attribute will pin device structure into memory but will
not increase _your_ module's refcount. It is nice if you have a subsystem
core split from drivers code - then you can keep core module reference
until device objects are gone and allow individual drivers be unloaded
freely. But for single-module system it is pretty hard, that's why
platform devices are popular.

--
Dmitry

2006-03-08 01:33:20

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, Mar 07, 2006 at 08:03:41PM -0500, Dmitry Torokhov wrote:
> On Tuesday 07 March 2006 12:04, Greg KH wrote:
> > On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> > > Ok, how does this sound:
> > >
> > > - Modify EDAC so it uses kmalloc() to create the kobject.
> > > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > > the release method for the kobject. Here, it's important to use a
> > > function -outside- of EDAC as the release method since the core
> > > EDAC module may have been unloaded by the time the release method
> > > is called.
> >
> > No, if this happens then you are using the kobject incorrectly. How
> > could it be held if your module is unloaded? Don't you have the module
> > reference counting logic correct?
> >
>
> It is pretty hard to implement kobject handling correctly. Consider the
> following:
>
> rmmod device_driver < /sys/devices/pci0000:00/...../power/state
>
> for a driver that creates/destroys device objects.

I agree, that's one reason I really hate the "default" attributes :(

To do this "right" we need to make the attributes dynamically created
and the owner set to the proper module. I did that for the module core
code and it's on my todo list for the driver core too.

> Opening 'state' attribute will pin device structure into memory but will
> not increase _your_ module's refcount. It is nice if you have a subsystem
> core split from drivers code - then you can keep core module reference
> until device objects are gone and allow individual drivers be unloaded
> freely. But for single-module system it is pretty hard, that's why
> platform devices are popular.

They are popular for when you don't have a "bus", and rightfully so.

thanks,

greg k-h

2006-03-08 02:46:45

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tue, 2006-03-07 at 02:41 -0800, Andrew Morton wrote:
> Al Viro <[email protected]> wrote:
> >
> > On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> > > That is not something that I want the rest of the kernel to have to use.
> > > If your code blocks when doing the above thing, that's fine with me.
> >
> > One word: fail. With -EBUSY.
>
> It seems quite simple to make wait_for_zero_refcount() interruptible?
> Something like...

Al is correct.

It would have been so simple to implement rmmod as blocking, but it
seems that not what people want. They want modprobe -r to fail if the
module is busy, without ever causing spurious failures.

Hope that clarifies?
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol

2006-03-09 03:20:16

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Monday 06 March 2006 13:07, Arjan van de Ven wrote:
> > Is it more desirable to dynamically allocate kobjects than to declare
> > them statically?
>
> Yes
>
> > If so, I'd be curious to know why dynamic
> > allocation is preferred over static allocation.
>
> because the lifetime of the kobject is independent of the lifetime of
> the memory of your static allocation.
> Separate lifetimes -> separate memory is a very good design principle.

I'm not familiar with the internals of the module unloading code.
However, my understanding of the discussion so far is that the kernel
will refuse to unload a module while any of its kobjects still have
nonzero reference counts (either by waiting for the reference counts
to hit 0 or returning -EBUSY).

If this is the case, then I don't see any harm in declaring kobjects
statically. Declaring a kobject statically is simpler than
dynamically allocating and freeing it. Am I still missing something?

2006-03-09 03:44:39

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Wed, Mar 08, 2006 at 07:19:59PM -0800, Dave Peterson wrote:
> I'm not familiar with the internals of the module unloading code.
> However, my understanding of the discussion so far is that the kernel
> will refuse to unload a module while any of its kobjects still have
> nonzero reference counts (either by waiting for the reference counts
> to hit 0 or returning -EBUSY).
>
> If this is the case,

... the world you are living in is drastically different from the one
where the rest of us lives.

2006-03-09 06:00:28

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Wed, Mar 08, 2006 at 07:19:59PM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 13:07, Arjan van de Ven wrote:
> > > Is it more desirable to dynamically allocate kobjects than to declare
> > > them statically?
> >
> > Yes
> >
> > > If so, I'd be curious to know why dynamic
> > > allocation is preferred over static allocation.
> >
> > because the lifetime of the kobject is independent of the lifetime of
> > the memory of your static allocation.
> > Separate lifetimes -> separate memory is a very good design principle.
>
> I'm not familiar with the internals of the module unloading code.
> However, my understanding of the discussion so far is that the kernel
> will refuse to unload a module while any of its kobjects still have
> nonzero reference counts (either by waiting for the reference counts
> to hit 0 or returning -EBUSY).

There is no tie between kobjects and modules. Only between attributes
and modules _if_ you have properly set them up.

That is the main problem, kobjects are data and modules are code, you
need to be careful to handle their reference counting properly.

good luck,

greg k-h

2006-03-09 23:51:55

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> afaics it is a list of pci devices. these should just be symlinks to the
> sysfs resource of these pci devices instead, not a flat table file.

Ok, I'm looking at the EDAC sysfs interface. I see the following
issues concerning the "one value per file" rule:

1. /sys/devices/system/edac/mc/mc0/module_name contains two
values, a module name and a version:

# cat /sys/devices/system/edac/mc/mc0/module_name
k8_edac Ver: 2.0.1.devel Mar 8 2006
#

2. /sys/devices/system/edac/mc/mc0/supported_mem_type contains
the following on the machine I am looking at:

# cat /sys/devices/system/edac/mc/mc0/supported_mem_type
Unbuffered-DDR Registered-DDR
#

Here we have a whitespace-delimited list of values. Likewise,
the following files contain whitespace-delimited lists:

/sys/devices/system/edac/mc/mc0/edac_capability
/sys/devices/system/edac/mc/mc0/edac_current_capability

3. The following files contain comma-delimited lists of
(vendor ID, device ID) tuples:

/sys/devices/system/edac/pci/pci_parity_blacklist
/sys/devices/system/edac/pci/pci_parity_whitelist

I assume this is what Arjan is referring to.
Documentation/drivers/edac/edac.txt gives the following
description of how the whitelist functions:

This control file allows for an explicit list of PCI
devices to be scanned for parity errors. Only devices
found on this list will be examined. The list is a line
of hexadecimel VENDOR and DEVICE ID tuples:

1022:7450,1434:16a6

One or more can be inserted, seperated by a comma.
To write the above list doing the following as one
command line:

echo "1022:7450,1434:16a6"
> /sys/devices/system/edac/pci/pci_parity_whitelist

To display what the whitelist is, simply 'cat' the same
file.

Looking at the current EDAC implementation, these are all of the
"one value per file" issues I see. If anyone sees any others I
missed, please let me know. Here are my thoughts on each:

Issue #1
--------
Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
can be replaced by two separate files, one providing the name and
the other providing the version:

/sys/devices/system/edac/mc/mc0/module_name
/sys/devices/system/edac/mc/mc0/module_version

Issue #2
--------
To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
can be made into a directory containing a file representing each
supported memory type. Thus we might have the following:

/sys/devices/system/edac/mc/mc0/supported_mem_type
/sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
/sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR

In the above example, the files Unbuffered-DDR and Registered-DDR
would each be empty in content. The presence of each file would
indicate that the memory type it represents is supported.

Issue #3
--------
I am unclear about what to do here. If the list contents were
read-only, it would be relatively easy to make
/sys/devices/system/edac/pci/pci_parity_whitelist into a directory
containing symlinks, one for each device. However, the user is
supposed to be able to modify the list contents. This would imply
that the user creates and destroys symlinks. Does sysfs currently
support this sort of behavior? If not, what is the preferred
means for implementing a user-modifiable set of values?

2006-03-10 00:02:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Thu, Mar 09, 2006 at 03:51:25PM -0800, Dave Peterson wrote:
> On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> > afaics it is a list of pci devices. these should just be symlinks to the
> > sysfs resource of these pci devices instead, not a flat table file.
>
> Ok, I'm looking at the EDAC sysfs interface. I see the following
> issues concerning the "one value per file" rule:
>
> 1. /sys/devices/system/edac/mc/mc0/module_name contains two
> values, a module name and a version:
>
> # cat /sys/devices/system/edac/mc/mc0/module_name
> k8_edac Ver: 2.0.1.devel Mar 8 2006

Woah. That's what /sys/modules/ is for right? Don't add new stuff
please.

> 2. /sys/devices/system/edac/mc/mc0/supported_mem_type contains
> the following on the machine I am looking at:
>
> # cat /sys/devices/system/edac/mc/mc0/supported_mem_type
> Unbuffered-DDR Registered-DDR
> #
>
> Here we have a whitespace-delimited list of values. Likewise,
> the following files contain whitespace-delimited lists:
>
> /sys/devices/system/edac/mc/mc0/edac_capability
> /sys/devices/system/edac/mc/mc0/edac_current_capability

What exactly do they look like?

> 3. The following files contain comma-delimited lists of
> (vendor ID, device ID) tuples:
>
> /sys/devices/system/edac/pci/pci_parity_blacklist
> /sys/devices/system/edac/pci/pci_parity_whitelist

What exactly do they look like?

> I assume this is what Arjan is referring to.
> Documentation/drivers/edac/edac.txt gives the following
> description of how the whitelist functions:
>
> This control file allows for an explicit list of PCI
> devices to be scanned for parity errors. Only devices
> found on this list will be examined. The list is a line
> of hexadecimel VENDOR and DEVICE ID tuples:
>
> 1022:7450,1434:16a6
>
> One or more can be inserted, seperated by a comma.
> To write the above list doing the following as one
> command line:
>
> echo "1022:7450,1434:16a6"
> > /sys/devices/system/edac/pci/pci_parity_whitelist
>
> To display what the whitelist is, simply 'cat' the same
> file.
>
> Looking at the current EDAC implementation, these are all of the
> "one value per file" issues I see. If anyone sees any others I
> missed, please let me know. Here are my thoughts on each:
>
> Issue #1
> --------
> Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
> can be replaced by two separate files, one providing the name and
> the other providing the version:
>
> /sys/devices/system/edac/mc/mc0/module_name
> /sys/devices/system/edac/mc/mc0/module_version

No, these should just be deleted. Use the proper MODULE_* macros for
these if you really want to display them to users.

> Issue #2
> --------
> To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> can be made into a directory containing a file representing each
> supported memory type. Thus we might have the following:
>
> /sys/devices/system/edac/mc/mc0/supported_mem_type
> /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
>
> In the above example, the files Unbuffered-DDR and Registered-DDR
> would each be empty in content. The presence of each file would
> indicate that the memory type it represents is supported.

I don't think the original file is really a big problem.

> Issue #3
> --------
> I am unclear about what to do here. If the list contents were
> read-only, it would be relatively easy to make
> /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> containing symlinks, one for each device. However, the user is
> supposed to be able to modify the list contents. This would imply
> that the user creates and destroys symlinks. Does sysfs currently
> support this sort of behavior? If not, what is the preferred
> means for implementing a user-modifiable set of values?

No it doesn't. How big can this list get?

thanks,

greg k-h

2006-03-10 01:47:35

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Thursday 09 March 2006 16:02, Greg KH wrote:
> > 1. /sys/devices/system/edac/mc/mc0/module_name contains two
> > values, a module name and a version:
> >
> > # cat /sys/devices/system/edac/mc/mc0/module_name
> > k8_edac Ver: 2.0.1.devel Mar 8 2006
>
> Woah. That's what /sys/modules/ is for right? Don't add new stuff
> please.

My apologies for my lack of familiarity with how the contents of /sys
are supposed to be laid out. I'm just looking at what EDAC currently
does, trying to fix what's broken, and in the process learning how
sysfs is supposed to work.

> > Here we have a whitespace-delimited list of values. Likewise,
> > the following files contain whitespace-delimited lists:
> >
> > /sys/devices/system/edac/mc/mc0/edac_capability
> > /sys/devices/system/edac/mc/mc0/edac_current_capability
>
> What exactly do they look like?

On the machine I'm currently looking at, here is what I see:

# cd /sys/devices/system/edac/mc/mc0
# cat edac_capability
None SECDED S4ECD4ED
# cat edac_current_capability
None
#

Although edac_current_capability is shown above as having only one
value, it could in principle contain a few values (some subset of
the values contained in edac_capability).

> > 3. The following files contain comma-delimited lists of
> > (vendor ID, device ID) tuples:
> >
> > /sys/devices/system/edac/pci/pci_parity_blacklist
> > /sys/devices/system/edac/pci/pci_parity_whitelist
>
> What exactly do they look like?

Initially, both files are empty. If a user writes a list of values
to a file, the file will contain the values that the user wrote. For
instance, below we create a list containing two
(vendor ID, device ID) tuples:

# cd /sys/devices/system/edac/pci
# cat pci_parity_blacklist

# echo "1022:7450,1434:16a6" > pci_parity_blacklist
# cat pci_parity_blacklist
1022:7450,1434:16a6
#

> > Issue #1
> > --------
> > Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
> > can be replaced by two separate files, one providing the name and
> > the other providing the version:
> >
> > /sys/devices/system/edac/mc/mc0/module_name
> > /sys/devices/system/edac/mc/mc0/module_version
>
> No, these should just be deleted. Use the proper MODULE_* macros for
> these if you really want to display them to users.

Ok, I'll make sure they get deleted.

> > Issue #2
> > --------
> > To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> > can be made into a directory containing a file representing each
> > supported memory type. Thus we might have the following:
> >
> > /sys/devices/system/edac/mc/mc0/supported_mem_type
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
> >
> > In the above example, the files Unbuffered-DDR and Registered-DDR
> > would each be empty in content. The presence of each file would
> > indicate that the memory type it represents is supported.
>
> I don't think the original file is really a big problem.

Ok, so I'll leave this as-is?

> > Issue #3
> > --------
> > I am unclear about what to do here. If the list contents were
> > read-only, it would be relatively easy to make
> > /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> > containing symlinks, one for each device. However, the user is
> > supposed to be able to modify the list contents. This would imply
> > that the user creates and destroys symlinks. Does sysfs currently
> > support this sort of behavior? If not, what is the preferred
> > means for implementing a user-modifiable set of values?
>
> No it doesn't. How big can this list get?

It depends on how many PCI devices in your machine you wish to
blacklist or whitelist. The motivation for this feature is that
certain known badly-designed devices report an endless stream of
spurious PCI bus parity errors. We want to skip such devices when
checking for PCI bus parity errors.

Eventually we are thinking of maintaining a master list of known
bad hardware. The list will grow over time as users encounter
and report new instances of flaky devices. However, the user would
not have to feed the entire list to EDAC. I can imagine someone
writing a script that behaves as follows:

1. Determine the entire set of PCI devices present in the user's
machine.
2. Read the set of known bad hardware from the master list and
compute the intersection with the set of devices actually
present in the user's machine.
3. Feed the result of step 2 above to EDAC.

2006-03-10 07:37:05

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Thu, 2006-03-09 at 17:46 -0800, Dave Peterson wrote:
> > > Issue #3
> > > --------
> > > I am unclear about what to do here. If the list contents were
> > > read-only, it would be relatively easy to make
> > > /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> > > containing symlinks, one for each device. However, the user is
> > > supposed to be able to modify the list contents. This would imply
> > > that the user creates and destroys symlinks. Does sysfs currently
> > > support this sort of behavior? If not, what is the preferred
> > > means for implementing a user-modifiable set of values?
> >
> > No it doesn't. How big can this list get?
>
> It depends on how many PCI devices in your machine you wish to
> blacklist or whitelist. The motivation for this feature is that
> certain known badly-designed devices report an endless stream of
> spurious PCI bus parity errors. We want to skip such devices when
> checking for PCI bus parity errors.

ok so this is actually a per pci device property!
I would suggest moving this property to the pci device itself, not doing
it inside an edac directory.

by doing that you get the advantage that 1) it's a more logical place,
and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
the same make and 3) you can use the generic kernel quirk interface for
this and 4) drivers can in principle control this for their hardware in
complex cases

I understand that on a PC, EDAC is the only user. but ppc has a similar
mechanism I suspect, and they more than likely would be able to share
this property....

2006-03-10 11:06:36

by Tim Small

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

Arjan van de Ven wrote:

> It depends on how many PCI devices in your machine you wish to
>
>>blacklist or whitelist. The motivation for this feature is that
>>certain known badly-designed devices report an endless stream of
>>spurious PCI bus parity errors. We want to skip such devices when
>>checking for PCI bus parity errors.
>>
>>
>
>ok so this is actually a per pci device property!
>I would suggest moving this property to the pci device itself, not doing
>it inside an edac directory.
>
>
Yes, this seems more sensible to me. For one thing, I suspect that just
keying on vendor:device is probably too blunt for this and that
blacklisting a particular PCI device revision is a likely requirement,
as well as subsystem vendor/subsystem device.

Tim.

2006-03-10 11:40:43

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Fri, 2006-03-10 at 11:06 +0000, Tim Small wrote:
> Arjan van de Ven wrote:
>
> > It depends on how many PCI devices in your machine you wish to
> >
> >>blacklist or whitelist. The motivation for this feature is that
> >>certain known badly-designed devices report an endless stream of
> >>spurious PCI bus parity errors. We want to skip such devices when
> >>checking for PCI bus parity errors.
> >>
> >>
> >
> >ok so this is actually a per pci device property!
> >I would suggest moving this property to the pci device itself, not doing
> >it inside an edac directory.
> >
> >
> Yes, this seems more sensible to me. For one thing, I suspect that just
> keying on vendor:device is probably too blunt for this and that
> blacklisting a particular PCI device revision is a likely requirement,
> as well as subsystem vendor/subsystem device.

and maybe even something as funky as firmware version.
So it for sure is a per device (not per ID) property, and something that
needs a global quirk table kind of thing with the option to do per
driver overrides


2006-03-10 17:46:55

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Thursday 09 March 2006 23:36, Arjan van de Ven wrote:
> ok so this is actually a per pci device property!
> I would suggest moving this property to the pci device itself, not doing
> it inside an edac directory.
>
> by doing that you get the advantage that 1) it's a more logical place,
> and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
> the same make and 3) you can use the generic kernel quirk interface for
> this and 4) drivers can in principle control this for their hardware in
> complex cases
>
> I understand that on a PC, EDAC is the only user. but ppc has a similar
> mechanism I suspect, and they more than likely would be able to share
> this property....

I'd be curious to hear people's opinions on the following idea:
move the PCI bus parity error checking functionality from EDAC
to the PCI subsystem.

2006-03-10 17:58:07

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Fri, 2006-03-10 at 09:46 -0800, Dave Peterson wrote:
> On Thursday 09 March 2006 23:36, Arjan van de Ven wrote:
> > ok so this is actually a per pci device property!
> > I would suggest moving this property to the pci device itself, not doing
> > it inside an edac directory.
> >
> > by doing that you get the advantage that 1) it's a more logical place,
> > and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
> > the same make and 3) you can use the generic kernel quirk interface for
> > this and 4) drivers can in principle control this for their hardware in
> > complex cases
> >
> > I understand that on a PC, EDAC is the only user. but ppc has a similar
> > mechanism I suspect, and they more than likely would be able to share
> > this property....
>
> I'd be curious to hear people's opinions on the following idea:
> move the PCI bus parity error checking functionality from EDAC
> to the PCI subsystem.

I can see the point on at least moving all the infrastructure there.
The actual call to run it... maybe. that's more debatable I suppose.

2006-03-10 19:08:33

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > I'd be curious to hear people's opinions on the following idea:
> > move the PCI bus parity error checking functionality from EDAC
> > to the PCI subsystem.
>
> I can see the point on at least moving all the infrastructure there.
> The actual call to run it... maybe. that's more debatable I suppose.

Regarding the actual call to run it, I guess it depends on which of
the following you prefer:

Scenario A
----------
A more decentralized layout. Here, the controls that govern the
error handling behavior for a given category of hardware (a
category might be "PCI devices" or "devices that use bus
technology XYZ") are grouped together with other stuff for that
category.

Scenario B
----------
A more centralized layout. Here, the controls that govern a wide
variety of error handling behaviors are all grouped together (for
instance, within EDAC).

I tend to prefer scenario A. The conceptual model I currently have in
my mind for EDAC is as follows:

- Each low-level EDAC module (amd76x, e7xxx, etc.) implements
hardware error handling functionality that is specific to just
that platform.

- The purpose of the core EDAC module is to serve as a home for
abstractions that are common to the various low-level EDAC
modules. This creates uniformity of mechanism, and also uniform
behavior from the user's point of view. It also encourages
avoidance of replicated code. However, the core EDAC module
(and by extension, the low-level EDAC modules) should not
contain stuff that is generic enough to fit elsewhere (for
instance, in the PCI subsystem). In other words, EDAC serves as
a catch-all for error handling functionality that is too
platform-specific to fit anywhere else.

However, these are just my personal thoughts. I'd be curious to hear
other ideas people may have.

2006-03-10 19:33:20

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Fri, 2006-03-10 at 11:07 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > > I'd be curious to hear people's opinions on the following idea:
> > > move the PCI bus parity error checking functionality from EDAC
> > > to the PCI subsystem.
> >
> > I can see the point on at least moving all the infrastructure there.
> > The actual call to run it... maybe. that's more debatable I suppose.
>
> Regarding the actual call to run it, I guess it depends on which of
> the following you prefer:
>
> Scenario A
> ----------
> A more decentralized layout. Here, the controls that govern the
> error handling behavior for a given category of hardware (a
> category might be "PCI devices" or "devices that use bus
> technology XYZ") are grouped together with other stuff for that
> category.

this would basically make edac a place to report "help something went to
the gutter". Sure. I see that useful.

In fact there are 3 layers then

1) low level "do check" function

2) per bus code that calls the do check functions and whatever is needed
for bus checks

3) "EDAC" central command, which basically gathers all failure reports
and does something with them (push them to userspace or implement the
userspace chosen policy (panic/reboot/etc))


having 1) separate from 2) is useful, it means that drivers can do
synchronous checks in addition to the checks in 2) which will tend to be
asynchronous....



2006-03-10 21:13:48

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Friday 10 March 2006 11:33, Arjan van de Ven wrote:
> > Regarding the actual call to run it, I guess it depends on which of
> > the following you prefer:
> >
> > Scenario A
> > ----------
> > A more decentralized layout. Here, the controls that govern the
> > error handling behavior for a given category of hardware (a
> > category might be "PCI devices" or "devices that use bus
> > technology XYZ") are grouped together with other stuff for that
> > category.
>
> this would basically make edac a place to report "help something went to
> the gutter". Sure. I see that useful.
>
> In fact there are 3 layers then
>
> 1) low level "do check" function

I'm a bit unclear here. Are you thinking of a single "do check"
function that bus-specific code and chipset-specific code can hook
into, or individual "do check" functions for various bus-specific and
chipset-specific modules?

> 2) per bus code that calls the do check functions and whatever is needed
> for bus checks
>
> 3) "EDAC" central command, which basically gathers all failure reports
> and does something with them (push them to userspace or implement the
> userspace chosen policy (panic/reboot/etc))

Are you suggesting something like the following?

- The controls that determine how the error checking is done are
located within the various hardware subsystems. For instance,
with PCI parity checking, this would include stuff like setting
the polling frequency and determining which devices to check.

- When an error is actually detected, the subsystem that detected
the error (for instance, PCI) would feed the error information
to EDAC. Then EDAC would determine how to respond to the error
(for instance, push it to userspace or implement the
userspace-chosen policy (panic/reboot/etc))

2006-03-10 21:23:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Fri, 2006-03-10 at 13:13 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 11:33, Arjan van de Ven wrote:
> > > Regarding the actual call to run it, I guess it depends on which of
> > > the following you prefer:
> > >
> > > Scenario A
> > > ----------
> > > A more decentralized layout. Here, the controls that govern the
> > > error handling behavior for a given category of hardware (a
> > > category might be "PCI devices" or "devices that use bus
> > > technology XYZ") are grouped together with other stuff for that
> > > category.
> >
> > this would basically make edac a place to report "help something went to
> > the gutter". Sure. I see that useful.
> >
> > In fact there are 3 layers then
> >
> > 1) low level "do check" function
>
> I'm a bit unclear here. Are you thinking of a single "do check"
> function that bus-specific code and chipset-specific code can hook
> into, or individual "do check" functions for various bus-specific and
> chipset-specific modules?

hmm ok so I want a function that takes a device as parameter, and checks
the state of that device for errors. Internally that probably has to go
via a function pointer somewhere to a device specific check method.

Or maybe a per test-type (pci parity / ECC / etc) check

int pci_check_parity_errors(struct pci_dev *dev, int flags);

something like that, or pci_check_and_clear_parity_errors()
(although that gets too long :)

drivers can call that, say, after firmware init or something to validate
their device is sanely connected. Maybe pci_enable_device() could call
it too.

This also needs a pci_suspend_parity_check() ... _resume_ ... so that
the driver can temporarily disable any checks, for example during device
reset/init. And then just before resume, it manually clears a check.




>
> > 2) per bus code that calls the do check functions and whatever is needed
> > for bus checks
> >
> > 3) "EDAC" central command, which basically gathers all failure reports
> > and does something with them (push them to userspace or implement the
> > userspace chosen policy (panic/reboot/etc))
>
> Are you suggesting something like the following?
>
> - The controls that determine how the error checking is done are
> located within the various hardware subsystems. For instance,
> with PCI parity checking, this would include stuff like setting
> the polling frequency and determining which devices to check.

yes. I would NOT make it overly tunable btw. For many things a sane
default can be chosen, and the effect of picking a different tuning
isn't all that big. Maybe think of it this way: a tuneable to a large
degree is an excuse for not determining the right value / heuristic in
the first place.



> - When an error is actually detected, the subsystem that detected
> the error (for instance, PCI) would feed the error information
> to EDAC. Then EDAC would determine how to respond to the error
> (for instance, push it to userspace or implement the
> userspace-chosen policy (panic/reboot/etc))

yup.


2006-03-11 01:57:36

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> hmm ok so I want a function that takes a device as parameter, and checks
> the state of that device for errors. Internally that probably has to go
> via a function pointer somewhere to a device specific check method.
>
> Or maybe a per test-type (pci parity / ECC / etc) check
>
> int pci_check_parity_errors(struct pci_dev *dev, int flags);
>
> something like that, or pci_check_and_clear_parity_errors()
> (although that gets too long :)
>
> drivers can call that, say, after firmware init or something to validate
> their device is sanely connected. Maybe pci_enable_device() could call
> it too.
>
> This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> the driver can temporarily disable any checks, for example during device
> reset/init. And then just before resume, it manually clears a check.

ok, perhaps things might look something like this?

- A check_error() function checks a device for errors. As you
mention above, this may be a device-specific check method or a
function like pci_check_parity_errors(dev, flags). In either
case, if check_error() finds an error, it clears the error from
the device's registers and returns the error. If check_error()
finds multiple errors, here are couple of options:

- Return a list of all errors detected.
- Return a single error along with a boolean value that says
whether more errors are present. If so, check_error() may
be called repeatedly until no errors remain.

- A handle_error() function handles errors returned by
check_error(). Here are a few options: Each device may have a
handle_error() method that takes an error as a parameter. Or,
each type of error may have its own handle_me() method. A third
option is something like pci_handle_parity_error(dev, error).
In any case, depending on the error type, the handler may choose
to feed the error to EDAC.

- Each hardware subsystem or device driver may determine its own
error checking/handling strategy. Some may want to poll for
errors. Others may want error detection to be asynchronous
(driven by interrupts or exceptions). Or a subsystem may poll
for some kinds of errors and detect others asynchronously. As
you suggest, errors may also be checked for in other situations,
such as after firmware init.

For polling, the poll function calls check_error(), and then
calls handle_error() if an error is found. For interrupt-driven
error checking, the interrupt handler calls check_error(). If
an error is found, the interrupt handler may either call
handle_error() directly or defer invocation of handle_error()
outside interrupt context. If the interrupt is an NMI, deferred
invocation of handle_error() allows it to execute without
introducing deadlocks or race conditions.

- For some types of hardware, at boot time the device's registers
contain spurious error info, which should be discarded. This
may be done by calling check_error() and discarding the results.

> > > 2) per bus code that calls the do check functions and whatever is
> > > needed for bus checks
> > >
> > > 3) "EDAC" central command, which basically gathers all failure reports
> > > and does something with them (push them to userspace or implement the
> > > userspace chosen policy (panic/reboot/etc))
> >
> > Are you suggesting something like the following?
> >
> > - The controls that determine how the error checking is done are
> > located within the various hardware subsystems. For instance,
> > with PCI parity checking, this would include stuff like setting
> > the polling frequency and determining which devices to check.
>
> yes. I would NOT make it overly tunable btw. For many things a sane
> default can be chosen, and the effect of picking a different tuning
> isn't all that big. Maybe think of it this way: a tuneable to a large
> degree is an excuse for not determining the right value / heuristic in
> the first place.

Sounds good.

> > - When an error is actually detected, the subsystem that detected
> > the error (for instance, PCI) would feed the error information
> > to EDAC. Then EDAC would determine how to respond to the error
> > (for instance, push it to userspace or implement the
> > userspace-chosen policy (panic/reboot/etc))
>
> yup.

Cool! I think this also coincides with what Doug is saying. Doug, how
does this sound?

2006-03-11 07:18:37

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Fri, 2006-03-10 at 17:57 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> > hmm ok so I want a function that takes a device as parameter, and checks
> > the state of that device for errors. Internally that probably has to go
> > via a function pointer somewhere to a device specific check method.
> >
> > Or maybe a per test-type (pci parity / ECC / etc) check
> >
> > int pci_check_parity_errors(struct pci_dev *dev, int flags);
> >
> > something like that, or pci_check_and_clear_parity_errors()
> > (although that gets too long :)
> >
> > drivers can call that, say, after firmware init or something to validate
> > their device is sanely connected. Maybe pci_enable_device() could call
> > it too.
> >
> > This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> > the driver can temporarily disable any checks, for example during device
> > reset/init. And then just before resume, it manually clears a check.
>
> ok, perhaps things might look something like this?

<snip>

sounds like overdesign to me ;)



2006-03-13 19:31:48

by Dave Peterson

[permalink] [raw]
Subject: Re: [PATCH] EDAC: core EDAC support code

On Friday 10 March 2006 23:18, Arjan van de Ven wrote:
> > ok, perhaps things might look something like this?
>
> <snip>
>
> sounds like overdesign to me ;)

Hmm... ok :) I don't want to suggest that Linux's hardware error
handling mechanism should be "all of the above". Rather my intent is
this:

- To summarize some previously mentioned options for error
handling APIs, and mention a couple more ideas. More discussion
is needed to decide exactly what we want.

- To suggest that drivers and hardware subsystems may want some
flexibility in how they detect and process errors. (However,
there is also such a thing as too much flexibility; EDAC should
create a certain amount of uniformity)

- To state a couple of motivations for keeping error checking
(i.e. reading error info from registers and clearing the
registers) logically separate from error handling (i.e. doing
something with the info obtained from the registers)

Another thing to think about: If EDAC is going to collect error info
from subsystems and drivers, it needs some sort of API that supports
this. Although I don't think this requires anything fancy, it still
needs some discussion.

I'd be curious to read more discussion/ideas about error handling APIs
(and see some patches that illustrate the ideas).

Dave