2003-09-08 02:53:15

by Zwane Mwaikambo

[permalink] [raw]
Subject: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

Randy gave me the courage to delve in there... now that i'm lost to the
world here is one picked up from bugzilla;

The crux of it is that the floppy driver isn't deleting timers before
unloading itself. I've tested it locally somewhat, but the ioport
busy looks very strange (although things do function). There is one part
of this patch that i'd like Greg to look at, it's the
default_device_release addition...

http://bugme.osdl.org/show_bug.cgi?id=1061

This is from the bugzilla and is the closest to a oops/backtrace the
reporter could put together.

esi: c02e1080 edi: c02e0780 ebp: c0349ef8 esp: c0349ee4
ds: 007b es:007b ss: 0068

Process: swapper (pid: 0, threadinfo=c034800 task=c02dd980)
Stack: 00000001 00000000 00000000 c03711c8 c0349f0c c0349f24 c0120681 c02e0780
c02e0f88 0000001f c0349f0c c0349f0c c02ff5fc 00000001 c03711c8 0000000a
c0349f40 c011c899 c03711c8 00000046 00000000 c0345a00 00000000 c0349f64

Call Trace:
[<c0120681>] run_timer_softirq+0x101/0x180
[<c011c899>] do_softirq+0x99/0xa0
[<c010ad68>] do_IRQ+0xe8/0x120
[<c01092dc>] common_interrupt+0x18/0x20
[<c01dfcfa>] acpi_processor_idle+0x15a/0x1ef
[<c01dfba0>] acpi_processor_idle+0x0/0x1ef
[<c0107000>] default_idle+0x0/0x30
[<c01070a1>] cpu_idle+0x31/0x40
[<c0105000>] rest_init0x0/0x30
[<c034a6ec>] start_kernel+0x15c/0x190
[<c034a460>] unknown_bootoption+0x0/0x100

Code: 94 00 b5 38 2a c0 eb bf 8d 76 00 55 89 e5 57 56 53 83 ec 08 8b 75 10 8b 7d
08 c1 e6 03 03 75 0c 8b 1e 39 f3 74 1e 90 8d 74 26 00 <39> 7b 18 89 d8 75 22 8b
1b 89 3c 24 89 44 24 04 e8 1b fd ff ff
<0> kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

Index: /build/source/linux-2.6.0-test4-mm6/drivers/block/floppy.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test4-mm6/drivers/block/floppy.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 floppy.c
--- /build/source/linux-2.6.0-test4-mm6/drivers/block/floppy.c 7 Sep 2003 20:26:57 -0000 1.1.1.1
+++ /build/source/linux-2.6.0-test4-mm6/drivers/block/floppy.c 8 Sep 2003 02:40:46 -0000
@@ -4206,6 +4206,9 @@ static int have_no_fdc= -ENODEV;
static struct platform_device floppy_device = {
.name = "floppy",
.id = 0,
+ .dev = {
+ .release = default_device_release,
+ }
};

static struct kobject *floppy_find(dev_t dev, int *part, void *data)
@@ -4580,7 +4583,10 @@ void cleanup_module(void)
platform_device_unregister(&floppy_device);
blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
unregister_blkdev(FLOPPY_MAJOR, "fd");
+
for (drive = 0; drive < N_DRIVE; drive++) {
+ del_timer_sync(&motor_off_timer[drive]);
+
if ((allowed_drive_mask & (1 << drive)) &&
fdc_state[FDC(drive)].version != FDC_NONE) {
del_gendisk(disks[drive]);
@@ -4590,7 +4596,13 @@ void cleanup_module(void)
}
devfs_remove("floppy");

+ del_timer_sync(&fd_timeout);
+ del_timer_sync(&fd_timer);
blk_cleanup_queue(floppy_queue);
+
+ if (usage_count)
+ floppy_release_irq_and_dma();
+
/* eject disk, if any */
fd_eject(0);
}
Index: /build/source/linux-2.6.0-test4-mm6/drivers/base/core.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test4-mm6/drivers/base/core.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 core.c
--- /build/source/linux-2.6.0-test4-mm6/drivers/base/core.c 7 Sep 2003 20:26:56 -0000 1.1.1.1
+++ /build/source/linux-2.6.0-test4-mm6/drivers/base/core.c 7 Sep 2003 23:20:48 -0000
@@ -332,6 +332,14 @@ void device_del(struct device * dev)
}

/**
+ * @default_device_release - default null release function
+ * @dev: device to release
+ */
+void default_device_release(struct device *dev)
+{
+}
+
+/**
* device_unregister - unregister device from system.
* @dev: device going away.
*
@@ -381,6 +389,7 @@ int __init devices_init(void)
return subsystem_register(&devices_subsys);
}

+EXPORT_SYMBOL(default_device_release);
EXPORT_SYMBOL(device_for_each_child);

EXPORT_SYMBOL(device_initialize);
Index: /build/source/linux-2.6.0-test4-mm6/include/linux/device.h
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test4-mm6/include/linux/device.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 device.h
--- /build/source/linux-2.6.0-test4-mm6/include/linux/device.h 7 Sep 2003 20:27:50 -0000 1.1.1.1
+++ /build/source/linux-2.6.0-test4-mm6/include/linux/device.h 7 Sep 2003 23:21:58 -0000
@@ -304,6 +304,7 @@ extern void device_unregister(struct dev
extern void device_initialize(struct device * dev);
extern int device_add(struct device * dev);
extern void device_del(struct device * dev);
+extern void default_device_release(struct device *dev);
extern int device_for_each_child(struct device *, void *,
int (*fn)(struct device *, void *));


2003-09-08 15:50:41

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Sun, Sep 07, 2003 at 10:53:08PM -0400, Zwane Mwaikambo wrote:
> Randy gave me the courage to delve in there... now that i'm lost to the
> world here is one picked up from bugzilla;
>
> The crux of it is that the floppy driver isn't deleting timers before
> unloading itself. I've tested it locally somewhat, but the ioport
> busy looks very strange (although things do function). There is one part
> of this patch that i'd like Greg to look at, it's the
> default_device_release addition...

Ick, no, I do not want to see this function get added, sorry.

What happens if someone grabs the struct device reference by opening a
sysfs file and then you unload the module? Yeah, not nice. Please do
_not_ create "empty" release() functions, unless you _really_ know what
you are doing (and providing a "default" one like this is just ripe for
abuse, that warning message in the kernel is there for a reason.)

thanks,

greg k-h

2003-09-08 21:27:20

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Mon, 8 Sep 2003, Greg KH wrote:

> Ick, no, I do not want to see this function get added, sorry.

Well i was expecting that.

> What happens if someone grabs the struct device reference by opening a
> sysfs file and then you unload the module? Yeah, not nice. Please do

Doesn't this all get taken care of by the platform_device_unregister?

> _not_ create "empty" release() functions, unless you _really_ know what
> you are doing (and providing a "default" one like this is just ripe for
> abuse, that warning message in the kernel is there for a reason.)

I know it's begging for abuse, but i don't want to sprinkle empty
release() functions everywhere, e.g. looking at the floppy driver, i'm
not quite sure what i'm supposed to do with a release() function there,
the struct platform_device_struct is statically allocated. Basically i'd
like a pointer as to what to do with these release() functions..

Thanks,
Zwane

2003-09-08 23:10:35

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Mon, Sep 08, 2003 at 05:27:10PM -0400, Zwane Mwaikambo wrote:
> > What happens if someone grabs the struct device reference by opening a
> > sysfs file and then you unload the module? Yeah, not nice. Please do
>
> Doesn't this all get taken care of by the platform_device_unregister?

No it doesn't, sorry.

> > _not_ create "empty" release() functions, unless you _really_ know what
> > you are doing (and providing a "default" one like this is just ripe for
> > abuse, that warning message in the kernel is there for a reason.)
>
> I know it's begging for abuse, but i don't want to sprinkle empty
> release() functions everywhere, e.g. looking at the floppy driver, i'm
> not quite sure what i'm supposed to do with a release() function there,
> the struct platform_device_struct is statically allocated.

You need to sleep until your release function gets called. Take a look
at the cpufreq code for an example of this (also the i2c core does
this.)

> Basically i'd like a pointer as to what to do with these release()
> functions..

release() is called when the last reference to this device is dropped.
When it is called it is then safe to do the following:
- free the memory of the device
- unload the module of the device

So an empty release() function is the wrong thing to do in 99.99% of the
situations in the kernel (the one exception seems to be the mca release
function that recently got added for use when the bus is doing probing
logic.)

Does this help out?

thanks,

greg k-h

2003-09-09 11:51:17

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Mon, 8 Sep 2003, Greg KH wrote:

> You need to sleep until your release function gets called. Take a look
> at the cpufreq code for an example of this (also the i2c core does
> this.)
>
> > Basically i'd like a pointer as to what to do with these release()
> > functions..
>
> release() is called when the last reference to this device is dropped.
> When it is called it is then safe to do the following:
> - free the memory of the device
> - unload the module of the device
>
> So an empty release() function is the wrong thing to do in 99.99% of the
> situations in the kernel (the one exception seems to be the mca release
> function that recently got added for use when the bus is doing probing
> logic.)
>
> Does this help out?

Yes thanks, i was confused over which memory references had to be
maintained.

2003-09-09 16:39:06

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Tue, 9 Sep 2003, Zwane Mwaikambo wrote:

> > So an empty release() function is the wrong thing to do in 99.99% of the
> > situations in the kernel (the one exception seems to be the mca release
> > function that recently got added for use when the bus is doing probing
> > logic.)
> >
> > Does this help out?
>
> Yes thanks, i was confused over which memory references had to be
> maintained.

Ok i had another look and i can see why you need a seperate release
function, as we don't always do the kobject_cleanup immediately.

John and myself had a look and now we have the following race on
->release() function exit.

my_release_fn()
{
complete(&my_completion);
<== [1] stall anywhere here, e.g. preempt/schedule
}

cleanup_module()
{
wait_for_completion(&my_completion);
<== [1] this task gets scheduled, free()s module text
}

2003-09-09 17:22:48

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Tue, Sep 09, 2003 at 12:38:51PM -0400, Zwane Mwaikambo wrote:
> On Tue, 9 Sep 2003, Zwane Mwaikambo wrote:
>
> > > So an empty release() function is the wrong thing to do in 99.99% of the
> > > situations in the kernel (the one exception seems to be the mca release
> > > function that recently got added for use when the bus is doing probing
> > > logic.)
> > >
> > > Does this help out?
> >
> > Yes thanks, i was confused over which memory references had to be
> > maintained.
>
> Ok i had another look and i can see why you need a seperate release
> function, as we don't always do the kobject_cleanup immediately.
>
> John and myself had a look and now we have the following race on
> ->release() function exit.
>
> my_release_fn()
> {
> complete(&my_completion);
> <== [1] stall anywhere here, e.g. preempt/schedule
> }

Ugh. Sure, point out the theoretical :)

Any thoughts on how to solve this?

thanks,

greg k-h

2003-09-09 19:18:42

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Tue, 9 Sep 2003, Greg KH wrote:

> Ugh. Sure, point out the theoretical :)
>
> Any thoughts on how to solve this?

How about something like the following, the kobj_type.done is passed from
the driver so the driver's presence can maintain it's persistence and
we're guaranteed that the ->release() function is not running on a
processor at completion time.

Index: linux-2.6.0-test5/drivers/base/core.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test5/drivers/base/core.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 core.c
--- linux-2.6.0-test5/drivers/base/core.c 8 Sep 2003 22:07:57 -0000 1.1.1.1
+++ linux-2.6.0-test5/drivers/base/core.c 9 Sep 2003 18:38:45 -0000
@@ -334,6 +334,14 @@ void device_del(struct device * dev)

}

+void device_release_notify(struct device *dev, struct completion *done)
+{
+ struct kobj_type *ktype = get_ktype(&dev->kobj);
+
+ init_completion(done);
+ ktype->done = done;
+}
+
/**
* device_unregister - unregister device from system.
* @dev: device going away.
Index: linux-2.6.0-test5/lib/kobject.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test5/lib/kobject.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 kobject.c
--- linux-2.6.0-test5/lib/kobject.c 8 Sep 2003 22:08:55 -0000 1.1.1.1
+++ linux-2.6.0-test5/lib/kobject.c 9 Sep 2003 18:10:50 -0000
@@ -448,8 +448,12 @@ void kobject_cleanup(struct kobject * ko
if (kobj->k_name != kobj->name)
kfree(kobj->k_name);
kobj->k_name = NULL;
- if (t && t->release)
+ if (t && t->release) {
t->release(kobj);
+ if (t->done)
+ complete(t->done);
+ }
+
if (s)
kset_put(s);
}
Index: linux-2.6.0-test5/include/linux/kobject.h
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test5/include/linux/kobject.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 kobject.h
--- linux-2.6.0-test5/include/linux/kobject.h 8 Sep 2003 22:08:50 -0000 1.1.1.1
+++ linux-2.6.0-test5/include/linux/kobject.h 9 Sep 2003 17:47:19 -0000
@@ -59,6 +59,7 @@ extern void kobject_put(struct kobject *

struct kobj_type {
void (*release)(struct kobject *);
+ struct completion * done;
struct sysfs_ops * sysfs_ops;
struct attribute ** default_attrs;
};


Then in the driver module_exit;

void cleanup_module(void)
{
...
struct completion done;
struct device *dev = ...

device_release_notify(dev, &done);

...

wait_for_completion(&done);
}

2003-09-09 19:45:59

by John Levon

[permalink] [raw]
Subject: Re: [PATCH][2.6][CFT] rmmod floppy kills box fixes + default_device_remove

On Tue, Sep 09, 2003 at 03:18:20PM -0400, Zwane Mwaikambo wrote:

> > Any thoughts on how to solve this?
>
> How about something like the following, the kobj_type.done is passed from
> the driver so the driver's presence can maintain it's persistence and

This works. Repeat ad infinitum for all the other platform devices in
pcmcia etc, 99% of which don't need to release anything extra.

Alternatively why don't we do something like the below ? This still
allows platform devices to clean up if necessary, but solves the "wait
for last reference" on a sensible level IMHO.

Index: platform.c
===================================================================
RCS file: /home/cvs/linux-2.5/drivers/base/platform.c,v
retrieving revision 1.11
diff -u -p -r1.11 platform.c
--- platform.c 16 Aug 2003 05:00:10 -0000 1.11
+++ platform.c 9 Sep 2003 18:02:22 -0000
@@ -18,6 +18,16 @@ struct device legacy_bus = {
.bus_id = "legacy",
};

+
+static void platform_device_release(struct device * dev)
+{
+ struct platform_device * pdev = to_platform_device(dev);
+ if (pdev->release)
+ pdev->release(pdev);
+ complete(&pdev->done);
+}
+
+
/**
* platform_device_register - add a platform-level device
* @dev: platform device we're adding
@@ -32,6 +42,12 @@ int platform_device_register(struct plat
pdev->dev.parent = &legacy_bus;

pdev->dev.bus = &platform_bus_type;
+
+ if (pdev->dev.release) {
+ printk("use the other release dude\n");
+ }
+ pdev->dev.release = platform_device_release;
+ init_completion(&pdev->done);

snprintf(pdev->dev.bus_id,BUS_ID_SIZE,"%s%u",pdev->name,pdev->id);

@@ -44,6 +60,7 @@ void platform_device_unregister(struct p
{
if (pdev)
device_unregister(&pdev->dev);
+ wait_for_completion(&pdev->done);
}


--
Khendon's Law:
If the same point is made twice by the same person, the thread is over.