2000-10-30 14:11:18

by Shmuel Hen

[permalink] [raw]
Subject: Locking Between User Context and Soft IRQs in 2.4.0

Hello,

We are trying to port a network driver from 2.2.x to 2.4.x and have some
question regarding locks.
According to the kernel locking HOWTO, we have to take extra care when
locking between user context threads and BH/tasklet/softIRQ,
so we learned (the hard way ;-) that when running the ioctl system call from
an application we should use spin_lock/unlock_bh() and not
spin_lock/unlock() inside dev->do_ioctl().

* What about the other entry points implemented in net_device ?
* We've got dev->get_stats, dev->set_mac_address,
dev->set_mutlicast_list and others that are all called from running
'ifconfig' which is an application. Are they considered user context too ?
* What about dev->open and dev->stop ?
* We figured that dev->hard_start_xmit() and timer callbacks are not
considered user context, but how can I find out if they are being run as
SoftIRQ or as tasklets or as Bottom Halves ? (their different definitions
require different types of protections)

Our driver is actually an intermediate driver bound on top of a regular net
driver. It behaves both as a network adapter driver and a protocol at the
same time. I can safely assume that it will have to handle both transmits
and receives simultaneously (no hardware interrupts are involved). We've
decided that for the first stage we are going to implement "wide" locks that
wrap entire operations from top to bottom. For example, our
dev->hard_start_xmit() will have a spin_lock() at the beginning and a
spin_unlock() at the end of the function.
* Will it be safe to keep the lock until after the call to the base
driver's hard_start_xmit, or do I have to release the lock just before that
?
* Or, in our receive function, will I have to release the lock before
or after the call to netif_rx() ?
* What about other calls to the kernel ? can the running thread be
switched out of context when calling kernel entries and not be switched back
in when they finish ? should I beware of deadlocks in such case ?


Thanks in advance,
Shmulik Hen,
Software Engineer
Linux Advanced Networking Services
Network Communications Group, Israel (NCGj)
Intel Corporation Ltd.





2000-11-04 09:46:20

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0



struct net_device synchronization rules
=======================================
dev->open:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->stop:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->do_ioctl:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->get_stats:
Locking: Inside dev_base_lock spinlock.
Sleeping: NO

dev->hard_start_xmit:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]

dev->tx_timeout:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]

dev->set_multicast_list:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO[1]


NOTE [1]: On principle, you should not sleep when a spinlock is held.
However, since this spinlock is per-net-device, we only block ourselves
if we sleep, so the effect is mitigated.


Attachments:
netdevices.txt (785.00 B)

2000-11-04 10:19:39

by Andi Kleen

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
>
> > * What about dev->open and dev->stop ?
>
> Sleep all you want, we'll leave the light on for ya.


... but make sure you have no module unload races (or at least not too
huge holes, some are probably unavoidable with the current network
driver interface, e.g. without moving module count management a bit up).
This means you should do MOD_INC_USE_COUNT very early at least to
minimize the windows (and DEC_USE_COUNT very late)

> dev->do_ioctl:
> Locking: Inside rtnl_lock() semaphore.
> Sleeping: OK

Just make sure you lock against your interrupt and xmit threads.

> Locking: Inside dev->xmit_lock spinlock.
> Sleeping: NO[1]
>
>
> NOTE [1]: On principle, you should not sleep when a spinlock is held.
> However, since this spinlock is per-net-device, we only block ourselves
> if we sleep, so the effect is mitigated.

Sounds like dangerous advice.

-Andi

2000-11-04 15:37:03

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Andi Kleen wrote:
>
> On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
> >
> > > * What about dev->open and dev->stop ?
> >
> > Sleep all you want, we'll leave the light on for ya.
>
> ... but make sure you have no module unload races (or at least not too
> huge holes, some are probably unavoidable with the current network
> driver interface, e.g. without moving module count management a bit up).
> This means you should do MOD_INC_USE_COUNT very early at least to
> minimize the windows (and DEC_USE_COUNT very late)

Can you provide a trace of a race or deadlock? I do not see where there
are races in the current 2.4.x code.


> > dev->do_ioctl:
> > Locking: Inside rtnl_lock() semaphore.
> > Sleeping: OK
>
> Just make sure you lock against your interrupt and xmit threads.

But of course :) My doc only covered locks external to the driver.


> > Locking: Inside dev->xmit_lock spinlock.
> > Sleeping: NO[1]
> >
> >
> > NOTE [1]: On principle, you should not sleep when a spinlock is held.
> > However, since this spinlock is per-net-device, we only block ourselves
> > if we sleep, so the effect is mitigated.
>
> Sounds like dangerous advice.

Probably... I changed the doc "just say no" :)

Jeff


--
Jeff Garzik | Dinner is ready when
Building 1024 | the smoke alarm goes off.
MandrakeSoft | -/usr/games/fortune

2000-11-04 16:57:28

by Andi Kleen

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Sat, Nov 04, 2000 at 10:36:36AM -0500, Jeff Garzik wrote:
> Andi Kleen wrote:
> >
> > On Sat, Nov 04, 2000 at 04:45:33AM -0500, Jeff Garzik wrote:
> > >
> > > > * What about dev->open and dev->stop ?
> > >
> > > Sleep all you want, we'll leave the light on for ya.
> >
> > ... but make sure you have no module unload races (or at least not too
> > huge holes, some are probably unavoidable with the current network
> > driver interface, e.g. without moving module count management a bit up).
> > This means you should do MOD_INC_USE_COUNT very early at least to
> > minimize the windows (and DEC_USE_COUNT very late)
>
> Can you provide a trace of a race or deadlock? I do not see where there
> are races in the current 2.4.x code.

All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
is nothing that would a driver prevent from being unloaded on a different
CPU while it is already executing in ->open but has not yet executed the add
yet or after it has executed the _DEC but it is still running in module code
Normally the windows are pretty small, but very long running interrupt
on one CPU hitting exactly in the wrong moment can change that.

-Andi

2000-11-04 17:08:10

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Andi Kleen wrote:
> All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> is nothing that would a driver prevent from being unloaded on a different
> CPU while it is already executing in ->open but has not yet executed the add
> yet or after it has executed the _DEC but it is still running in module code
> Normally the windows are pretty small, but very long running interrupt
> on one CPU hitting exactly in the wrong moment can change that.

Module unload calls unregister_netdev, which grabs rtnl_lock.
dev->open runs under rtnl_lock.

Given this, how can the driver be unloaded if dev->open is running?

--
Jeff Garzik | Dinner is ready when
Building 1024 | the smoke alarm goes off.
MandrakeSoft | -/usr/games/fortune

2000-11-05 00:38:36

by Andi Kleen

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> Andi Kleen wrote:
> > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > is nothing that would a driver prevent from being unloaded on a different
> > CPU while it is already executing in ->open but has not yet executed the add
> > yet or after it has executed the _DEC but it is still running in module code
> > Normally the windows are pretty small, but very long running interrupt
> > on one CPU hitting exactly in the wrong moment can change that.
>
> Module unload calls unregister_netdev, which grabs rtnl_lock.
> dev->open runs under rtnl_lock.
>
> Given this, how can the driver be unloaded if dev->open is running?

It does not help, because when the semaphore synchronizes it is already
too late -- free_module already did the zero module count check and
nothing is going to stop it from unloading.

-Andi

2000-11-05 01:31:55

by Andrew Morton

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Andi Kleen wrote:
>
> On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> > Andi Kleen wrote:
> > > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > > is nothing that would a driver prevent from being unloaded on a different
> > > CPU while it is already executing in ->open but has not yet executed the add
> > > yet or after it has executed the _DEC but it is still running in module code
> > > Normally the windows are pretty small, but very long running interrupt
> > > on one CPU hitting exactly in the wrong moment can change that.
> >
> > Module unload calls unregister_netdev, which grabs rtnl_lock.
> > dev->open runs under rtnl_lock.
> >
> > Given this, how can the driver be unloaded if dev->open is running?
>
> It does not help, because when the semaphore synchronizes it is already
> too late -- free_module already did the zero module count check and
> nothing is going to stop it from unloading.

aaarrrggh!!!

CPU0 CPU1

rtnl_lock()
dev_ifsioc()
dev_change_flags()
dev_open();
dev->open();
vortex_open()
sys_delete_module()
if (!__MOD_IN_USE)
free_module()
mod->cleanup()
vortex_cleanup()
pci_unregister_driver()
[ time passes ] drv->remove();
vortex_remove_one()
unregister_netdev()
unregister_netdevice()
rtnl_lock() /* blocks */
...
MOD_INC_USE_COUNT;
...
rtnl_unlock()
...
module_unmap(); /* Not good */


We can't even fix this with a lock_kernel wrapped around
the dev->owner stuff in dev_open(), because the netdevice's
open() can sleep.

<subliminalmessage>prumpf's patch</sumliminalmessage>

Perhaps the best thing to do here is to create a system-wide
semaphore for module unloading. So we do a down()/up()
in sys_delete_module() and do this in dev_open:

/*
* Call device private open method
*/

down(&mod_unload_sem); /* sync with sys_delete_module() */
if (dev->owner == 0) {
if (dev->open)
ret = dev->open(dev);
} else {
if (try_inc_mod_count(dev->owner)) {
if (dev->open) {
if ((ret = dev->open(dev)) != 0)
__MOD_DEC_USE_COUNT(dev->owner);
}
} else
ret = -ENODEV;
}
up(&mod_unload_sem);

2000-11-05 01:52:53

by Andrew Morton

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Andrew Morton wrote:
>
> Perhaps the best thing to do here is to create a system-wide
> semaphore for module unloading. So we do a down()/up()
> in sys_delete_module() and do this in dev_open:

Yep. I think this is right. Jeff, this supersedes the
patch you sent to devem yesterday.


--- linux-2.4.0-test10/include/linux/netdevice.h Sat Nov 4 16:22:49 2000
+++ linux-akpm/include/linux/netdevice.h Sun Nov 5 12:40:00 2000
@@ -146,6 +146,11 @@
struct neigh_parms;
struct sk_buff;

+/* Centralised module refcounting for netdevices */
+struct module;
+#define SET_NETDEVICE_OWNER(dev) \
+ do { dev->owner = THIS_MODULE; } while (0)
+
struct netif_rx_stats
{
unsigned total;
@@ -382,6 +387,9 @@
unsigned char *haddr);
int (*neigh_setup)(struct net_device *dev, struct neigh_parms *);
int (*accept_fastpath)(struct net_device *, struct dst_entry*);
+
+ /* open/release and usage marking */
+ struct module *owner;

/* bridge stuff */
struct net_bridge_port *br_port;
--- linux-2.4.0-test10/include/linux/module.h Sat Nov 4 16:22:49 2000
+++ linux-akpm/include/linux/module.h Sun Nov 5 12:37:54 2000
@@ -146,10 +146,16 @@
#ifdef CONFIG_MODULES
extern unsigned long get_module_symbol(char *, char *);
extern void put_module_symbol(unsigned long);
+struct semaphore;
+extern struct semaphore mod_unload_sem;
+#define DOWN_MOD_UNLOAD_SEM() down(&mod_unload_sem)
+#define UP_MOD_UNLOAD_SEM() up(&mod_unload_sem)
#else
static inline unsigned long get_module_symbol(char *unused1, char *unused2) { return 0; };
static inline void put_module_symbol(unsigned long unused) { };
-#endif
+#define DOWN_MOD_UNLOAD_SEM() do { } while (0)
+#define UP_MOD_UNLOAD_SEM() do { } while (0)
+#endif

extern int try_inc_mod_count(struct module *mod);

--- linux-2.4.0-test10/net/core/dev.c Sat Nov 4 16:22:50 2000
+++ linux-akpm/net/core/dev.c Sun Nov 5 12:39:23 2000
@@ -93,6 +93,7 @@
#include <net/profile.h>
#include <linux/init.h>
#include <linux/kmod.h>
+#include <linux/module.h>
#if defined(CONFIG_NET_RADIO) || defined(CONFIG_NET_PCMCIA_RADIO)
#include <linux/wireless.h> /* Note : will define WIRELESS_EXT */
#endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
@@ -666,9 +667,20 @@
/*
* Call device private open method
*/
-
- if (dev->open)
- ret = dev->open(dev);
+ DOWN_MOD_UNLOAD_SEM(); /* Synchronise with sys_delete_module */
+ if (dev->owner == 0) {
+ if (dev->open)
+ ret = dev->open(dev);
+ } else {
+ if (try_inc_mod_count(dev->owner)) {
+ if (dev->open) {
+ if ((ret = dev->open(dev)) != 0)
+ __MOD_DEC_USE_COUNT(dev->owner);
+ }
+ } else
+ ret = -ENODEV;
+ }
+ UP_MOD_UNLOAD_SEM();

/*
* If it went open OK then:
@@ -783,6 +795,13 @@
* Tell people we are down
*/
notifier_call_chain(&netdev_chain, NETDEV_DOWN, dev);
+
+ /*
+ * Drop the module refcount
+ */
+ if (dev->owner) {
+ __MOD_DEC_USE_COUNT(dev->owner);
+ }

return(0);
}
--- linux-2.4.0-test10/kernel/module.c Sat Nov 4 16:22:49 2000
+++ linux-akpm/kernel/module.c Sun Nov 5 12:36:03 2000
@@ -44,6 +44,7 @@
static struct module *find_module(const char *name);
static void free_module(struct module *, int tag_freed);

+DECLARE_MUTEX(mod_unload_sem);

/*
* Called at boot time
@@ -369,6 +370,7 @@
return -EPERM;

lock_kernel();
+ down(&mod_unload_sem);
if (name_user) {
if ((error = get_mod_name(name_user, &name)) < 0)
goto out;
@@ -431,6 +433,7 @@
mod->flags &= ~MOD_JUST_FREED;
error = 0;
out:
+ up(&mod_unload_sem);
unlock_kernel();
return error;
}

2000-11-05 02:32:38

by Andi Kleen

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Sun, Nov 05, 2000 at 12:28:55PM +1100, Andrew Morton wrote:
> Andi Kleen wrote:
> >
> > On Sat, Nov 04, 2000 at 12:07:34PM -0500, Jeff Garzik wrote:
> > > Andi Kleen wrote:
> > > > All the MOD_INC/DEC_USE_COUNT are done inside the modules themselves. There
> > > > is nothing that would a driver prevent from being unloaded on a different
> > > > CPU while it is already executing in ->open but has not yet executed the add
> > > > yet or after it has executed the _DEC but it is still running in module code
> > > > Normally the windows are pretty small, but very long running interrupt
> > > > on one CPU hitting exactly in the wrong moment can change that.
> > >
> > > Module unload calls unregister_netdev, which grabs rtnl_lock.
> > > dev->open runs under rtnl_lock.
> > >
> > > Given this, how can the driver be unloaded if dev->open is running?
> >
> > It does not help, because when the semaphore synchronizes it is already
> > too late -- free_module already did the zero module count check and
> > nothing is going to stop it from unloading.
>
> aaarrrggh!!!

A simple fix at least for this case would be a recheck of module count
after free_module() executed the cleanup function.

-Andi

2000-11-05 03:40:10

by Keith Owens

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Sun, 05 Nov 2000 12:28:55 +1100,
Andrew Morton <[email protected]> wrote:
> CPU0 CPU1
>
> rtnl_lock()
> dev_ifsioc()
> dev_change_flags()
> dev_open();
> dev->open();
> vortex_open()
> sys_delete_module()
> if (!__MOD_IN_USE)
> free_module()

If dev->open() calls try_inc_use_count() before it enters the module
you will lock out concurrent module unload via module unload_lock.
There is no need for another module semaphore.

Also there is no need to test dev->owner, try_inc_use_count() already
does that.

/*
* Call device private open method
*/

ret = -ENODEV;
if (try_inc_mod_count(dev->owner)) {
if (dev->open && (ret = dev->open(dev)) && dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);
}

In dev_close()

/*
* Call device private close method
*/

if (dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);

2000-11-05 03:48:03

by Keith Owens

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Pressed enter too soon.

/*
* Call device private open method
*/

ret = -ENODEV;
if (dev->open && try_inc_mod_count(dev->owner)) {
if ((ret = dev->open(dev)) && dev->owner)
__MOD_DEC_USE_COUNT(dev->owner);
}


2000-11-05 11:46:37

by Andrew Morton

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Keith Owens wrote:
>
> Pressed enter too soon.
>
> /*
> * Call device private open method
> */
>
> ret = -ENODEV;
> if (dev->open && try_inc_mod_count(dev->owner)) {
> if ((ret = dev->open(dev)) && dev->owner)
> __MOD_DEC_USE_COUNT(dev->owner);
> }

y'know, if we keep working this patch for about a year we
might end up getting it right. Thousand monkeys and all that.

The above assumes that (dev->open == 0) is an error, but
netdevices are in fact allowed to have a NULL open() method.

So hereunder is about the seventieth version of the netdevice
modules safety patch. Go wild.

Some notes:

- I've created the generic macro SET_OWNER_MODULE and put it
into modules.h. It may perhaps be useful for things other
than netdevices? The intent here is that 2.4-only drivers
can use:

SET_OWNER_MODULE(dev);

in their init routines.

Drivers which retain 2.2 compatibility should use:

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
#define SET_OWNER_MODULE(d)
#else
#undef MOD_INC_USE_COUNT
#undef MOD_DEC_USE_COUNT
#endif

xxx_probe()
{
...
SET_MODULE_OWNER(dev); /* 2.4 */
...
MOD_INC_USE_COUNT; /* 2.2 */
}

And everything should work.

- With this patch applied, the module refcounts for netdevices
will show doubled values in `lsmod', unless those drivers
have been changed to remove the now unneeded MOD_INC/DEC_USE_COUNT
macros (perhaps with the above 2.2-compatibility thing).



--- linux-2.4.0-test10/include/linux/netdevice.h Sat Nov 4 16:22:49 2000
+++ linux-akpm/include/linux/netdevice.h Sun Nov 5 22:15:48 2000
@@ -383,6 +383,9 @@
int (*neigh_setup)(struct net_device *dev, struct neigh_parms *);
int (*accept_fastpath)(struct net_device *, struct dst_entry*);

+ /* open/release and usage marking */
+ struct module *owner;
+
/* bridge stuff */
struct net_bridge_port *br_port;

--- linux-2.4.0-test10/include/linux/module.h Sat Nov 4 16:22:49 2000
+++ linux-akpm/include/linux/module.h Sun Nov 5 22:18:16 2000
@@ -313,4 +313,10 @@
#define EXPORT_NO_SYMBOLS
#endif /* MODULE */

+#ifdef CONFIG_MODULES
+#define SET_OWNER_MODULE(some_struct) do { some_struct->owner = THIS_MODULE; } while (0)
+#else
+#define SET_OWNER_MODULE(some_struct) do { } while (0)
+#endif
+
#endif /* _LINUX_MODULE_H */
--- linux-2.4.0-test10/net/core/dev.c Sat Nov 4 16:22:50 2000
+++ linux-akpm/net/core/dev.c Sun Nov 5 22:31:57 2000
@@ -93,6 +93,7 @@
#include <net/profile.h>
#include <linux/init.h>
#include <linux/kmod.h>
+#include <linux/module.h>
#if defined(CONFIG_NET_RADIO) || defined(CONFIG_NET_PCMCIA_RADIO)
#include <linux/wireless.h> /* Note : will define WIRELESS_EXT */
#endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
@@ -666,9 +667,15 @@
/*
* Call device private open method
*/
-
- if (dev->open)
- ret = dev->open(dev);
+ if (try_inc_mod_count(dev->owner)) {
+ if (dev->open) {
+ ret = dev->open(dev);
+ if (ret != 0 && dev->owner)
+ __MOD_DEC_USE_COUNT(dev->owner);
+ }
+ } else {
+ ret = -ENODEV;
+ }

/*
* If it went open OK then:
@@ -783,6 +790,12 @@
* Tell people we are down
*/
notifier_call_chain(&netdev_chain, NETDEV_DOWN, dev);
+
+ /*
+ * Drop the module refcount
+ */
+ if (dev->owner)
+ __MOD_DEC_USE_COUNT(dev->owner);

return(0);
}

2000-11-06 08:06:36

by Paul Gortmaker

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Andrew Morton wrote:

> y'know, if we keep working this patch for about a year we
> might end up getting it right. Thousand monkeys and all that.

Yeah, probably still a year until the release of 2.4.0. 8)
Now where did I put those darn bananas...

> - With this patch applied, the module refcounts for netdevices
> will show doubled values in `lsmod', unless those drivers
> have been changed to remove the now unneeded MOD_INC/DEC_USE_COUNT
> macros (perhaps with the above 2.2-compatibility thing).

Assuming that nobody has all the MOD_..._USE_COUNT things culled
from a tree somewhere already, I quickly hacked up the following
script for drivers/net:

----------------
#!/bin/bash
for i in `grep -l 'MOD_..._USE_COUNT;' *.c */*.c`
do
mv $i $i~
cat $i~ | \
sed '/^$/{N;s/.*MOD.*COUNT;//;tz;b;:z;N;s/^\n$//;};/.*MOD.*COUNT;/d' > $i
done
----------------

It looks ugly but it zaps out the extra blank line when MOD_.*COUNT
had a blank line above and below. I had a quick look at the resulting
diff (4200 lines!) and it looks like the post-script hand editing will
be minimal (e.g. a few arcnet drivers have MOD_*_COUNT as the only code
in an if block).

We might want to filter the file list created by grep against VERSION_CODE
as people with that in their driver(s) probably don't want the wholesale
deletion of MOD_*_COUNT. (OTOH, drivers that have VERSION_CODE that
supports 2.0.38 or oddball 2.3.x versions could probably be pruned...)

That still leaves the addition of dev->owner = THIS_MODULE into
each device probe. One *hackish* way to do this without having to
deal with each driver could be something like this in netdevice.h

- extern void ether_setup(struct net_device *dev);
+ extern void __ether_setup(struct net_device *dev);
+ static inline void ether_setup(struct net_device *dev){
+ dev->owner = THIS_MODULE;
+ __ether_setup(dev);
+ }

Ugh. Probably should just add it to each probe and be done with it...

Paul. (aka. monkey #937)



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

2000-11-06 09:56:36

by Andrew Morton

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Paul Gortmaker wrote:
>
> Assuming that nobody has all the MOD_..._USE_COUNT things culled
> from a tree somewhere already, I quickly hacked up the following
> script for drivers/net:

Looks good. There's also drivers/isdn and possibly other places.

> ...
> We might want to filter the file list created by grep against VERSION_CODE
> as people with that in their driver(s) probably don't want the wholesale
> deletion of MOD_*_COUNT. (OTOH, drivers that have VERSION_CODE that
> supports 2.0.38 or oddball 2.3.x versions could probably be pruned...)

I think you're right. eepro100 and acenic seriously care about 2.2-compatibility
but AFAICT the others just pretend to.

> That still leaves the addition of dev->owner = THIS_MODULE into
> each device probe. One *hackish* way to do this without having to
> deal with each driver could be something like this in netdevice.h
>
> - extern void ether_setup(struct net_device *dev);
> + extern void __ether_setup(struct net_device *dev);
> + static inline void ether_setup(struct net_device *dev){
> + dev->owner = THIS_MODULE;
> + __ether_setup(dev);
> + }
>
> Ugh. Probably should just add it to each probe and be done with it...

mm.. Seeing as failure to set dev->owner is a fatal mistake,
it would be good to enforce this via the compiler type system.

How about making THIS_MODULE an argument to register_netdevice()
and, hence, register_netdev() and init_etherdev()?

> Paul. (aka. monkey #937)

:)

2000-11-06 10:06:39

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

I would just rather clean up the drivers manually. There's no
substitute for the human brain, and there are usually some associated
cleanups you can take care of, while working on the primary task.

I'm much more concerned about the interface. We need to get that nailed
down and reviewed. Once DaveM and you and Keith are all happy with the
net_device module stuff, apply that patch. The drivers can be trivially
cleaned up. With the latest patch I've seen, there is no -need- to
immediately update the drivers. Once the patch is applied, I can clean
the drivers while I'm cleaning up request_region and the other stuff.

--
Jeff Garzik | Dinner is ready when
Building 1024 | the smoke alarm goes off.
MandrakeSoft | -/usr/games/fortune

2000-11-06 12:37:52

by Keith Owens

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Mon, 06 Nov 2000 05:05:42 -0500,
Jeff Garzik <[email protected]> wrote:
>With the latest patch I've seen, there is no -need- to
>immediately update the drivers. Once the patch is applied, I can clean
>the drivers while I'm cleaning up request_region and the other stuff.

I prefer a requirement that all net drivers upgrade to the new
interface, otherwise we have odd drivers using the old interface
forever and being at risk of module unload. That is why I coded my
patch as returning -ENODEV if there was no dev->open. However I have
to accept that just before a 2.4 release is not the best time to have a
flag day. Put it down for 2.5.

2000-11-06 12:50:15

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Keith Owens wrote:
>
> On Mon, 06 Nov 2000 05:05:42 -0500,
> Jeff Garzik <[email protected]> wrote:
> >With the latest patch I've seen, there is no -need- to
> >immediately update the drivers. Once the patch is applied, I can clean
> >the drivers while I'm cleaning up request_region and the other stuff.
>
> I prefer a requirement that all net drivers upgrade to the new
> interface, otherwise we have odd drivers using the old interface
> forever and being at risk of module unload. That is why I coded my
> patch as returning -ENODEV if there was no dev->open. However I have
> to accept that just before a 2.4 release is not the best time to have a
> flag day. Put it down for 2.5.

What is "it" that gets put off until 2.5? Breaking net drivers with an
interface upgrade, or eliminating this race?

I would prefer that 2.4.0 went out the door with a race-free netdev
interface.

Andrew's patch is nice and small, and doesn't -require- a driver
upgrade. We can upgrade the important drivers now, and then do all the
stinkbomb crapola drivers during the 2.4.x series or whenever.

There is absolutely no need to break drivers for this. Not only is it
needless pain, but doing so is inconsistent -- with struct
file_operations, I am free to have owner==NULL.

Jeff


--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1

2000-11-06 12:58:36

by Keith Owens

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Mon, 06 Nov 2000 07:49:00 -0500,
Jeff Garzik <[email protected]> wrote:
>Keith Owens wrote:
>> I prefer a requirement that all net drivers upgrade to the new
>> interface, otherwise we have odd drivers using the old interface
>> forever and being at risk of module unload. That is why I coded my
>> patch as returning -ENODEV if there was no dev->open. However I have
>> to accept that just before a 2.4 release is not the best time to have a
>> flag day. Put it down for 2.5.
>
>What is "it" that gets put off until 2.5? Breaking net drivers with an
>interface upgrade, or eliminating this race?

Forcing all network drivers to define a dev->open routine.

>There is absolutely no need to break drivers for this. Not only is it
>needless pain, but doing so is inconsistent -- with struct
>file_operations, I am free to have owner==NULL.

True, but if you set owner==NULL for something that is really in a
module then you are lying to the module layer. See foot, shoot foot.

2000-11-06 13:10:37

by Jeff Garzik

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

Keith Owens wrote:
>
> On Mon, 06 Nov 2000 07:49:00 -0500,
> Jeff Garzik <[email protected]> wrote:
> >Keith Owens wrote:
> >> I prefer a requirement that all net drivers upgrade to the new
> >> interface, otherwise we have odd drivers using the old interface
> >> forever and being at risk of module unload. That is why I coded my
> >> patch as returning -ENODEV if there was no dev->open. However I have
> >> to accept that just before a 2.4 release is not the best time to have a
> >> flag day. Put it down for 2.5.
> >
> >What is "it" that gets put off until 2.5? Breaking net drivers with an
> >interface upgrade, or eliminating this race?
>
> Forcing all network drivers to define a dev->open routine.
>
> >There is absolutely no need to break drivers for this. Not only is it
> >needless pain, but doing so is inconsistent -- with struct
> >file_operations, I am free to have owner==NULL.
>
> True, but if you set owner==NULL for something that is really in a
> module then you are lying to the module layer. See foot, shoot foot.

You are missing the point here. If a driver is "old style", where
owner==NULL and it manually calls MOD_{INC,DEC}_USE_COUNT, things are
pretty much ok. There is a tiny race, but the system is mostly intact.

For never drivers "that matter," we update them to set
net_device::owner. But to me breaking all the net drivers to force such
a change is silly. For such a tiny race, there just isn't a pressing
need to update the stinkbomb crapola drivers...

Jeff


--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1

2000-11-06 13:19:17

by Keith Owens

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

On Mon, 06 Nov 2000 08:09:28 -0500,
Jeff Garzik <[email protected]> wrote:
>You are missing the point here. If a driver is "old style", where
>owner==NULL and it manually calls MOD_{INC,DEC}_USE_COUNT, things are
>pretty much ok. There is a tiny race, but the system is mostly intact.

Small race * size of install base * auto unload frequency => too many
bug reports for my liking. But we agree that requiring dev->open is
2.5 material.

2000-11-08 01:08:45

by Rusty Russell

[permalink] [raw]
Subject: Re: Locking Between User Context and Soft IRQs in 2.4.0

In message <[email protected]> you write:
> Paul Gortmaker wrote:
> > - extern void ether_setup(struct net_device *dev);
> > + extern void __ether_setup(struct net_device *dev);
> > + static inline void ether_setup(struct net_device *dev){
> > + dev->owner = THIS_MODULE;
> > + __ether_setup(dev);
> > + }
> >
> > Ugh. Probably should just add it to each probe and be done with it...
>
> mm.. Seeing as failure to set dev->owner is a fatal mistake,
> it would be good to enforce this via the compiler type system.
>
> How about making THIS_MODULE an argument to register_netdevice()
> and, hence, register_netdev() and init_etherdev()?

Bear in mind that in 2.5, the THIS_MODULE registration cancer
infesting the kernel[1] will vanish with two-stage module delete[2].

http://www.wcug.wwu.edu/lists/netdev/200006/msg00250.html

Rusty.

[1] And getting worse.
[2] Which was the correct solution for 2.4, only I was all out of
`get out of code freeze free' cards.
--
Hacking time.