2003-07-29 13:03:24

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: [PATCH] airo driver: fix races, oops, etc..

Hi !

Here's a patch against Linus current airo.c, it adds back some fixes I
did during OLS on the previous version of this driver. I couldn't test
this new 'fixed' version though as I don't have the airo card anymore:

- Initialize the work_struct structures used by the driver
- Change most of schedule_work() to schedule_delayed_work(). The
problem with schedule_work() is that the worker_thread will never
schedule() if the work keeps getting added back to the list by the
callback, which typically happened with this driver when the xmit
work gets scheduled while the semaphore was used by a pending
command. Note that -ac tree has a modified version of this driver
that gets rid of this "over-smart" work queue stuff and uses normal
spinlock instead, probably at the expense of some latency...
- Fix a small signed vs. unsigned char issue
- Remove bogus pci_module_init(), use pci_register_driver() instead and
add missing pci_unregister_driver() so the module can now be removed
without leaving stale references (and thus avoid an oops next time
the driver list is walked by the device core).

Jeff, if you are ok with these, please send to Linus,

Ben


diff -urN linux-2.5/drivers/net/wireless/airo.c linuxppc-2.5-benh/drivers/net/wireless/airo.c
--- linux-2.5/drivers/net/wireless/airo.c 2003-07-29 08:51:06.000000000 -0400
+++ linuxppc-2.5-benh/drivers/net/wireless/airo.c 2003-07-29 08:54:26.000000000 -0400
@@ -633,7 +633,7 @@
u16 SSIDlen;
char SSID[32];
char apName[16];
- char bssid[4][ETH_ALEN];
+ unsigned char bssid[4][ETH_ALEN];
u16 beaconPeriod;
u16 dimPeriod;
u16 atimDuration;
@@ -1031,7 +1031,7 @@
struct work_struct promisc_task;
struct {
struct sk_buff *skb;
- int fid;
+ int fid, hardirq;
struct work_struct task;
} xmit, xmit11;
struct net_device *wifidev;
@@ -1348,7 +1348,12 @@
netif_stop_queue(dev);
priv->xmit.task.func = (void (*)(void *))airo_do_xmit;
priv->xmit.task.data = (void *)dev;
- schedule_work(&priv->xmit.task);
+ if (priv->xmit.hardirq) {
+ priv->xmit.hardirq = 0;
+ schedule_work(&priv->xmit.task);
+ return;
+ }
+ schedule_delayed_work(&priv->xmit.task, 1);
return;
}
status = transmit_802_3_packet (priv, fids[fid], skb->data);
@@ -1393,6 +1398,7 @@
fids[i] |= (len << 16);
priv->xmit.skb = skb;
priv->xmit.fid = i;
+ priv->xmit.hardirq = 1;
airo_do_xmit(dev);
}
return 0;
@@ -1410,7 +1416,12 @@
netif_stop_queue(dev);
priv->xmit11.task.func = (void (*)(void *))airo_do_xmit11;
priv->xmit11.task.data = (void *)dev;
- schedule_work(&priv->xmit11.task);
+ if (priv->xmit11.hardirq) {
+ priv->xmit11.hardirq = 0;
+ schedule_work(&priv->xmit11.task);
+ return;
+ }
+ schedule_delayed_work(&priv->xmit11.task, 1);
return;
}
status = transmit_802_11_packet (priv, fids[fid], skb->data);
@@ -1485,7 +1496,7 @@
} else {
ai->stats_task.func = (void (*)(void *))airo_read_stats;
ai->stats_task.data = (void *)ai;
- schedule_work(&ai->stats_task);
+ schedule_delayed_work(&ai->stats_task, 1);
}
}

@@ -1508,7 +1519,7 @@
} else {
ai->promisc_task.func = (void (*)(void *))airo_end_promisc;
ai->promisc_task.data = (void *)ai;
- schedule_work(&ai->promisc_task);
+ schedule_delayed_work(&ai->promisc_task, 1);
}
}

@@ -1524,7 +1535,7 @@
} else {
ai->promisc_task.func = (void (*)(void *))airo_set_promisc;
ai->promisc_task.data = (void *)ai;
- schedule_work(&ai->promisc_task);
+ schedule_delayed_work(&ai->promisc_task, 1);
}
}

@@ -1710,6 +1721,14 @@
sema_init(&ai->sem, 1);
ai->need_commit = 0;
ai->config.len = 0;
+ INIT_WORK(&ai->stats_task, NULL, NULL);
+ INIT_WORK(&ai->promisc_task, NULL, NULL);
+ INIT_WORK(&ai->xmit.task, NULL, NULL);
+ INIT_WORK(&ai->xmit11.task, NULL, NULL);
+ INIT_WORK(&ai->mic_task, NULL, NULL);
+#ifdef WIRELESS_EXT
+ INIT_WORK(&ai->event_task, NULL, NULL);
+#endif
rc = add_airo_dev( dev );
if (rc)
goto err_out_free;
@@ -1859,7 +1878,7 @@
} else {
ai->event_task.func = (void (*)(void *))airo_send_event;
ai->event_task.data = (void *)dev;
- schedule_work(&ai->event_task);
+ schedule_delayed_work(&ai->event_task, 1);
}
}
#endif
@@ -1876,7 +1895,7 @@
} else {
ai->mic_task.func = (void (*)(void *))airo_read_mic;
ai->mic_task.data = (void *)ai;
- schedule_work(&ai->mic_task);
+ schedule_delayed_work(&ai->mic_task, 1);
}
}

@@ -4090,7 +4109,7 @@

#ifdef CONFIG_PCI
printk( KERN_INFO "airo: Probing for PCI adapters\n" );
- rc = pci_module_init(&airo_driver);
+ rc = pci_register_driver(&airo_driver);
printk( KERN_INFO "airo: Finished probing for PCI adapters\n" );
#endif

@@ -4102,6 +4121,7 @@

static void __exit airo_cleanup_module( void )
{
+ pci_unregister_driver(&airo_driver);
while( airo_devices ) {
printk( KERN_INFO "airo: Unregistering %s\n", airo_devices->dev->name );
stop_airo_card( airo_devices->dev, 1 );
@@ -5160,7 +5180,7 @@
& status_rid.bssid[i][2]
& status_rid.bssid[i][3]
& status_rid.bssid[i][4]
- & status_rid.bssid[i][5])!=-1 &&
+ & status_rid.bssid[i][5])!=0xff &&
(status_rid.bssid[i][0]
| status_rid.bssid[i][1]
| status_rid.bssid[i][2]


2003-07-29 16:27:27

by Jean Tourrilhes

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..

Benjamin Herrenschmidt wrote :
>
> Here's a patch against Linus current airo.c, it adds back some fixes I
> did during OLS on the previous version of this driver. I couldn't test
> this new 'fixed' version though as I don't have the airo card anymore:
>
> - Initialize the work_struct structures used by the driver
> - Change most of schedule_work() to schedule_delayed_work(). The
> problem with schedule_work() is that the worker_thread will never
> schedule() if the work keeps getting added back to the list by the
> callback, which typically happened with this driver when the xmit
> work gets scheduled while the semaphore was used by a pending
> command. Note that -ac tree has a modified version of this driver
> that gets rid of this "over-smart" work queue stuff and uses normal
> spinlock instead, probably at the expense of some latency...
> - Fix a small signed vs. unsigned char issue
> - Remove bogus pci_module_init(), use pci_register_driver() instead and
> add missing pci_unregister_driver() so the module can now be removed
> without leaving stale references (and thus avoid an oops next time
> the driver list is walked by the device core).
>
> Jeff, if you are ok with these, please send to Linus,

Ben,

Would you mind sending your patch to Javier (who is the
current maintainer). Javier did some work lately to fix some of those
problems, and I think your patch collides with it.
Thanks...

Jean

2003-08-05 09:08:48

by Javier Achirica

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..


I've integrated this patch in my code. I've done a major change: Instead
of using schedule_delayed_work(), I create a new workqueue and use
queue_work() on that queue. As all tasks sleep in the same lock, I can
queue them there and make them sleep instead of requeueing them.

I haven't sent them to Jeff yet, as I want to do more testing. If you want
to help testing them, please tell me.

Javier Achirica

On 29 Jul 2003, Benjamin Herrenschmidt wrote:

> Hi !
>
> Here's a patch against Linus current airo.c, it adds back some fixes I
> did during OLS on the previous version of this driver. I couldn't test
> this new 'fixed' version though as I don't have the airo card anymore:
>
> - Initialize the work_struct structures used by the driver
> - Change most of schedule_work() to schedule_delayed_work(). The
> problem with schedule_work() is that the worker_thread will never
> schedule() if the work keeps getting added back to the list by the
> callback, which typically happened with this driver when the xmit
> work gets scheduled while the semaphore was used by a pending
> command. Note that -ac tree has a modified version of this driver
> that gets rid of this "over-smart" work queue stuff and uses normal
> spinlock instead, probably at the expense of some latency...
> - Fix a small signed vs. unsigned char issue
> - Remove bogus pci_module_init(), use pci_register_driver() instead and
> add missing pci_unregister_driver() so the module can now be removed
> without leaving stale references (and thus avoid an oops next time
> the driver list is walked by the device core).
>
> Jeff, if you are ok with these, please send to Linus,
>
> Ben
>
>
> diff -urN linux-2.5/drivers/net/wireless/airo.c linuxppc-2.5-benh/drivers/net/wireless/airo.c
> --- linux-2.5/drivers/net/wireless/airo.c 2003-07-29 08:51:06.000000000 -0400
> +++ linuxppc-2.5-benh/drivers/net/wireless/airo.c 2003-07-29 08:54:26.000000000 -0400
> @@ -633,7 +633,7 @@
> u16 SSIDlen;
> char SSID[32];
> char apName[16];
> - char bssid[4][ETH_ALEN];
> + unsigned char bssid[4][ETH_ALEN];
> u16 beaconPeriod;
> u16 dimPeriod;
> u16 atimDuration;
> @@ -1031,7 +1031,7 @@
> struct work_struct promisc_task;
> struct {
> struct sk_buff *skb;
> - int fid;
> + int fid, hardirq;
> struct work_struct task;
> } xmit, xmit11;
> struct net_device *wifidev;
> @@ -1348,7 +1348,12 @@
> netif_stop_queue(dev);
> priv->xmit.task.func = (void (*)(void *))airo_do_xmit;
> priv->xmit.task.data = (void *)dev;
> - schedule_work(&priv->xmit.task);
> + if (priv->xmit.hardirq) {
> + priv->xmit.hardirq = 0;
> + schedule_work(&priv->xmit.task);
> + return;
> + }
> + schedule_delayed_work(&priv->xmit.task, 1);
> return;
> }
> status = transmit_802_3_packet (priv, fids[fid], skb->data);
> @@ -1393,6 +1398,7 @@
> fids[i] |= (len << 16);
> priv->xmit.skb = skb;
> priv->xmit.fid = i;
> + priv->xmit.hardirq = 1;
> airo_do_xmit(dev);
> }
> return 0;
> @@ -1410,7 +1416,12 @@
> netif_stop_queue(dev);
> priv->xmit11.task.func = (void (*)(void *))airo_do_xmit11;
> priv->xmit11.task.data = (void *)dev;
> - schedule_work(&priv->xmit11.task);
> + if (priv->xmit11.hardirq) {
> + priv->xmit11.hardirq = 0;
> + schedule_work(&priv->xmit11.task);
> + return;
> + }
> + schedule_delayed_work(&priv->xmit11.task, 1);
> return;
> }
> status = transmit_802_11_packet (priv, fids[fid], skb->data);
> @@ -1485,7 +1496,7 @@
> } else {
> ai->stats_task.func = (void (*)(void *))airo_read_stats;
> ai->stats_task.data = (void *)ai;
> - schedule_work(&ai->stats_task);
> + schedule_delayed_work(&ai->stats_task, 1);
> }
> }
>
> @@ -1508,7 +1519,7 @@
> } else {
> ai->promisc_task.func = (void (*)(void *))airo_end_promisc;
> ai->promisc_task.data = (void *)ai;
> - schedule_work(&ai->promisc_task);
> + schedule_delayed_work(&ai->promisc_task, 1);
> }
> }
>
> @@ -1524,7 +1535,7 @@
> } else {
> ai->promisc_task.func = (void (*)(void *))airo_set_promisc;
> ai->promisc_task.data = (void *)ai;
> - schedule_work(&ai->promisc_task);
> + schedule_delayed_work(&ai->promisc_task, 1);
> }
> }
>
> @@ -1710,6 +1721,14 @@
> sema_init(&ai->sem, 1);
> ai->need_commit = 0;
> ai->config.len = 0;
> + INIT_WORK(&ai->stats_task, NULL, NULL);
> + INIT_WORK(&ai->promisc_task, NULL, NULL);
> + INIT_WORK(&ai->xmit.task, NULL, NULL);
> + INIT_WORK(&ai->xmit11.task, NULL, NULL);
> + INIT_WORK(&ai->mic_task, NULL, NULL);
> +#ifdef WIRELESS_EXT
> + INIT_WORK(&ai->event_task, NULL, NULL);
> +#endif
> rc = add_airo_dev( dev );
> if (rc)
> goto err_out_free;
> @@ -1859,7 +1878,7 @@
> } else {
> ai->event_task.func = (void (*)(void *))airo_send_event;
> ai->event_task.data = (void *)dev;
> - schedule_work(&ai->event_task);
> + schedule_delayed_work(&ai->event_task, 1);
> }
> }
> #endif
> @@ -1876,7 +1895,7 @@
> } else {
> ai->mic_task.func = (void (*)(void *))airo_read_mic;
> ai->mic_task.data = (void *)ai;
> - schedule_work(&ai->mic_task);
> + schedule_delayed_work(&ai->mic_task, 1);
> }
> }
>
> @@ -4090,7 +4109,7 @@
>
> #ifdef CONFIG_PCI
> printk( KERN_INFO "airo: Probing for PCI adapters\n" );
> - rc = pci_module_init(&airo_driver);
> + rc = pci_register_driver(&airo_driver);
> printk( KERN_INFO "airo: Finished probing for PCI adapters\n" );
> #endif
>
> @@ -4102,6 +4121,7 @@
>
> static void __exit airo_cleanup_module( void )
> {
> + pci_unregister_driver(&airo_driver);
> while( airo_devices ) {
> printk( KERN_INFO "airo: Unregistering %s\n", airo_devices->dev->name );
> stop_airo_card( airo_devices->dev, 1 );
> @@ -5160,7 +5180,7 @@
> & status_rid.bssid[i][2]
> & status_rid.bssid[i][3]
> & status_rid.bssid[i][4]
> - & status_rid.bssid[i][5])!=-1 &&
> + & status_rid.bssid[i][5])!=0xff &&
> (status_rid.bssid[i][0]
> | status_rid.bssid[i][1]
> | status_rid.bssid[i][2]
>
>
>

2003-08-05 09:48:50

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..

On Tue, 2003-08-05 at 10:53, Javier Achirica wrote:
> I've integrated this patch in my code. I've done a major change: Instead
> of using schedule_delayed_work(), I create a new workqueue and use
> queue_work() on that queue. As all tasks sleep in the same lock, I can
> queue them there and make them sleep instead of requeueing them.
>
> I haven't sent them to Jeff yet, as I want to do more testing. If you want
> to help testing them, please tell me.

Well... creating a work queue means you create one thread per CPU, that
sucks a bit don't think ? Maybe using a single thread for the driver
with your own queuing primitives...

Ben.

2003-08-05 10:19:21

by Javier Achirica

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..



On 5 Aug 2003, Benjamin Herrenschmidt wrote:

> On Tue, 2003-08-05 at 10:53, Javier Achirica wrote:
> > I've integrated this patch in my code. I've done a major change: Instead
> > of using schedule_delayed_work(), I create a new workqueue and use
> > queue_work() on that queue. As all tasks sleep in the same lock, I can
> > queue them there and make them sleep instead of requeueing them.
> >
> > I haven't sent them to Jeff yet, as I want to do more testing. If you want
> > to help testing them, please tell me.
>
> Well... creating a work queue means you create one thread per CPU, that
> sucks a bit don't think ? Maybe using a single thread for the driver
> with your own queuing primitives...

You're right. In my PC, with just one CPU, it doesn't make a difference
:-)

Anyway, I took a look at the workqueue code and looks like I only need a
"create_workqueue()" that only creates one thread. This is what
create_workqueue_thread() does, but it's static. Oh, well. I can also
destroy all threads but one or, as you say, write my own code. I have to
think about it....

Javier Achirica



2003-08-07 07:50:37

by Javier Achirica

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..



On 5 Aug 2003, Benjamin Herrenschmidt wrote:

> On Tue, 2003-08-05 at 10:53, Javier Achirica wrote:
> > I've integrated this patch in my code. I've done a major change: Instead
> > of using schedule_delayed_work(), I create a new workqueue and use
> > queue_work() on that queue. As all tasks sleep in the same lock, I can
> > queue them there and make them sleep instead of requeueing them.
> >
> > I haven't sent them to Jeff yet, as I want to do more testing. If you want
> > to help testing them, please tell me.
>
> Well... creating a work queue means you create one thread per CPU, that
> sucks a bit don't think ? Maybe using a single thread for the driver
> with your own queuing primitives...

I've been studying the problem for a while and I've implemented a solution
using a single kernel thread and a wait queue for synchronization. I've
tested it and looks like it works fine. It can be used both in 2.4
and 2.6 kernels. Before submitting a patch with it I'd like someone with
experience in this kind of code to take a look at it just in case I'm
doing something dumb. Jeff? :-)

Javier Achirica

2003-08-07 14:53:24

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..

Javier Achirica wrote:
> I've been studying the problem for a while and I've implemented a solution
> using a single kernel thread and a wait queue for synchronization. I've
> tested it and looks like it works fine. It can be used both in 2.4
> and 2.6 kernels. Before submitting a patch with it I'd like someone with
> experience in this kind of code to take a look at it just in case I'm
> doing something dumb. Jeff? :-)


Unless the patch is huge (100K or more), post it to linux-kernel, and CC
it to me and Benjamin Herrenschmidt.

Jeff



2003-08-08 09:08:46

by Javier Achirica

[permalink] [raw]
Subject: Re: [PATCH] airo driver: fix races, oops, etc..



On Thu, 7 Aug 2003, Jeff Garzik wrote:

> Javier Achirica wrote:
> > I've been studying the problem for a while and I've implemented a solution
> > using a single kernel thread and a wait queue for synchronization. I've
> > tested it and looks like it works fine. It can be used both in 2.4
> > and 2.6 kernels. Before submitting a patch with it I'd like someone with
> > experience in this kind of code to take a look at it just in case I'm
> > doing something dumb. Jeff? :-)
>
> Unless the patch is huge (100K or more), post it to linux-kernel, and CC
> it to me and Benjamin Herrenschmidt.

Here it goes. This version should be applied to the latest patches I sent
you in the 2.4 tree. Comments are welcome.

Thanks,
Javier Achirica


Attachments:
thread.diff (31.49 kB)