Experience with the rtl8187 driver has shown that mac80211 can make
calls to the config callback routine in rapid succession. This patch
creates a mutex that protects the private data in several of the routines
called by mac80211.
Signed-off-by: Larry Finger <[email protected]>
---
John,
I think this is 2.6.27 material. It is a definite bug, but I cannot point
to a single bug report implicating this lack of protection as the cause.
Your call.
Larry
---
Index: linux-2.6/drivers/net/wireless/p54/p54.h
===================================================================
--- linux-2.6.orig/drivers/net/wireless/p54/p54.h
+++ linux-2.6/drivers/net/wireless/p54/p54.h
@@ -52,6 +52,7 @@ struct p54_common {
int (*open)(struct ieee80211_hw *dev);
void (*stop)(struct ieee80211_hw *dev);
int mode;
+ struct mutex conf_mutex;
u8 mac_addr[ETH_ALEN];
u8 bssid[ETH_ALEN];
struct pda_iq_autocal_entry *iq_autocal;
Index: linux-2.6/drivers/net/wireless/p54/p54common.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/p54/p54common.c
+++ linux-2.6/drivers/net/wireless/p54/p54common.c
@@ -886,9 +886,12 @@ static void p54_remove_interface(struct
static int p54_config(struct ieee80211_hw *dev, struct ieee80211_conf *conf)
{
int ret;
+ struct p54_common *priv = dev->priv;
+ mutex_lock(&priv->conf_mutex);
ret = p54_set_freq(dev, cpu_to_le16(conf->channel->center_freq));
p54_set_vdcf(dev);
+ mutex_unlock(&priv->conf_mutex);
return ret;
}
@@ -898,10 +901,12 @@ static int p54_config_interface(struct i
{
struct p54_common *priv = dev->priv;
+ mutex_lock(&priv->conf_mutex);
p54_set_filter(dev, 0, priv->mac_addr, conf->bssid, 0, 1, 0, 0xF642);
p54_set_filter(dev, 0, priv->mac_addr, conf->bssid, 2, 0, 0, 0);
p54_set_leds(dev, 1, !is_multicast_ether_addr(conf->bssid), 0);
memcpy(priv->bssid, conf->bssid, ETH_ALEN);
+ mutex_unlock(&priv->conf_mutex);
return 0;
}
@@ -1009,6 +1014,7 @@ struct ieee80211_hw *p54_init_common(siz
}
p54_init_vdcf(dev);
+ mutex_init(&priv->conf_mutex);
return dev;
}
On Monday 04 August 2008 00:58:36 Larry Finger wrote:
> Experience with the rtl8187 driver has shown that mac80211 can make
> calls to the config callback routine in rapid succession. This patch
> creates a mutex that protects the private data in several of the routines
> called by mac80211.
>
> Signed-off-by: Larry Finger <[email protected]>
> ---
>
> John,
>
> I think this is 2.6.27 material. It is a definite bug, but I cannot point
> to a single bug report implicating this lack of protection as the cause.
> Your call.
>
> Larry
Well the reason why there isn't any bug report about it is maybe
because unlike most other devices we don't program one single
setting per time, but always a "group of similar ones" at once so
the device should always be in a sane state in theory.
So as long as mac80211 provides at least some kind of protection
against it's own concurrency, we are in save waters even without
any fancy kind of locking.
Regards,
Chr
On Fri, 2008-08-22 at 21:57 +0200, Chr wrote:
> On Thursday 07 August 2008 02:56:09 Larry Finger wrote:
> > Chr wrote:
> > > Well the reason why there isn't any bug report about it is maybe
> > > because unlike most other devices we don't program one single
> > > setting per time, but always a "group of similar ones" at once so
> > > the device should always be in a sane state in theory.
> > >
> > > So as long as mac80211 provides at least some kind of protection
> > > against it's own concurrency, we are in save waters even without
> > > any fancy kind of locking.
> > >
> > > Regards,
> > > Chr
> >
> > For the config callback, mac80211 does not protect against concurrency. We
> > learned that with rtl8187, where this routine ran somewhat longer. After
> > the power setting routine was added to the wext interface in mac80211, the
> > driver broke due to concurrent calls to the config routine and was fixed by
> > just this kind of mutex, which is why I added this protection for p54.
> >
> I just got a reply from an anonymous p54pci (2.6.27-rc4) user, with the
> following problem:
>
> INFO: task prism54pci:10881 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> prism54pci D 0000000000000001 0 10881 2
> ffff8800780dbde0 0000000000000046 0000000000000000 ffffffff802375cb
> ffffffff80778740 0000000000000286 ffff88007c632f38 ffff88007c5d8b40
> ffffffff806c2330 ffff88007c5d8d70 0000000001018740 ffff88007c5d8d70
> Call Trace:
> [<ffffffff802375cb>] __mod_timer+0xb7/0xc5
> [<ffffffff802297d7>] dequeue_task_fair+0x4d/0xce
> [<ffffffff805446ff>] __mutex_lock_slowpath+0x6a/0xa0
> [<ffffffff805445a9>] mutex_lock+0x1a/0x1e
> [<ffffffffa01d48f0>] p54_config+0x1a/0x46 [p54common]
> [<ffffffffa01a7e73>] ieee80211_sta_scan_work+0xf8/0x1b8 [mac80211]
> [<ffffffffa01a7d7b>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
> [<ffffffff8023d158>] run_workqueue+0x79/0xfe
> [<ffffffff8023d42e>] worker_thread+0x96/0xa5
> [<ffffffff8024056c>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8023d398>] worker_thread+0x0/0xa5
> [<ffffffff8024025a>] kthread+0x47/0x73
> [<ffffffff8022c919>] schedule_tail+0x27/0x5f
> [<ffffffff8020c279>] child_rip+0xa/0x11
> [<ffffffff80240213>] kthread+0x0/0x73
> [<ffffffff8020c26f>] child_rip+0x0/0x11
>
> I have no idea going on here since locking is so simple here that it
> shouldn't misbehave at all!?!
Maybe some error path is missing an unlock or something simple like
that? I haven't got a clue.
johannes
On Thursday 07 August 2008 02:56:09 Larry Finger wrote:
> Chr wrote:
> > Well the reason why there isn't any bug report about it is maybe
> > because unlike most other devices we don't program one single
> > setting per time, but always a "group of similar ones" at once so
> > the device should always be in a sane state in theory.
> >
> > So as long as mac80211 provides at least some kind of protection
> > against it's own concurrency, we are in save waters even without
> > any fancy kind of locking.
> >
> > Regards,
> > Chr
>
> For the config callback, mac80211 does not protect against concurrency. We
> learned that with rtl8187, where this routine ran somewhat longer. After
> the power setting routine was added to the wext interface in mac80211, the
> driver broke due to concurrent calls to the config routine and was fixed by
> just this kind of mutex, which is why I added this protection for p54.
>
I just got a reply from an anonymous p54pci (2.6.27-rc4) user, with the
following problem:
INFO: task prism54pci:10881 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
prism54pci D 0000000000000001 0 10881 2
ffff8800780dbde0 0000000000000046 0000000000000000 ffffffff802375cb
ffffffff80778740 0000000000000286 ffff88007c632f38 ffff88007c5d8b40
ffffffff806c2330 ffff88007c5d8d70 0000000001018740 ffff88007c5d8d70
Call Trace:
[<ffffffff802375cb>] __mod_timer+0xb7/0xc5
[<ffffffff802297d7>] dequeue_task_fair+0x4d/0xce
[<ffffffff805446ff>] __mutex_lock_slowpath+0x6a/0xa0
[<ffffffff805445a9>] mutex_lock+0x1a/0x1e
[<ffffffffa01d48f0>] p54_config+0x1a/0x46 [p54common]
[<ffffffffa01a7e73>] ieee80211_sta_scan_work+0xf8/0x1b8 [mac80211]
[<ffffffffa01a7d7b>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
[<ffffffff8023d158>] run_workqueue+0x79/0xfe
[<ffffffff8023d42e>] worker_thread+0x96/0xa5
[<ffffffff8024056c>] autoremove_wake_function+0x0/0x2e
[<ffffffff8023d398>] worker_thread+0x0/0xa5
[<ffffffff8024025a>] kthread+0x47/0x73
[<ffffffff8022c919>] schedule_tail+0x27/0x5f
[<ffffffff8020c279>] child_rip+0xa/0x11
[<ffffffff80240213>] kthread+0x0/0x73
[<ffffffff8020c26f>] child_rip+0x0/0x11
I have no idea going on here since locking is so simple here that it
shouldn't misbehave at all!?!
Regards,
Chr
Chr wrote:
>
> Well the reason why there isn't any bug report about it is maybe
> because unlike most other devices we don't program one single
> setting per time, but always a "group of similar ones" at once so
> the device should always be in a sane state in theory.
>
> So as long as mac80211 provides at least some kind of protection
> against it's own concurrency, we are in save waters even without
> any fancy kind of locking.
>
> Regards,
> Chr
For the config callback, mac80211 does not protect against concurrency. We
learned that with rtl8187, where this routine ran somewhat longer. After the
power setting routine was added to the wext interface in mac80211, the driver
broke due to concurrent calls to the config routine and was fixed by just this
kind of mutex, which is why I added this protection for p54.
Larry
Chr wrote:
> On Thursday 07 August 2008 02:56:09 Larry Finger wrote:
>> Chr wrote:
>>> Well the reason why there isn't any bug report about it is maybe
>>> because unlike most other devices we don't program one single
>>> setting per time, but always a "group of similar ones" at once so
>>> the device should always be in a sane state in theory.
>>>
>>> So as long as mac80211 provides at least some kind of protection
>>> against it's own concurrency, we are in save waters even without
>>> any fancy kind of locking.
>>>
>>> Regards,
>>> Chr
>> For the config callback, mac80211 does not protect against concurrency. We
>> learned that with rtl8187, where this routine ran somewhat longer. After
>> the power setting routine was added to the wext interface in mac80211, the
>> driver broke due to concurrent calls to the config routine and was fixed by
>> just this kind of mutex, which is why I added this protection for p54.
>>
> I just got a reply from an anonymous p54pci (2.6.27-rc4) user, with the
> following problem:
>
> INFO: task prism54pci:10881 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> prism54pci D 0000000000000001 0 10881 2
> ffff8800780dbde0 0000000000000046 0000000000000000 ffffffff802375cb
> ffffffff80778740 0000000000000286 ffff88007c632f38 ffff88007c5d8b40
> ffffffff806c2330 ffff88007c5d8d70 0000000001018740 ffff88007c5d8d70
> Call Trace:
> [<ffffffff802375cb>] __mod_timer+0xb7/0xc5
> [<ffffffff802297d7>] dequeue_task_fair+0x4d/0xce
> [<ffffffff805446ff>] __mutex_lock_slowpath+0x6a/0xa0
> [<ffffffff805445a9>] mutex_lock+0x1a/0x1e
> [<ffffffffa01d48f0>] p54_config+0x1a/0x46 [p54common]
> [<ffffffffa01a7e73>] ieee80211_sta_scan_work+0xf8/0x1b8 [mac80211]
> [<ffffffffa01a7d7b>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
> [<ffffffff8023d158>] run_workqueue+0x79/0xfe
> [<ffffffff8023d42e>] worker_thread+0x96/0xa5
> [<ffffffff8024056c>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8023d398>] worker_thread+0x0/0xa5
> [<ffffffff8024025a>] kthread+0x47/0x73
> [<ffffffff8022c919>] schedule_tail+0x27/0x5f
> [<ffffffff8020c279>] child_rip+0xa/0x11
> [<ffffffff80240213>] kthread+0x0/0x73
> [<ffffffff8020c26f>] child_rip+0x0/0x11
>
> I have no idea going on here since locking is so simple here that it
> shouldn't misbehave at all!?!
>
> Regards,
> Chr
>
With only one spinlock that is set in p54_assign_addresses() and one
mutex that is set in p54_config() and in p54_config_interface(), there
are not too many possibilities. I don't see any paths that miss an
unlock. The only explanation I can see is the following:
There has to be an entry to p54_config() with the spinlock already
set. This call locks the mutex and proceeds to call p54_set_freq(),
which in turn calls p54_assign_addresses(). Execution would be held at
the spin_lock.
If p54_config is called again, the mutex is already locked, which
would lead to the bug reported. We know that mac80211 can call the
config routine before completion of the previous entry. I don't
understand why p54_assign_addresses() does not run to completion and
clear the spinlock.
I'm sure this explanation is wrong, so please shoot it down.
Larry