2013-06-04 20:22:59

by Johannes Berg

Subject: [PATCH] cfg80211: fix potential deadlock regression

From: Johannes Berg <[email protected]>

My big locking cleanups caused a problem by registering the
rfkill instance with the RTNL held, while the callback also
acquires the RTNL. This potentially causes a deadlock since
the two locks used (rfkill mutex and RTNL) can be acquired
in two different orders. Fix this by (un)registering rfkill
without holding the RTNL. This needs to be done after the
device struct is registered, but that can also be done w/o
holding the RTNL.

Signed-off-by: Johannes Berg <[email protected]>
---
net/wireless/core.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/net/wireless/core.c b/net/wireless/core.c
index 221e76b..99d86dd 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -555,14 +555,18 @@ int wiphy_register(struct wiphy *wiphy)
 	/* check and set up bitrates */
 	ieee80211_set_bitrate_flags(wiphy);
 
-	rtnl_lock();
 
 	res = device_add(&rdev->wiphy.dev);
+	if (res)
+		return res;
+
+	res = rfkill_register(rdev->rfkill);
 	if (res) {
-		rtnl_unlock();
+		device_del(&rdev->wiphy.dev);
 		return res;
 	}
 
+	rtnl_lock();
 	/* set up regulatory info */
 	wiphy_regulatory_register(wiphy);
 
@@ -589,17 +593,6 @@ int wiphy_register(struct wiphy *wiphy)
 
 	cfg80211_debugfs_rdev_add(rdev);
 
-	res = rfkill_register(rdev->rfkill);
-	if (res) {
-		device_del(&rdev->wiphy.dev);
-
-		debugfs_remove_recursive(rdev->wiphy.debugfsdir);
-		list_del_rcu(&rdev->list);
-		wiphy_regulatory_deregister(wiphy);
-		rtnl_unlock();
-		return res;
-	}
-
 	rdev->wiphy.registered = true;
 	rtnl_unlock();
 	return 0;
@@ -636,11 +629,11 @@ void wiphy_unregister(struct wiphy *wiphy)
 		rtnl_unlock();
 		__count == 0; }));
 
+	rfkill_unregister(rdev->rfkill);
+
 	rtnl_lock();
 	rdev->wiphy.registered = false;
 
-	rfkill_unregister(rdev->rfkill);
-
 	BUG_ON(!list_empty(&rdev->wdev_list));
 
 	/*
--
1.8.0
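
[Editorial illustration, not part of the patch]

The lock-ordering problem described in the commit message can be sketched
in userspace, with two pthread mutexes standing in for the RTNL and the
rfkill mutex. Every name below (fake_rfkill_register, fake_wiphy_register,
set_block_cb, run_demo, ROUNDS) is hypothetical, and this is only a minimal
sketch of the fixed ordering, not the kernel code. Both paths take the
rfkill stand-in first and the RTNL stand-in second, so the two threads
cannot end up waiting on each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ROUNDS 1000

/* Userspace stand-ins for the two kernel locks named in the patch. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rfkill_mutex = PTHREAD_MUTEX_INITIALIZER;

static int callbacks_run;	/* protected by rtnl */

static void must_lock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);

	assert(rc == 0);
	(void)rc;
}

static void must_unlock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_unlock(m);

	assert(rc == 0);
	(void)rc;
}

/* Hypothetical callback: like the one in the patch, it takes the RTNL. */
static void set_block_cb(void)
{
	must_lock(&rtnl);
	callbacks_run++;
	must_unlock(&rtnl);
}

/* Order here is rfkill_mutex -> rtnl (via the callback). */
static void fake_rfkill_register(void)
{
	must_lock(&rfkill_mutex);
	set_block_cb();
	must_unlock(&rfkill_mutex);
}

/*
 * Fixed ordering: register with rfkill while rtnl is NOT held, then take
 * rtnl for the rest. Calling fake_rfkill_register() with rtnl held would
 * give the opposite order (rtnl -> rfkill_mutex) and could deadlock
 * against rfkill_event_thread().
 */
static void fake_wiphy_register(void)
{
	fake_rfkill_register();
	must_lock(&rtnl);
	/* remaining setup that needs rtnl would go here */
	must_unlock(&rtnl);
}

static void *register_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++)
		fake_wiphy_register();
	return NULL;
}

/* A second path that also goes rfkill_mutex -> rtnl. */
static void *rfkill_event_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		must_lock(&rfkill_mutex);
		set_block_cb();
		must_unlock(&rfkill_mutex);
	}
	return NULL;
}

/* Runs both paths concurrently and returns how many callbacks ran. */
int run_demo(void)
{
	pthread_t a, b;
	int rc;

	must_lock(&rtnl);
	callbacks_run = 0;
	must_unlock(&rtnl);

	rc = pthread_create(&a, NULL, register_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, rfkill_event_thread, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return callbacks_run;
}
```

Calling run_demo() from a small main() and building with -pthread should
complete without hanging, because no path ever takes the rfkill stand-in
while holding the RTNL stand-in.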



2013-08-30 12:23:12

by Johannes Berg

Subject: Re: [PATCH] cfg80211: fix potential deadlock regression

On Wed, 2013-08-28 at 21:49 +0200, Maxime Bizon wrote:
> On Tue, 2013-06-04 at 22:22 +0200, Johannes Berg wrote:
>
> > - rtnl_lock();
> >
> > res = device_add(&rdev->wiphy.dev);
> > + if (res)
> > + return res;
>
> I just ran across a regression caused by this commit

Sorry, yeah ... this was the locking cleanups, we did get this right
before ...

> I'm again getting uevent notifications for wireless devices that are not
> yet properly registered (ENODEV on NL80211 when using sysfs phy id)
>
> I originally fixed the bug by taking the cfg80211 mutex across the
> whole registration:

[...]

Yeah, but then I got rid of the cfg80211_mutex :)

> It does not seem we can reverse the rfkill_register() and device_add()
> because wiphy dev is a parent of rfkill dev.
>
> Any idea how to fix this?

I think this should be OK:
http://p.sipsolutions.net/28cf9ed446845440.txt, can you try?

johannes


2013-08-28 19:58:45

by Maxime Bizon

Subject: Re: [PATCH] cfg80211: fix potential deadlock regression


On Tue, 2013-06-04 at 22:22 +0200, Johannes Berg wrote:

> - rtnl_lock();
>
> res = device_add(&rdev->wiphy.dev);
> + if (res)
> + return res;

I just ran across a regression caused by this commit

I'm again getting uevent notifications for wireless devices that are not
yet properly registered (ENODEV on NL80211 when using sysfs phy id)

I originally fixed the bug by taking the cfg80211 mutex across the
whole registration:

commit 5a652052fedbd7869572c757dd2ffc2ed420c69d
Author: Maxime Bizon <[email protected]>
Date: Wed Jul 21 17:21:38 2010 +0200

cfg80211: fix race between sysfs and cfg80211

device_add() is called before adding the phy to the cfg80211 device
list.

So if a userspace program uses sysfs uevents to detect new phy
devices, and queries nl80211 to get phy info, it can get ENODEV even
though the phy exists in sysfs.

An easy workaround is to hold the cfg80211 mutex until the phy is
present in sysfs/cfg80211/debugfs.


It does not seem we can reverse the rfkill_register() and device_add()
because wiphy dev is a parent of rfkill dev.

Any idea how to fix this?

Thanks,

--
Maxime
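
[Editorial illustration, not part of the message]

The 2010 workaround quoted above can be sketched in userspace: the
announcement and the list insertion happen under one mutex, so a listener
that reacts to the announcement and then queries blocks until the entry
exists instead of seeing ENODEV. This assumes the query path takes the same
mutex, which the thread does not spell out. All names below
(fake_phy_register, fake_phy_query, listener_thread, run_uevent_demo,
reg_mutex) are hypothetical, and this is not the cfg80211 code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the mutex held across the whole registration. */
static pthread_mutex_t reg_mutex = PTHREAD_MUTEX_INITIALIZER;

static atomic_bool uevent_sent;	/* stand-in for the sysfs uevent */
static bool on_phy_list;	/* protected by reg_mutex */

/* Announce first, insert second, but both under reg_mutex. */
static void fake_phy_register(void)
{
	pthread_mutex_lock(&reg_mutex);
	atomic_store(&uevent_sent, true);	/* device_add() would announce here */
	sched_yield();				/* widen the window on purpose */
	on_phy_list = true;
	pthread_mutex_unlock(&reg_mutex);
}

/* Assumed query path: it serializes against registration via reg_mutex. */
static int fake_phy_query(void)
{
	int ret;

	pthread_mutex_lock(&reg_mutex);
	ret = on_phy_list ? 0 : -ENODEV;
	pthread_mutex_unlock(&reg_mutex);
	return ret;
}

/* Userspace listener: waits for the announcement, then queries at once. */
static void *listener_thread(void *arg)
{
	int *result = arg;

	while (!atomic_load(&uevent_sent))
		sched_yield();
	*result = fake_phy_query();
	return NULL;
}

/* Returns the listener's query result: 0 on success, -ENODEV on the race. */
int run_uevent_demo(void)
{
	pthread_t t;
	int result = -1;
	int rc;

	atomic_store(&uevent_sent, false);
	pthread_mutex_lock(&reg_mutex);
	on_phy_list = false;
	pthread_mutex_unlock(&reg_mutex);

	rc = pthread_create(&t, NULL, listener_thread, &result);
	assert(rc == 0);
	(void)rc;

	fake_phy_register();
	pthread_join(t, NULL);
	return result;
}
```

Dropping the lock/unlock pair from fake_phy_register() would reopen the
window: the listener could then query between the announcement and the
list insertion and get -ENODEV.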



2013-09-02 10:26:45

by Maxime Bizon

Subject: Re: [PATCH] cfg80211: fix potential deadlock regression


On Fri, 2013-08-30 at 14:23 +0200, Johannes Berg wrote:

> I think this should be OK:
> http://p.sipsolutions.net/28cf9ed446845440.txt, can you try?

Works for me

Thanks

--
Maxime