2013-06-10 20:19:50

Subject: [PATCH v2 1/2] mac80211: fix mesh deadlock

The patch "cfg80211/mac80211: use cfg80211 wdev mutex in
mac80211" introduced several deadlocks by converting the
ifmsh->mtx to wdev->mtx. Solve these by:

1. drop the cancel_work_sync() in ieee80211_stop_mesh().
Instead make the mesh work conditional on whether the mesh
is running or not.
2. lock the mesh work with sdata_lock() to protect beacon
updates and prevent races with wdev->mesh_id_len or
cfg80211.

Signed-off-by: Thomas Pedersen <[email protected]>
---
net/mac80211/mesh.c | 29 +++++++++++++++++------------
net/mac80211/mesh_plink.c | 7 +------
2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index 73a597b..d5faf91 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -579,9 +579,7 @@ static void ieee80211_mesh_housekeeping(struct ieee80211_sub_if_data *sdata)
mesh_path_expire(sdata);

changed = mesh_accept_plinks_update(sdata);
- sdata_lock(sdata);
ieee80211_mbss_info_change_notify(sdata, changed);
- sdata_unlock(sdata);

mod_timer(&ifmsh->housekeeping_timer,
round_jiffies(jiffies +
@@ -788,12 +786,10 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata)
sdata->vif.bss_conf.enable_beacon = false;
clear_bit(SDATA_STATE_OFFCHANNEL_BEACON_STOPPED, &sdata->state);
ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON_ENABLED);
- sdata_lock(sdata);
bcn = rcu_dereference_protected(ifmsh->beacon,
lockdep_is_held(&sdata->wdev.mtx));
rcu_assign_pointer(ifmsh->beacon, NULL);
kfree_rcu(bcn, rcu_head);
- sdata_unlock(sdata);

/* flush STAs and mpaths on this iface */
sta_info_flush(sdata);
@@ -806,14 +802,6 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata)
del_timer_sync(&sdata->u.mesh.housekeeping_timer);
del_timer_sync(&sdata->u.mesh.mesh_path_root_timer);
del_timer_sync(&sdata->u.mesh.mesh_path_timer);
- /*
- * If the timer fired while we waited for it, it will have
- * requeued the work. Now the work will be running again
- * but will not rearm the timer again because it checks
- * whether the interface is running, which, at this point,
- * it no longer is.
- */
- cancel_work_sync(&sdata->work);

local->fif_other_bss--;
atomic_dec(&local->iff_allmultis);
@@ -954,6 +942,12 @@ void ieee80211_mesh_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata,
struct ieee80211_mgmt *mgmt;
u16 stype;

+ sdata_lock(sdata);
+
+ /* mesh already went down */
+ if (!sdata->wdev.mesh_id_len)
+ goto out;
+
rx_status = IEEE80211_SKB_RXCB(skb);
mgmt = (struct ieee80211_mgmt *) skb->data;
stype = le16_to_cpu(mgmt->frame_control) & IEEE80211_FCTL_STYPE;
@@ -971,12 +965,20 @@ void ieee80211_mesh_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata,
ieee80211_mesh_rx_mgmt_action(sdata, mgmt, skb->len, rx_status);
break;
}
+out:
+ sdata_unlock(sdata);
}

void ieee80211_mesh_work(struct ieee80211_sub_if_data *sdata)
{
struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;

+ sdata_lock(sdata);
+
+ /* mesh already went down */
+ if (!sdata->wdev.mesh_id_len)
+ goto out;
+
if (ifmsh->preq_queue_len &&
time_after(jiffies,
ifmsh->last_preq + msecs_to_jiffies(ifmsh->mshcfg.dot11MeshHWMPpreqMinInterval)))
@@ -996,6 +998,9 @@ void ieee80211_mesh_work(struct ieee80211_sub_if_data *sdata)

if (test_and_clear_bit(MESH_WORK_DRIFT_ADJUST, &ifmsh->wrkq_flags))
mesh_sync_adjust_tbtt(sdata);
+
+out:
+ sdata_unlock(sdata);
}

void ieee80211_mesh_notify_scan_completed(struct ieee80211_local *local)
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index 6c4da99..09bebed 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -517,9 +517,7 @@ void mesh_neighbour_update(struct ieee80211_sub_if_data *sdata,
ieee80211_mps_frame_release(sta, elems);
out:
rcu_read_unlock();
- sdata_lock(sdata);
ieee80211_mbss_info_change_notify(sdata, changed);
- sdata_unlock(sdata);
}

static void mesh_plink_timer(unsigned long data)
@@ -1070,9 +1068,6 @@ void mesh_rx_plink_frame(struct ieee80211_sub_if_data *sdata,

rcu_read_unlock();

- if (changed) {
- sdata_lock(sdata);
+ if (changed)
ieee80211_mbss_info_change_notify(sdata, changed);
- sdata_unlock(sdata);
- }
}
--
1.7.10.4
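
The two changes listed in the commit message can be illustrated outside the kernel. What follows is a minimal userspace sketch, not mac80211 code: `struct iface`, `iface_init()`, `mesh_stop_locked()` and `mesh_work()` are hypothetical names, and a pthread mutex stands in for wdev->mtx. It assumes, as the removed sdata_lock() calls in ieee80211_stop_mesh() suggest, that the stop path runs with the lock already held. Under that assumption, waiting for the work to finish from the stop path would block forever once the work takes the same lock, so the work instead checks a "joined" flag under the lock and returns early. For brevity the flag is cleared inside the stop function here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the interface state; not a mac80211 type. */
struct iface {
	pthread_mutex_t mtx;	/* plays the role of wdev->mtx */
	size_t mesh_id_len;	/* nonzero while the mesh is joined */
	unsigned int work_done;	/* counts work runs that did something */
};

void iface_init(struct iface *ifc, size_t mesh_id_len)
{
	pthread_mutex_init(&ifc->mtx, NULL);
	ifc->mesh_id_len = mesh_id_len;
	ifc->work_done = 0;
}

/*
 * Stop path. The caller already holds ifc->mtx, so this function must
 * not wait for mesh_work(): the work needs the same mutex and could
 * never finish. It only marks the mesh as down.
 */
void mesh_stop_locked(struct iface *ifc)
{
	ifc->mesh_id_len = 0;
}

/*
 * Deferred work. It takes the mutex and becomes a no-op if the mesh
 * already went down, which replaces cancelling the work on stop.
 * Returns true if it actually did something.
 */
bool mesh_work(struct iface *ifc)
{
	bool ran = false;

	pthread_mutex_lock(&ifc->mtx);
	if (!ifc->mesh_id_len)
		goto out;	/* mesh already went down */
	ifc->work_done++;
	ran = true;
out:
	pthread_mutex_unlock(&ifc->mtx);
	return ran;
}
```

A late work item that runs after the stop path has released the lock simply sees the cleared flag and returns.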

2013-06-11 11:30:21

by Johannes Berg

Subject: Re: [PATCH v2 2/2] mac80211: update mesh beacon on workqueue

On Mon, 2013-06-10 at 13:17 -0700, Thomas Pedersen wrote:

> + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
> + u32 bit;
> +
> + /* if we race with running work, worst case this work becomes a noop */
> + for_each_set_bit(bit, (unsigned long *)&changed,
> + sizeof(changed) * BITS_PER_BYTE)

This isn't valid, it happens to work on little endian platforms but will
fail on big endian 64-bit ones, because you have this in memory (0 is
the lowest order nibble):

76 54 32 10 -- -- -- --

and now you point an unsigned long pointer to it, so you interpret the
"--" as the lowest bits.

More generally, I'd argue that mesh is being a bit odd here, flushing
stations during mesh stop can and will actually cause a BSS info update
after the mesh interface has already been stopped (beaconing has been
disabled in the driver.) This seems rather odd. Maybe it would be better
to move the beacon update out of mesh_sta_cleanup() and into
ieee80211_mesh_housekeeping() in some way? Although it'd also have to be
done in the station handling in cfg.c but that shouldn't be a problem?

Note also that the way you did this is rather odd, ieee80211_stop_mesh()
could cause to schedule out to the workqueue for the update, but then
the update won't happen. It's a bit racy though, because you could stop
and restart the mesh and then the workqueue runs or something? Overall
this approach seems a bit brittle?

johannes
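
The byte-layout point in this reply can be reproduced on any host by modelling the memory explicitly. The sketch below is an illustration only: `be64_view_of_u32()` and its helpers are hypothetical names, and the second argument stands for whatever happens to sit in the four bytes after the u32 (the "--" bytes). It stores the u32 big-endian in the first four bytes of an 8-byte buffer and then reads all eight bytes as one big-endian 64-bit value, which is roughly what dereferencing the cast pointer would yield on a big-endian machine with a 64-bit unsigned long.

```c
#include <stdint.h>

/* Store a 32-bit value into p[0..3] in big-endian byte order. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Read p[0..7] as one big-endian 64-bit value. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Model of *(unsigned long *)&changed on a big-endian 64-bit system:
 * the u32 fills the first four bytes, the neighbouring memory fills
 * the last four.
 */
uint64_t be64_view_of_u32(uint32_t changed, uint32_t neighbour)
{
	unsigned char mem[8];

	store_be32(mem, changed);
	store_be32(mem + 4, neighbour);
	return load_be64(mem);
}
```

In this model the bits of `changed` land in bits 32..63 of the wide value, while bits 0..31, the ones a loop bounded by 32 would scan, come entirely from the neighbouring bytes.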

2013-06-10 20:19:52

by Thomas Pedersen

Subject: [PATCH v2 2/2] mac80211: update mesh beacon on workqueue

Fixes yet another deadlock on calling sta_info_flush()
with the sdata_lock() held. Should make it easier to
reason about locking in the future, since the sdata_lock()
is now held on all mesh work.

Signed-off-by: Thomas Pedersen <[email protected]>
---

v2:
read all changed bits & drop macro (Johannes)

net/mac80211/ieee80211_i.h | 1 +
net/mac80211/mesh.c | 44 ++++++++++++++++++++++++++++++++------------
net/mac80211/mesh.h | 2 ++
3 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 7a6f1a0..f79156d 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -543,6 +543,7 @@ struct ieee80211_if_mesh {
struct timer_list mesh_path_root_timer;

unsigned long wrkq_flags;
+ unsigned long mbss_changed;

u8 mesh_id[IEEE80211_MAX_MESH_ID_LEN];
size_t mesh_id_len;
diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index d5faf91..2499679 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -161,11 +161,8 @@ void mesh_sta_cleanup(struct sta_info *sta)
del_timer_sync(&sta->plink_timer);
}

- if (changed) {
- sdata_lock(sdata);
+ if (changed)
ieee80211_mbss_info_change_notify(sdata, changed);
- sdata_unlock(sdata);
- }
}

int mesh_rmc_init(struct ieee80211_sub_if_data *sdata)
@@ -719,14 +716,15 @@ ieee80211_mesh_rebuild_beacon(struct ieee80211_sub_if_data *sdata)
void ieee80211_mbss_info_change_notify(struct ieee80211_sub_if_data *sdata,
u32 changed)
{
- if (sdata->vif.bss_conf.enable_beacon &&
- (changed & (BSS_CHANGED_BEACON |
- BSS_CHANGED_HT |
- BSS_CHANGED_BASIC_RATES |
- BSS_CHANGED_BEACON_INT)))
- if (ieee80211_mesh_rebuild_beacon(sdata))
- return;
- ieee80211_bss_info_change_notify(sdata, changed);
+ struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
+ u32 bit;
+
+ /* if we race with running work, worst case this work becomes a noop */
+ for_each_set_bit(bit, (unsigned long *)&changed,
+ sizeof(changed) * BITS_PER_BYTE)
+ set_bit(BIT(bit), &ifmsh->mbss_changed);
+ set_bit(MESH_WORK_MBSS_CHANGED, &ifmsh->wrkq_flags);
+ ieee80211_queue_work(&sdata->local->hw, &sdata->work);
}

int ieee80211_start_mesh(struct ieee80211_sub_if_data *sdata)
@@ -969,6 +967,26 @@ out:
sdata_unlock(sdata);
}

+static void mesh_bss_info_changed(struct ieee80211_sub_if_data *sdata)
+{
+ struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
+ u32 bit, changed = 0;
+
+ for_each_set_bit(bit, (unsigned long *)&ifmsh->mbss_changed,
+ sizeof(changed) * BITS_PER_BYTE)
+ changed |= test_and_clear_bit(BIT(bit), &ifmsh->mbss_changed);
+
+ if (sdata->vif.bss_conf.enable_beacon &&
+ (changed & (BSS_CHANGED_BEACON |
+ BSS_CHANGED_HT |
+ BSS_CHANGED_BASIC_RATES |
+ BSS_CHANGED_BEACON_INT)))
+ if (ieee80211_mesh_rebuild_beacon(sdata))
+ return;
+
+ ieee80211_bss_info_change_notify(sdata, changed);
+}
+
void ieee80211_mesh_work(struct ieee80211_sub_if_data *sdata)
{
struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
@@ -999,6 +1017,8 @@ void ieee80211_mesh_work(struct ieee80211_sub_if_data *sdata)
if (test_and_clear_bit(MESH_WORK_DRIFT_ADJUST, &ifmsh->wrkq_flags))
mesh_sync_adjust_tbtt(sdata);

+ if (test_and_clear_bit(MESH_WORK_MBSS_CHANGED, &ifmsh->wrkq_flags))
+ mesh_bss_info_changed(sdata);
out:
sdata_unlock(sdata);
}
diff --git a/net/mac80211/mesh.h b/net/mac80211/mesh.h
index 8b4d9a3..be28f9b 100644
--- a/net/mac80211/mesh.h
+++ b/net/mac80211/mesh.h
@@ -57,6 +57,7 @@ enum mesh_path_flags {
* grow
* @MESH_WORK_ROOT: the mesh root station needs to send a frame
* @MESH_WORK_DRIFT_ADJUST: time to compensate for clock drift relative to other
+ * @MESH_WORK_MBSS_CHANGED: rebuild beacon and notify driver of BSS changes
* mesh nodes
*/
enum mesh_deferred_task_flags {
@@ -65,6 +66,7 @@ enum mesh_deferred_task_flags {
MESH_WORK_GROW_MPP_TABLE,
MESH_WORK_ROOT,
MESH_WORK_DRIFT_ADJUST,
+ MESH_WORK_MBSS_CHANGED,
};

/**
--
1.7.10.4

2013-06-11 11:15:25

by Johannes Berg

Subject: Re: [PATCH v2 1/2] mac80211: fix mesh deadlock

On Mon, 2013-06-10 at 13:17 -0700, Thomas Pedersen wrote:
> The patch "cfg80211/mac80211: use cfg80211 wdev mutex in
> mac80211" introduced several deadlocks by converting the
> ifmsh->mtx to wdev->mtx. Solve these by:
>
> 1. drop the cancel_work_sync() in ieee80211_stop_mesh().
> Instead make the mesh work conditional on whether the mesh
> is running or not.
> 2. lock the mesh work with sdata_lock() to protect beacon
> updates and prevent races with wdev->mesh_id_len or
> cfg80211.

Applied, thanks.

johannes

2013-06-11 20:32:51

by Thomas Pedersen

Subject: Re: [PATCH v2 2/2] mac80211: update mesh beacon on workqueue

On Tue, Jun 11, 2013 at 4:30 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2013-06-10 at 13:17 -0700, Thomas Pedersen wrote:
>
>> + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
>> + u32 bit;
>> +
>> + /* if we race with running work, worst case this work becomes a noop */
>> + for_each_set_bit(bit, (unsigned long *)&changed,
>> + sizeof(changed) * BITS_PER_BYTE)
>
> This isn't valid, it happens to work on little endian platforms but will
> fail on big endian 64-bit ones, because you have this in memory (0 is
> the lowest order nibble):
>
> 76 54 32 10 -- -- -- --
>
> and now you point an unsigned long pointer to it, so you interpret the
> "--" as the lowest bits.

OK I was just trying to make the compiler happy, but that makes sense.
Assigning changed (u32) to an unsigned long then getting the address
of that should move the u32 into the lower half of an unsigned long on
a 64-bit BE system?
Thanks for explaining this.
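
A small userspace sketch of that idea follows. It is not the code that went into mac80211, and `collect_set_bits()` is a hypothetical helper written for illustration. Assigning the u32 to an unsigned long is a value conversion, so the bits keep their numeric positions regardless of byte order or of how wide unsigned long is; only the pointer cast reinterprets memory.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Record the index of every set bit in 'changed' into 'out' (which
 * must have room for 32 entries) and return how many were found.
 * The widened copy 'tmp' holds the same numeric value as 'changed',
 * so bit N of 'changed' is bit N of 'tmp' on any platform.
 */
unsigned int collect_set_bits(uint32_t changed, unsigned int *out)
{
	unsigned long tmp = changed;	/* value conversion, no cast of &changed */
	unsigned int n = 0;

	for (unsigned int bit = 0; bit < sizeof(changed) * CHAR_BIT; bit++)
		if (tmp & (1UL << bit))
			out[n++] = bit;
	return n;
}
```

In kernel code the address of such a local copy could then be handed to a bitmap iterator, since it really does point at an unsigned long.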

> More generally, I'd argue that mesh is being a bit odd here, flushing
> stations during mesh stop can and will actually cause a BSS info update
> after the mesh interface has already been stopped (beaconing has been
> disabled in the driver.) This seems rather odd. Maybe it would be better
> to move the beacon update out of mesh_sta_cleanup() and into
> ieee80211_mesh_housekeeping() in some way? Although it'd also have to be
> done in the station handling in cfg.c but that shouldn't be a problem?

Yes, it is odd to queue a BSS info update but never perform it. I don't know
if it really matters though. The problem is mesh_sta_cleanup() is
called from several paths: mac80211 mesh runtime, stop_mesh(), and
cfg80211. I think this is a fairly clean way of satisfying all the
users (mesh work queued from stop_mesh() is a noop if we check
ifmsh->mesh_id_len instead of the wdev->mesh_id_len).

It sounds like you'd like beacon updates to be asynchronous, which
this patch already accomplishes :)

> Note also that the way you did this is rather odd, ieee80211_stop_mesh()
> could cause to schedule out to the workqueue for the update, but then
> the update won't happen. It's a bit racy though, because you could stop
> and restart the mesh and then the workqueue runs or something? Overall
> this approach seems a bit brittle?

I guess if you clear the ifmsh->wrkq_flags at the end of stop_mesh()
this wouldn't happen. Also as long as the check to ensure no mesh work
is performed while not joined is in place, we should be ok.

--
Thomas