LinuxLists.cc - [PATCH] mac80211: fix and simplify mesh locking

2011-05-14 09:00:55

Subject: [PATCH] mac80211: fix and simplify mesh locking

From: Johannes Berg <[email protected]>

The locking in mesh_{mpath,mpp}_table_grow not only
has an rcu_read_unlock() missing, it's also racy
(though really only technically since it's invoked
from a single function only) since it obtains the
new size of the table without any locking, so two
invocations of the function could attempt the same
resize.

Additionally, it uses synchronize_rcu() which is
rather expensive and can be avoided trivially here.

Modify the functions to only use the table lock
and use call_rcu() instead of synchronize_rcu().

Signed-off-by: Johannes Berg <[email protected]>
---
net/mac80211/mesh.h | 3 +++
net/mac80211/mesh_pathtbl.c | 44 ++++++++++++++++++++++----------------------
2 files changed, 25 insertions(+), 22 deletions(-)

--- a/net/mac80211/mesh_pathtbl.c 2011-05-14 10:47:15.000000000 +0200
+++ b/net/mac80211/mesh_pathtbl.c 2011-05-14 10:55:42.000000000 +0200
@@ -370,52 +370,52 @@ err_path_alloc:
return err;
}

+static void mesh_table_free_rcu(struct rcu_head *rcu)
+{
+ struct mesh_table *tbl = container_of(rcu, struct mesh_table, rcu_head);
+
+ mesh_table_free(tbl, false);
+}
+
void mesh_mpath_table_grow(void)
{
struct mesh_table *oldtbl, *newtbl;

- rcu_read_lock();
- newtbl = mesh_table_alloc(rcu_dereference(mesh_paths)->size_order + 1);
- if (!newtbl)
- return;
write_lock_bh(&pathtbl_resize_lock);
+ newtbl = mesh_table_alloc(mesh_paths->size_order + 1);
+ if (!newtbl)
+ goto out;
oldtbl = mesh_paths;
if (mesh_table_grow(mesh_paths, newtbl) < 0) {
- rcu_read_unlock();
__mesh_table_free(newtbl);
- write_unlock_bh(&pathtbl_resize_lock);
- return;
+ goto out;
}
- rcu_read_unlock();
rcu_assign_pointer(mesh_paths, newtbl);
- write_unlock_bh(&pathtbl_resize_lock);

- synchronize_rcu();
- mesh_table_free(oldtbl, false);
+ call_rcu(&oldtbl->rcu_head, mesh_table_free_rcu);
+
+ out:
+ write_unlock_bh(&pathtbl_resize_lock);
}

void mesh_mpp_table_grow(void)
{
struct mesh_table *oldtbl, *newtbl;

- rcu_read_lock();
- newtbl = mesh_table_alloc(rcu_dereference(mpp_paths)->size_order + 1);
- if (!newtbl)
- return;
write_lock_bh(&pathtbl_resize_lock);
+ newtbl = mesh_table_alloc(mpp_paths->size_order + 1);
+ if (!newtbl)
+ goto out;
oldtbl = mpp_paths;
if (mesh_table_grow(mpp_paths, newtbl) < 0) {
- rcu_read_unlock();
__mesh_table_free(newtbl);
- write_unlock_bh(&pathtbl_resize_lock);
- return;
+ goto out;
}
- rcu_read_unlock();
rcu_assign_pointer(mpp_paths, newtbl);
- write_unlock_bh(&pathtbl_resize_lock);
+ call_rcu(&oldtbl->rcu_head, mesh_table_free_rcu);

- synchronize_rcu();
- mesh_table_free(oldtbl, false);
+ out:
+ write_unlock_bh(&pathtbl_resize_lock);
}

int mpp_path_add(u8 *dst, u8 *mpp, struct ieee80211_sub_if_data *sdata)
--- a/net/mac80211/mesh.h 2011-05-14 10:52:14.000000000 +0200
+++ b/net/mac80211/mesh.h 2011-05-14 10:52:35.000000000 +0200
@@ -120,6 +120,7 @@ struct mesh_path {
* buckets
* @mean_chain_len: maximum average length for the hash buckets' list, if it is
* reached, the table will grow
+ * rcu_head: RCU head to free the table
*/
struct mesh_table {
/* Number of buckets will be 2^N */
@@ -132,6 +133,8 @@ struct mesh_table {
int (*copy_node) (struct hlist_node *p, struct mesh_table *newtbl);
int size_order;
int mean_chain_len;
+
+ struct rcu_head rcu_head;
};

/* Recent multicast cache */

2011-05-19 01:40:54

by Javier Cardona

[permalink] [raw]

Subject: Re: [PATCH] mac80211: fix and simplify mesh locking

On Wed, May 18, 2011 at 3:45 PM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2011-05-17 at 16:05 -0700, Javier Cardona wrote:
>
>> I'm seeing this after trying your patch, probably because the
>> allocations mesh_table_alloc() can block. ?In the past I had tried to
>> allocate the table before entering the critical section. ?If that is
>> not possible for the race condition you mention, ?then I guess we'll
>> have to make those allocations GFP_ATOMIC?
>
> Err, now I'm confused, how could the code have done non-atomic
> allocations while under rcu_read_lock()? I guess there was just no
> verification there?

I'm not sure. I can tell you that the bug message is only triggered
after your patch was applied, same .config.
Maybe i have an rcu configuration that does not check that.

Is making the allocations atomic an acceptable solution?

Javier

2011-05-17 23:05:50

by Javier Cardona

[permalink] [raw]

Subject: Re: [PATCH] mac80211: fix and simplify mesh locking

On Sat, May 14, 2011 at 2:20 AM, Johannes Berg
<[email protected]> wrote:
> On Sat, 2011-05-14 at 11:00 +0200, Johannes Berg wrote:
>> From: Johannes Berg <[email protected]>
>>
>> The locking in mesh_{mpath,mpp}_table_grow not only
>> has an rcu_read_unlock() missing, it's also racy
>> (though really only technically since it's invoked
>> from a single function only) since it obtains the
>> new size of the table without any locking, so two
>> invocations of the function could attempt the same
>> resize.
>
> Actually, it _can_ happen, if you have multiple mesh interfaces.

I'm seeing this after trying your patch, probably because the
allocations mesh_table_alloc() can block. In the past I had tried to
allocate the table before entering the critical section. If that is
not possible for the race condition you mention, then I guess we'll
have to make those allocations GFP_ATOMIC?

363.767523] BUG: scheduling while atomic: kworker/u:2/621/0x10000200
[ 363.768239] 3 locks held by kworker/u:2/621:
[ 363.768516] #0: (name){.+.+.+}, at: [<c10429e7>]
process_one_work+0x17c/0x31d
[ 363.769605] #1: ((&sdata->work)){+.+.+.}, at: [<c10429e7>]
process_one_work+0x17c/0x31d
[ 363.770149] #2: (pathtbl_resize_lock){++.-+.}, at: [<c8fb01ce>]
mesh_mpath_table_grow+0xf/0x5c [mac80211]
[ 363.771122] Modules linked in: mac80211_hwsim mac80211
[ 363.771824] Pid: 621, comm: kworker/u:2 Not tainted 2.6.39-rc7-wl+ #399
[ 363.772304] Call Trace:
[ 363.772611] [<c8faf69f>] ? mesh_table_alloc+0x25/0xc2 [mac80211]
[ 363.772969] [<c102c049>] __schedule_bug+0x5e/0x65
[ 363.773350] [<c14672af>] schedule+0x68/0x699
[ 363.773608] [<c1058328>] ? __lock_acquire+0xab3/0xb59
[ 363.773900] [<c105694e>] ? valid_state+0x1a/0x13d
[ 363.774244] [<c1056c10>] ? mark_lock+0x19f/0x1d9
[ 363.774527] [<c105720b>] ? check_usage_forwards+0x68/0x68
[ 363.774873] [<c1056c8d>] ? mark_held_locks+0x43/0x5b
[ 363.775303] [<c8faf69f>] ? mesh_table_alloc+0x25/0xc2 [mac80211]
[ 363.775654] [<c102e544>] __cond_resched+0x16/0x26
[ 363.775918] [<c1467a92>] _cond_resched+0x1d/0x28

Javier

--
Javier Cardona
cozybit Inc.
http://www.cozybit.com

2011-05-17 23:13:58

by Javier Cardona

[permalink] [raw]

Subject: [PATCH] mac80211: Don't sleep when growing the mesh path

After commit 1928ecab620907a0953f811316d05f367f3f4dba (mac80211: fix and
simplify mesh locking) mesh table allocation is performed with the
pathtbl_resize_lock taken. Under those conditions one should not sleep.
This patch makes the allocations GFP_ATOMIC to prevent that.

Signed-off-by: Javier Cardona <[email protected]>
---
net/mac80211/mesh_pathtbl.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c
index 6abe25d..0d2faac 100644
--- a/net/mac80211/mesh_pathtbl.c
+++ b/net/mac80211/mesh_pathtbl.c
@@ -76,12 +76,12 @@ static struct mesh_table *mesh_table_alloc(int size_order)
int i;
struct mesh_table *newtbl;

- newtbl = kmalloc(sizeof(struct mesh_table), GFP_KERNEL);
+ newtbl = kmalloc(sizeof(struct mesh_table), GFP_ATOMIC);
if (!newtbl)
return NULL;

newtbl->hash_buckets = kzalloc(sizeof(struct hlist_head) *
- (1 << size_order), GFP_KERNEL);
+ (1 << size_order), GFP_ATOMIC);

if (!newtbl->hash_buckets) {
kfree(newtbl);
@@ -89,7 +89,7 @@ static struct mesh_table *mesh_table_alloc(int size_order)
}

newtbl->hashwlock = kmalloc(sizeof(spinlock_t) *
- (1 << size_order), GFP_KERNEL);
+ (1 << size_order), GFP_ATOMIC);
if (!newtbl->hashwlock) {
kfree(newtbl->hash_buckets);
kfree(newtbl);
--
1.7.1

2011-05-14 09:20:08

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH] mac80211: fix and simplify mesh locking

On Sat, 2011-05-14 at 11:00 +0200, Johannes Berg wrote:
> From: Johannes Berg <[email protected]>
>
> The locking in mesh_{mpath,mpp}_table_grow not only
> has an rcu_read_unlock() missing, it's also racy
> (though really only technically since it's invoked
> from a single function only) since it obtains the
> new size of the table without any locking, so two
> invocations of the function could attempt the same
> resize.

Actually, it _can_ happen, if you have multiple mesh interfaces.

johannes

2011-05-18 22:43:16

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH] mac80211: fix and simplify mesh locking

On Tue, 2011-05-17 at 16:05 -0700, Javier Cardona wrote:

> I'm seeing this after trying your patch, probably because the
> allocations mesh_table_alloc() can block. In the past I had tried to
> allocate the table before entering the critical section. If that is
> not possible for the race condition you mention, then I guess we'll
> have to make those allocations GFP_ATOMIC?

Err, now I'm confused, how could the code have done non-atomic
allocations while under rcu_read_lock()? I guess there was just no
verification there?

johannes

2011-05-19 04:14:24

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH] mac80211: fix and simplify mesh locking

On Wed, 18 May 2011 18:40:33 -0700, Javier Cardona wrote:
>>> I'm seeing this after trying your patch, probably because the
>>> allocations mesh_table_alloc() can block. In the past I had tried
>>> to
>>> allocate the table before entering the critical section. If that
>>> is
>>> not possible for the race condition you mention, then I guess
>>> we'll
>>> have to make those allocations GFP_ATOMIC?
>>
>> Err, now I'm confused, how could the code have done non-atomic
>> allocations while under rcu_read_lock()? I guess there was just no
>> verification there?
>
> I'm not sure. I can tell you that the bug message is only triggered
> after your patch was applied, same .config.
> Maybe i have an rcu configuration that does not check that.
>
> Is making the allocations atomic an acceptable solution?

It looks like it's the only solution :-)
Anyway they should've been atomic even before already, not sure why RCU
didn't complain but it was definitely not right to do GFP_KERNEL
allocations under RCU lock, but now that I think about it ISTR that
whether it warns or not might depend on the RCU implementation you
selected in the config.

johannes