2013-06-11 19:00:08

by Ben Greear

[permalink] [raw]
Subject: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

I see several reports similar to the one below while doing some
kmemleak testing on my 3.9.5+ tree (with local patches applied):

http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.9.dev.y/.git;a=summary


While looking through the code, I found this in scan.c:cfg80211_bss_update

} else {
/*
* Ok so we found a beacon, and don't have an entry. If
* it's a beacon with hidden SSID, we might be in for an
* expensive search for any probe responses that should
* be grouped with this beacon for updates ...
*/
if (!cfg80211_combine_bsses(dev, new)) {
kfree(new);
goto drop;
}
}

I don't know if this is the culprit that I am seeing, but I believe
we should be freeing the tmp.pub.beacon_ies (if it is not NULL)
before doing the 'goto drop'?


The kmemleak report is below:


unreferenced object 0xffff8801c8e41e78 (size 192):
comm "kworker/u:2", pid 157, jiffies 4295509873 (age 86582.869s)
hex dump (first 32 bytes):
41 0d 00 30 02 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b A..0....kkkkkkkk
6b 6b 6b 6b 6b 6b 6b 6b 69 00 00 00 00 0c 2e 32 kkkkkkkki......2
backtrace:
[<ffffffff815de7bf>] kmemleak_alloc+0x73/0x98
[<ffffffff8118b4d4>] slab_post_alloc_hook+0x28/0x2a
[<ffffffff8118d605>] __kmalloc+0xf9/0x122
[<ffffffffa027cb27>] cfg80211_inform_bss_frame+0x114/0x1f8 [cfg80211]
[<ffffffffa03d6865>] ieee80211_bss_info_update+0x66/0x21f [mac80211]
[<ffffffffa040aec6>] ieee80211_rx_bss_info+0x12f/0x1ca [mac80211]
[<ffffffffa040b017>] ieee80211_rx_mgmt_probe_resp+0xb6/0x197 [mac80211]
[<ffffffffa040e8a3>] ieee80211_sta_rx_queued_mgmt+0xdd/0x60e [mac80211]
[<ffffffffa03df0ee>] ieee80211_iface_work+0x238/0x2cc [mac80211]
[<ffffffff810b0cd3>] process_one_work+0x292/0x42e
[<ffffffff810b36af>] worker_thread+0x14f/0x264
[<ffffffff810b7bea>] kthread+0xc7/0xcf
[<ffffffff815f64ec>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff


(gdb) l *(cfg80211_inform_bss_frame+0x114)
0x8b27 is in cfg80211_inform_bss_frame (/home/greearb/git/linux-3.9.dev.y/net/wireless/scan.c:960).
955 ielen, channel);
956 if (!channel)
957 return NULL;
958
959 ies = kmalloc(sizeof(*ies) + ielen, gfp);
960 if (!ies)
961 return NULL;
962 ies->len = ielen;
963 ies->tsf = le64_to_cpu(mgmt->u.probe_resp.timestamp);
964 memcpy(ies->data, mgmt->u.probe_resp.variable, ielen);
(gdb)



Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com



2013-06-11 19:08:08

by Ben Greear

[permalink] [raw]
Subject: Re: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

On 06/11/2013 12:00 PM, Ben Greear wrote:
> I see several reports similar to the one below while doing some
> kmemleak testing on my 3.9.5+ tree (with local patches applied):
>
> http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.9.dev.y/.git;a=summary
>
>
> While looking through the code, I found this in scan.c:cfg80211_bss_update
>
> } else {
> /*
> * Ok so we found a beacon, and don't have an entry. If
> * it's a beacon with hidden SSID, we might be in for an
> * expensive search for any probe responses that should
> * be grouped with this beacon for updates ...
> */
> if (!cfg80211_combine_bsses(dev, new)) {
> kfree(new);
> goto drop;
> }
> }
>
> I don't know if this is the culprit that I am seeing, but I believe
> we should be freeing the tmp.pub.beacon_ies (if it is not NULL)
> before doing the 'goto drop'?

Err, I guess not. Seems the only way that combine_bsses can return false is
if beacon_ies is null. So, I guess it must be something else...


>
>
> The kmemleak report is below:
>
>
> unreferenced object 0xffff8801c8e41e78 (size 192):
> comm "kworker/u:2", pid 157, jiffies 4295509873 (age 86582.869s)
> hex dump (first 32 bytes):
> 41 0d 00 30 02 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b A..0....kkkkkkkk
> 6b 6b 6b 6b 6b 6b 6b 6b 69 00 00 00 00 0c 2e 32 kkkkkkkki......2
> backtrace:
> [<ffffffff815de7bf>] kmemleak_alloc+0x73/0x98
> [<ffffffff8118b4d4>] slab_post_alloc_hook+0x28/0x2a
> [<ffffffff8118d605>] __kmalloc+0xf9/0x122
> [<ffffffffa027cb27>] cfg80211_inform_bss_frame+0x114/0x1f8 [cfg80211]
> [<ffffffffa03d6865>] ieee80211_bss_info_update+0x66/0x21f [mac80211]
> [<ffffffffa040aec6>] ieee80211_rx_bss_info+0x12f/0x1ca [mac80211]
> [<ffffffffa040b017>] ieee80211_rx_mgmt_probe_resp+0xb6/0x197 [mac80211]
> [<ffffffffa040e8a3>] ieee80211_sta_rx_queued_mgmt+0xdd/0x60e [mac80211]
> [<ffffffffa03df0ee>] ieee80211_iface_work+0x238/0x2cc [mac80211]
> [<ffffffff810b0cd3>] process_one_work+0x292/0x42e
> [<ffffffff810b36af>] worker_thread+0x14f/0x264
> [<ffffffff810b7bea>] kthread+0xc7/0xcf
> [<ffffffff815f64ec>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> (gdb) l *(cfg80211_inform_bss_frame+0x114)
> 0x8b27 is in cfg80211_inform_bss_frame (/home/greearb/git/linux-3.9.dev.y/net/wireless/scan.c:960).
> 955 ielen, channel);
> 956 if (!channel)
> 957 return NULL;
> 958
> 959 ies = kmalloc(sizeof(*ies) + ielen, gfp);
> 960 if (!ies)
> 961 return NULL;
> 962 ies->len = ielen;
> 963 ies->tsf = le64_to_cpu(mgmt->u.probe_resp.timestamp);
> 964 memcpy(ies->data, mgmt->u.probe_resp.variable, ielen);
> (gdb)
>
>
>
> Thanks,
> Ben
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-06-15 19:18:52

by Ben Greear

[permalink] [raw]
Subject: Re: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

On 06/15/2013 10:11 AM, Johannes Berg wrote:
> On Fri, 2013-06-14 at 15:48 -0700, Ben Greear wrote:
>
>> I've instrumented all (as far as I can tell) allocation points
>> and destruction points for the ies, and I am keeping a separate
>> list of structures to record some info about each ies.
>>
>> I loaded up lots of stations, let them bounce around for a while,
>> and then did an 'rmmod ath9k'.
>>
>> I still see 14 ies entries in my debug list. Should
>> I expect to see zero ies data structures left after
>> I rmmod all wifi drivers, or is it normal for a few
>> to be left around?
>
> No, that's decidedly not normal, but that's pretty much just what
> kmemleak already told us, no? Good to verify, but doesn't really help
> pinpoint the problem either, I'd say :-(

Well, I wanted to make sure it wasn't false-positive in kmemleak.

Sometime around March someone reported a similar issue but decided
kmemleak was wrong (and you fixed one corner case that was reported
at the same time)..

I tried instrumenting the bss as well. From what I can tell, we are
leaking them, as when I print out all bss for the phy* objects, I only
get 14. But, I see several hundred that were allocated and never freed.

kmemleak didn't show leaked bss, so possibly I am wrong about this, or
maybe they are still (properly?) referenced from the virtual station
interfaces.

Is there any good reason that there would be a significant amount of bss
objects in the system that are not in the bss_list for the wiphys?

This is from my debug code, but perhaps it gives an idea (actually,
the ies are undercounted, as 31500 was the buffer size I
allowed for debugfs...I'll work on better instrumentation to
give an accurate count of still-allocated ies.)

[root@LEC2220-1 ~]# cat /debug/ieee80211/wiphy*/bss|wc
14 112 1288
[root@LEC2220-1 ~]# cat /debug/ieee80211/ies |wc
642 1286 31500
[root@LEC2220-1 ~]# cat /debug/ieee80211/all_bss|wc
382 764 16044


The good news is that it appears the bss growth (at least) does not grow too fast..this
is after around 20 hours of running the torture test that constantly is trying to
(re)associate 400 stations with an AP that can handle only 127.

Thanks,
Ben


>
> johannes
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-06-14 22:48:38

by Ben Greear

[permalink] [raw]
Subject: Re: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

On 06/11/2013 05:36 PM, Ben Greear wrote:
> On 06/11/2013 12:08 PM, Ben Greear wrote:
>> On 06/11/2013 12:00 PM, Ben Greear wrote:
>>> I see several reports similar to the one below while doing some
>>> kmemleak testing on my 3.9.5+ tree (with local patches applied):
>>>
>>> http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.9.dev.y/.git;a=summary
>
>>> The kmemleak report is below:
>>>
>>>
>>> unreferenced object 0xffff8801c8e41e78 (size 192):
>>> comm "kworker/u:2", pid 157, jiffies 4295509873 (age 86582.869s)
>>> hex dump (first 32 bytes):
>>> 41 0d 00 30 02 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b A..0....kkkkkkkk
>>> 6b 6b 6b 6b 6b 6b 6b 6b 69 00 00 00 00 0c 2e 32 kkkkkkkki......2
>>> backtrace:
>>> [<ffffffff815de7bf>] kmemleak_alloc+0x73/0x98
>>> [<ffffffff8118b4d4>] slab_post_alloc_hook+0x28/0x2a
>>> [<ffffffff8118d605>] __kmalloc+0xf9/0x122
>>> [<ffffffffa027cb27>] cfg80211_inform_bss_frame+0x114/0x1f8 [cfg80211]
>>> [<ffffffffa03d6865>] ieee80211_bss_info_update+0x66/0x21f [mac80211]
>>> [<ffffffffa040aec6>] ieee80211_rx_bss_info+0x12f/0x1ca [mac80211]
>>> [<ffffffffa040b017>] ieee80211_rx_mgmt_probe_resp+0xb6/0x197 [mac80211]
>>> [<ffffffffa040e8a3>] ieee80211_sta_rx_queued_mgmt+0xdd/0x60e [mac80211]
>>> [<ffffffffa03df0ee>] ieee80211_iface_work+0x238/0x2cc [mac80211]
>>> [<ffffffff810b0cd3>] process_one_work+0x292/0x42e
>>> [<ffffffff810b36af>] worker_thread+0x14f/0x264
>>> [<ffffffff810b7bea>] kthread+0xc7/0xcf
>>> [<ffffffff815f64ec>] ret_from_fork+0x7c/0xb0
>>> [<ffffffffffffffff>] 0xffffffffffffffff

I am still working on trying to figure this one out.

I've instrumented all (as far as I can tell) allocation points
and destruction points for the ies, and I am keeping a separate
list of structures to record some info about each ies.

I loaded up lots of stations, let them bounce around for a while,
and then did an 'rmmod ath9k'.

I still see 14 ies entries in my debug list. Should
I expect to see zero ies data structures left after
I rmmod all wifi drivers, or is it normal for a few
to be left around?


Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-06-12 00:36:45

by Ben Greear

[permalink] [raw]
Subject: Re: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

On 06/11/2013 12:08 PM, Ben Greear wrote:
> On 06/11/2013 12:00 PM, Ben Greear wrote:
>> I see several reports similar to the one below while doing some
>> kmemleak testing on my 3.9.5+ tree (with local patches applied):
>>
>> http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.9.dev.y/.git;a=summary

>> The kmemleak report is below:
>>
>>
>> unreferenced object 0xffff8801c8e41e78 (size 192):
>> comm "kworker/u:2", pid 157, jiffies 4295509873 (age 86582.869s)
>> hex dump (first 32 bytes):
>> 41 0d 00 30 02 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b A..0....kkkkkkkk
>> 6b 6b 6b 6b 6b 6b 6b 6b 69 00 00 00 00 0c 2e 32 kkkkkkkki......2
>> backtrace:
>> [<ffffffff815de7bf>] kmemleak_alloc+0x73/0x98
>> [<ffffffff8118b4d4>] slab_post_alloc_hook+0x28/0x2a
>> [<ffffffff8118d605>] __kmalloc+0xf9/0x122
>> [<ffffffffa027cb27>] cfg80211_inform_bss_frame+0x114/0x1f8 [cfg80211]
>> [<ffffffffa03d6865>] ieee80211_bss_info_update+0x66/0x21f [mac80211]
>> [<ffffffffa040aec6>] ieee80211_rx_bss_info+0x12f/0x1ca [mac80211]
>> [<ffffffffa040b017>] ieee80211_rx_mgmt_probe_resp+0xb6/0x197 [mac80211]
>> [<ffffffffa040e8a3>] ieee80211_sta_rx_queued_mgmt+0xdd/0x60e [mac80211]
>> [<ffffffffa03df0ee>] ieee80211_iface_work+0x238/0x2cc [mac80211]
>> [<ffffffff810b0cd3>] process_one_work+0x292/0x42e
>> [<ffffffff810b36af>] worker_thread+0x14f/0x264
>> [<ffffffff810b7bea>] kthread+0xc7/0xcf
>> [<ffffffff815f64ec>] ret_from_fork+0x7c/0xb0
>> [<ffffffffffffffff>] 0xffffffffffffffff

Something else came to mind on this.

To determine if we should delete an old pointer to memory,
we do an rcu_access_pointer to read the old value, and
we are assigning with rcu_assign_pointer.

Could this be racing so that rcu_access_pointer returns NULL
when looking for the old pointer, but other threads manage
to set the pointer more than once, leaking all but the last
to be set?

For instance, this code:

if (found) {
/* Update IEs */
if (rcu_access_pointer(tmp->pub.proberesp_ies)) {
const struct cfg80211_bss_ies *old;

old = rcu_access_pointer(found->pub.proberesp_ies);

rcu_assign_pointer(found->pub.proberesp_ies,
tmp->pub.proberesp_ies);
/* Override possible earlier Beacon frame IEs */
rcu_assign_pointer(found->pub.ies,
tmp->pub.proberesp_ies);
if (old)
kfree_rcu((struct cfg80211_bss_ies *)old,
rcu_head);


I don't see a huge number of leaks..but they are definitely
accumulating if kmemleak is to be believed...

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-06-15 17:11:19

by Johannes Berg

[permalink] [raw]
Subject: Re: kmemleak report in 3.9.5+, related to cfg80211_inform_bss_frame

On Fri, 2013-06-14 at 15:48 -0700, Ben Greear wrote:

> I've instrumented all (as far as I can tell) allocation points
> and destruction points for the ies, and I am keeping a separate
> list of structures to record some info about each ies.
>
> I loaded up lots of stations, let them bounce around for a while,
> and then did an 'rmmod ath9k'.
>
> I still see 14 ies entries in my debug list. Should
> I expect to see zero ies data structures left after
> I rmmod all wifi drivers, or is it normal for a few
> to be left around?

No, that's decidedly not normal, but that's pretty much just what
kmemleak already told us, no? Good to verify, but doesn't really help
pinpoint the problem either, I'd say :-(

johannes