2015-05-25 21:01:13

by Larry Finger

[permalink] [raw]
Subject: Memory leak in btusb

Hi,

While using kmemleak to check for memory leaks in a wireless driver, I noticed
the following stack traceback for a leak in btusb:

<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
[<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
[<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
[<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120

To eliminate a false positive, I unloaded the driver and got the following for
the virtual address:

[<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
[<ffffffffa06029d6>] 0xffffffffa06029d6
[<ffffffffa0602ad8>] 0xffffffffa0602ad8
[<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120

The source is from wireless-drivers-next.git with pulls from bluetooth-next.git
and Torvalds mainline git repo that were done on May 25. The source includes
commit 04b8c8143d46453a443ac32bfcd76ec952605765 with the subject "Bluetooth:
btusb: fix Realtek suspend/resume".

The Bluetooth device in use is made by Intel with USB ID 8087:07dc.

Thanks,

Larry


2015-06-06 22:54:25

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Memory leak in btusb

Hi Larry,

>> since the interrupt endpoint is only used for HCI events, you do not even need connections. This could be anything simple like HCI command complete event.
>>
>> Now I wonder if this is something odd where the HCI event fits completely into a single interrupt URB or something special like that. If you get get the content of that URB, then I might be able to track this down. Or do you have a chance to run btmon and correlate the timestamps.
>
> Synchronizing the timing between kmemleak and btmon is a little difficult as the former reports time only in jiffies and seconds before present. I will keep looking, but I think the leaked allocation happened at roughly 16:17:52. The only things that btmon logged are

have you tried dmesg -T to actually get human readable timestamps.

>
>> HCI Event: Number of Completed Packets (0x13) plen 5
> [hci0] 8
> Num handles: 1
> Handle: 256
> Count: 1
>> ACL Data RX: Handle 256 flags 0x02 dlen 12
> [hci0] 16:18:49.598733
> L2CAP: Disconnection Response (0x07) ident 4 len 4
> Destination CID: 65
> Source CID: 64
> < HCI Command: Read Clock Offset (0x01|0x001f) plen 2 [hci0] 16:18:51.601920
> Handle: 256
>> HCI Event: Command Status (0x0f) plen 4
> [hci0] 16:18:51.602808
> Read Clock Offset (0x01|0x001f) ncmd 1
> Status: Success (0x00)
> < HCI Command: Disconnect (0x01|0x0006) plen 3 [hci0] 16:18:51.603069
> Handle: 256
> Reason: Remote User Terminated Connection (0x13)
>> HCI Event: Command Status (0x0f) plen 4
> [hci0] 16:18:51.603750
> Disconnect (0x01|0x0006) ncmd 1
> Status: Success (0x00)
>> HCI Event: Read Clock Offset Complete (0x1c) plen 5
> [hci0] 16:18:51.606847
> Status: Success (0x00)
> Handle: 256
> Clock offset: 0x4568
>> HCI Event: Disconnect Complete (0x05) plen 4
> [hci0] 16:18:51.687885
> Status: Success (0x00)
> Handle: 256
> Reason: Connection Terminated By Local Host (0x16)
> @ Device Disconnected: 00:26:5F:D4:27:72 (0) reason 2
> < HCI Command: Write Scan Enable (0x03|0x001a) plen 1 [hci0] 16:18:51.701737
> Scan enable: Page Scan (0x02)
>> HCI Event: Command Complete (0x0e) plen 4
> [hci0] 16:18:51.702749
> Write Scan Enable (0x03|0x001a) ncmd 2
> Status: Success (0x00)
> @ Device Removed: 00:26:5F:D4:27:72 (0)
> < HCI Command: Write Scan Enable (0x03|0x001a) plen 1 [hci0] 16:19:04.771489
> Scan enable: No Scans (0x00)
>
> Sorry, previous message sent early by mistake.
>
> Is it possible that the Disconnect event is the source of the leak?

that would be just wild guessing. Any event would be good for it. If you say the leaked SKB is 16:17:52, then none of these events is it. They are pretty much accurate time wise. The time is when the event has been reassembled. The Bluetooth core does timestamp the packets.

I wonder if the max packet size for interrupt is either 8 octets or 64 octets. So a good candidate might actually connect request, connect complete, remote features complete or remote name complete packets.

You could try to capture a binary usbmon trace in PCAP format and see what is happening on the USB side and how things are actually fragmented.

Regards

Marcel


2015-06-06 21:38:24

by Larry Finger

[permalink] [raw]
Subject: Re: Memory leak in btusb

On 06/06/2015 01:31 PM, Marcel Holtmann wrote:
> Hi Larry,
>
> since the interrupt endpoint is only used for HCI events, you do not even need connections. This could be anything simple like HCI command complete event.
>
> Now I wonder if this is something odd where the HCI event fits completely into a single interrupt URB or something special like that. If you get get the content of that URB, then I might be able to track this down. Or do you have a chance to run btmon and correlate the timestamps.

Synchronizing the timing between kmemleak and btmon is a little difficult as the
former reports time only in jiffies and seconds before present. I will keep
looking, but I think the leaked allocation happened at roughly 16:17:52. The
only things that btmon logged are

> HCI Event: Number of Completed Packets (0x13) plen 5
[hci0] 8
Num handles: 1
Handle: 256
Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 12
[hci0] 16:18:49.598733
L2CAP: Disconnection Response (0x07) ident 4 len 4
Destination CID: 65
Source CID: 64
< HCI Command: Read Clock Offset (0x01|0x001f) plen 2
[hci0] 16:18:51.601920
Handle: 256
> HCI Event: Command Status (0x0f) plen 4
[hci0] 16:18:51.602808
Read Clock Offset (0x01|0x001f) ncmd 1
Status: Success (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3
[hci0] 16:18:51.603069
Handle: 256
Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4
[hci0] 16:18:51.603750
Disconnect (0x01|0x0006) ncmd 1
Status: Success (0x00)
> HCI Event: Read Clock Offset Complete (0x1c) plen 5
[hci0] 16:18:51.606847
Status: Success (0x00)
Handle: 256
Clock offset: 0x4568
> HCI Event: Disconnect Complete (0x05) plen 4
[hci0] 16:18:51.687885
Status: Success (0x00)
Handle: 256
Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: 00:26:5F:D4:27:72 (0) reason 2
< HCI Command: Write Scan Enable (0x03|0x001a) plen 1
[hci0] 16:18:51.701737
Scan enable: Page Scan (0x02)
> HCI Event: Command Complete (0x0e) plen 4
[hci0] 16:18:51.702749
Write Scan Enable (0x03|0x001a) ncmd 2
Status: Success (0x00)
@ Device Removed: 00:26:5F:D4:27:72 (0)
< HCI Command: Write Scan Enable (0x03|0x001a) plen 1
[hci0] 16:19:04.771489
Scan enable: No Scans (0x00)

Sorry, previous message sent early by mistake.

Is it possible that the Disconnect event is the source of the leak?

I will try to coordinate the clocks.

Larry

2015-06-06 21:31:07

by Larry Finger

[permalink] [raw]
Subject: Re: Memory leak in btusb

On 06/06/2015 01:31 PM, Marcel Holtmann wrote:
> Hi Larry,
>
> since the interrupt endpoint is only used for HCI events, you do not even need connections. This could be anything simple like HCI command complete event.
>
> Now I wonder if this is something odd where the HCI event fits completely into a single interrupt URB or something special like that. If you get get the content of that URB, then I might be able to track this down. Or do you have a chance to run btmon and correlate the timestamps.

Synchronizing the timing between kmemleak and btmon is a little difficult as the
former reports time only in jiffies and seconds before present. I will keep
looking, but I think the leaked allocation happened at roughly 16:17:52. The
only things that btmon logged are

> HCI Event: Number of Completed Packets (0x13) plen 5
[hci0] 8
Num handles: 1
Handle: 256
Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 12
[hci0] 16:18:49.598733
L2CAP: Disconnection Response (0x07) ident 4 len 4
Destination CID: 65
Source CID: 64
< HCI Command: Read Clock Offset (0x01|0x001f) plen 2
[hci0] 16:18:51.601920
Handle: 256
> HCI Event: Command Status (0x0f) plen 4
[hci0] 16:18:51.602808
Read Clock Offset (0x01|0x001f) ncmd 1
Status: Success (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3
[hci0] 16:18:51.603069
Handle: 256
Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4
[hci0] 16:18:51.603750
Disconnect (0x01|0x0006) ncmd 1
Status: Success (0x00)
> HCI Event: Read Clock Offset Complete (0x1c) plen 5
[hci0] 16:18:51.606847
Status: Success (0x00)
Handle: 256
Clock offset: 0x4568
> HCI Event: Disconnect Complete (0x05) plen 4
[hci0] 16:18:51.687885
Status: Success (0x00)
Handle: 256
Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: 00:26:5F:D4:27:72 (0) reason 2
< HCI Command: Write Scan Enable (0x03|0x001a) plen 1
[hci0] 16:18:51.701737
Scan enable: Page Scan (0x02)
> HCI Event: Command Complete (0x0e) plen 4
[hci0] 16:18:51.702749
Write Scan Enable (0x03|0x001a) ncmd 2
Status: Success (0x00)
@ Device Removed: 00:26:5F:D4:27:72 (0)
< HCI Command: Write Scan Enable (0x03|0x001a) plen 1
[hci0] 16:19:04.771489
Scan enable: No Scans (0x00)

2015-06-06 18:31:24

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Memory leak in btusb

Hi Larry,

>>> While using kmemleak to check for memory leaks in a wireless driver, I noticed the following stack traceback for a leak in btusb:
>>>
>>> <ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
>>> [<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
>>> [<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
>>> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>>>
>>> To eliminate a false positive, I unloaded the driver and got the following for the virtual address:
>>>
>>> [<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
>>> [<ffffffffa06029d6>] 0xffffffffa06029d6
>>> [<ffffffffa0602ad8>] 0xffffffffa0602ad8
>>> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>>
>> if this is really an alloc_skb from btusb_recv_intr, then it is the HCI event reassembly handling. Meaning this is data->evt_skb since that is the only one that is ever allocated there.
>>
>> All the SKBs allocated in that function are consumed by the core or later on freed by btusb_free_frags. Is this some rare case on suspend/resume where we forget to free the frags when we get disconnected and re-enumerate via probe?
>>
>> When the core consumes this SKB via hci_recv_frame it really consumes it. If for some reason this function returns an error, it still frees the SKB. So for all intense in purposes it could be even void.
>>
>> So I need some more info on what is causing this memory leak. The one in the Intel setup routine was obvious. If that does not fix it, then this is not obvious. You might need to check which alloc_skb this really is. As I said, if it is the one in btusb_recv_intr, then it is the data->evt_skb that is leaking in some corner case.
>
> Yes, it is the alloc_skb from btusb_recv_intr() and data->evt_skb is leaking, but it is not due to a suspend/resume. I have no clue as to what corner case I might be triggering. I have two Bluetooth mice, but neither appears to be working. To test, I have connected to a cell phone, but that does not allow for much data exchange. Even so, I now have 5 of these leaks from btusb_recv_intr(). As before, all of them are real leaks.

since the interrupt endpoint is only used for HCI events, you do not even need connections. This could be anything simple like HCI command complete event.

Now I wonder if this is something odd where the HCI event fits completely into a single interrupt URB or something special like that. If you get get the content of that URB, then I might be able to track this down. Or do you have a chance to run btmon and correlate the timestamps.

Regards

Marcel


2015-06-06 18:03:51

by Larry Finger

[permalink] [raw]
Subject: Re: Memory leak in btusb

On 06/06/2015 11:37 AM, Marcel Holtmann wrote:
> Hi Larry,
>
>> While using kmemleak to check for memory leaks in a wireless driver, I noticed the following stack traceback for a leak in btusb:
>>
>> <ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
>> [<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
>> [<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
>> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>>
>> To eliminate a false positive, I unloaded the driver and got the following for the virtual address:
>>
>> [<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
>> [<ffffffffa06029d6>] 0xffffffffa06029d6
>> [<ffffffffa0602ad8>] 0xffffffffa0602ad8
>> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> if this is really an alloc_skb from btusb_recv_intr, then it is the HCI event reassembly handling. Meaning this is data->evt_skb since that is the only one that is ever allocated there.
>
> All the SKBs allocated in that function are consumed by the core or later on freed by btusb_free_frags. Is this some rare case on suspend/resume where we forget to free the frags when we get disconnected and re-enumerate via probe?
>
> When the core consumes this SKB via hci_recv_frame it really consumes it. If for some reason this function returns an error, it still frees the SKB. So for all intense in purposes it could be even void.
>
> So I need some more info on what is causing this memory leak. The one in the Intel setup routine was obvious. If that does not fix it, then this is not obvious. You might need to check which alloc_skb this really is. As I said, if it is the one in btusb_recv_intr, then it is the data->evt_skb that is leaking in some corner case.

Yes, it is the alloc_skb from btusb_recv_intr() and data->evt_skb is leaking,
but it is not due to a suspend/resume. I have no clue as to what corner case I
might be triggering. I have two Bluetooth mice, but neither appears to be
working. To test, I have connected to a cell phone, but that does not allow for
much data exchange. Even so, I now have 5 of these leaks from btusb_recv_intr().
As before, all of them are real leaks.

Larry

2015-06-06 16:37:41

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Memory leak in btusb

Hi Larry,

> While using kmemleak to check for memory leaks in a wireless driver, I noticed the following stack traceback for a leak in btusb:
>
> <ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
> [<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> To eliminate a false positive, I unloaded the driver and got the following for the virtual address:
>
> [<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] 0xffffffffa06029d6
> [<ffffffffa0602ad8>] 0xffffffffa0602ad8
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120

if this is really an alloc_skb from btusb_recv_intr, then it is the HCI event reassembly handling. Meaning this is data->evt_skb since that is the only one that is ever allocated there.

All the SKBs allocated in that function are consumed by the core or later on freed by btusb_free_frags. Is this some rare case on suspend/resume where we forget to free the frags when we get disconnected and re-enumerate via probe?

When the core consumes this SKB via hci_recv_frame it really consumes it. If for some reason this function returns an error, it still frees the SKB. So for all intense in purposes it could be even void.

So I need some more info on what is causing this memory leak. The one in the Intel setup routine was obvious. If that does not fix it, then this is not obvious. You might need to check which alloc_skb this really is. As I said, if it is the one in btusb_recv_intr, then it is the data->evt_skb that is leaking in some corner case.

Regards

Marcel


2015-06-06 16:03:24

by Larry Finger

[permalink] [raw]
Subject: Re: Memory leak in btusb

On 06/06/2015 01:23 AM, Marcel Holtmann wrote:

> actually if this is only this device causing it, this might fix it:
>
> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> index 94c6c048130f..e6815c678898 100644
> --- a/drivers/bluetooth/btusb.c
> +++ b/drivers/bluetooth/btusb.c
> @@ -1614,6 +1614,8 @@ static int btusb_setup_intel(struct hci_dev *hdev)
> }
> fw_ptr = fw->data;
>
> + kfree_skb(skb);
> +
> /* This Intel specific command enables the manufacturer mode of the
> * controller.
> *
>
> Totally untested, but it seems we forget to free that SKB and it would explain if you only see this on Intel Wilkens Peak and Stone Peak controllers.

Marcel,

Thanks. My knowledge of btusb and the hci library is too meagre for me to find
leaks.

I have tested this patch. To me, this looks like a real leak, particularly since
skb is overwritten a few lines down; however, the one I see is still present. My
device does load a firmware patch file. This patch may or may not be needed, but
it does no harm that I see.

For the record, my device loads patch file ibt-hw-37.7.10-fw-1.80.2.3.d.bseq.

Larry

2015-06-06 06:23:17

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Memory leak in btusb

Hi Larry,

> While using kmemleak to check for memory leaks in a wireless driver, I noticed the following stack traceback for a leak in btusb:
>
> <ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
> [<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> To eliminate a false positive, I unloaded the driver and got the following for the virtual address:
>
> [<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] 0xffffffffa06029d6
> [<ffffffffa0602ad8>] 0xffffffffa0602ad8
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> The source is from wireless-drivers-next.git with pulls from bluetooth-next.git and Torvalds mainline git repo that were done on May 25. The source includes commit 04b8c8143d46453a443ac32bfcd76ec952605765 with the subject "Bluetooth: btusb: fix Realtek suspend/resume".
>
> The Bluetooth device in use is made by Intel with USB ID 8087:07dc.

actually if this is only this device causing it, this might fix it:

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 94c6c048130f..e6815c678898 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -1614,6 +1614,8 @@ static int btusb_setup_intel(struct hci_dev *hdev)
}
fw_ptr = fw->data;

+ kfree_skb(skb);
+
/* This Intel specific command enables the manufacturer mode of the
* controller.
*

Totally untested, but it seems we forget to free that SKB and it would explain if you only see this on Intel Wilkens Peak and Stone Peak controllers.

Regards

Marcel


2015-06-06 06:20:03

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Memory leak in btusb

Hi Larry,

> While using kmemleak to check for memory leaks in a wireless driver, I noticed the following stack traceback for a leak in btusb:
>
> <ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] btusb_recv_intr+0x136/0x180 [btusb]
> [<ffffffffa0602ad8>] btusb_intr_complete+0xb8/0x150 [btusb]
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> To eliminate a false positive, I unloaded the driver and got the following for the virtual address:
>
> [<ffffffff8160257e>] __alloc_skb+0x7e/0x2b0
> [<ffffffffa06029d6>] 0xffffffffa06029d6
> [<ffffffffa0602ad8>] 0xffffffffa0602ad8
> [<ffffffff8156ccb2>] __usb_hcd_giveback_urb+0x72/0x120
>
> The source is from wireless-drivers-next.git with pulls from bluetooth-next.git and Torvalds mainline git repo that were done on May 25. The source includes commit 04b8c8143d46453a443ac32bfcd76ec952605765 with the subject "Bluetooth: btusb: fix Realtek suspend/resume".
>
> The Bluetooth device in use is made by Intel with USB ID 8087:07dc.

does this only happen with the Intel device or with all Bluetooth USB dongles?

Regards

Marcel