Return-Path: MIME-Version: 1.0 In-Reply-To: References: <1283519240-1235-1-git-send-email-Andrei.Emeltchenko.news@gmail.com> Date: Mon, 6 Sep 2010 16:25:19 +0300 Message-ID: Subject: Re: [PATCH] Bluetooth: check L2CAP length in first ACL fragment From: Andrei Emeltchenko To: Mat Martineau Cc: linux-bluetooth@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-bluetooth-owner@vger.kernel.org List-ID: Hi, On Mon, Sep 6, 2010 at 2:32 PM, Andrei Emeltchenko wrote: > Hi, > > On Fri, Sep 3, 2010 at 7:26 PM, Mat Martineau wrote: >>> Trace below is received when using stress tools sending big >>> fragmented L2CAP packets. >>> ... >>> [ 1712.798492] swapper: page allocation failure. order:4, mode:0x4020 >>> [ 1712.804809] [] (unwind_backtrace+0x0/0xdc) from [] >>> (__alloc_pages_nodemask+0x4) >>> [ 1712.814666] [] (__alloc_pages_nodemask+0x47c/0x4d4) from >>> [] (__get_free_pages+) >>> [ 1712.824645] [] (__get_free_pages+0x10/0x3c) from [] >>> (__alloc_skb+0x4c/0xfc) >>> [ 1712.833465] [] (__alloc_skb+0x4c/0xfc) from [] >>> (l2cap_recv_acldata+0xf0/0x1f8 ) >>> [ 1712.843322] [] (l2cap_recv_acldata+0xf0/0x1f8 [l2cap]) from >>> [] (hci_rx_task+0x) >>> ... >> >> What is logged right before this allocation failure? ?There should be a >> "Start: total len %d, frag len %d" line. ?Actually, all log messages from >> l2cap_recv_acldata leading up to the failure would be helpful. > > When I enable logs system behave differently because of huge traffic > to syslog :-) > >>> After applying patch dmesg looks like: >>> ... >>> l2cap_recv_acldata: Frame exceeding recv MTU (len 64182, MTU 1013) >>> l2cap_recv_acldata: Unexpected continuation frame (len 34) >>> ... >>> >>> Signed-off-by: Andrei Emeltchenko >>> --- >>> net/bluetooth/l2cap.c | ? 11 +++++++++++ >>> 1 files changed, 11 insertions(+), 0 deletions(-) >>> >>> diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c >>> index 9fad312..21824d7 100644 >>> --- a/net/bluetooth/l2cap.c >>> +++ b/net/bluetooth/l2cap.c >>> @@ -4654,6 +4654,8 @@ static int l2cap_recv_acldata(struct hci_conn *hcon, >>> struct sk_buff *skb, u16 fl >>> >>> ? ? ? ?if (flags & ACL_START) { >>> ? ? ? ? ? ? ? ?struct l2cap_hdr *hdr; >>> + ? ? ? ? ? ? ? struct sock *sk; >>> + ? ? ? ? ? ? ? u16 cid; >>> ? ? ? ? ? ? ? ?int len; >>> >>> ? ? ? ? ? ? ? ?if (conn->rx_len) { >>> @@ -4672,6 +4674,7 @@ static int l2cap_recv_acldata(struct hci_conn *hcon, >>> struct sk_buff *skb, u16 fl >>> d >>> ? ? ? ? ? ? ? ?hdr = (struct l2cap_hdr *) skb->data; >>> ? ? ? ? ? ? ? ?len = __le16_to_cpu(hdr->len) + L2CAP_HDR_SIZE; >>> + ? ? ? ? ? ? ? cid = __le16_to_cpu(hdr->cid); >> >> Some stress tools also send L2CAP data one byte at a time - the start >> fragment may not have all four header bytes. ?l2cap_recv_acldata() currently >> allows short start fragments with two or three bytes, which would not >> contain a valid CID. > > Yes currently we allow 2 bytes in start fragment and check hdr->len. > What about following change (require 4 bytes in start fragment) > > --------------------------------------------- > - ? ? ? ? ? ? ? if (skb->len < 2) { > + ? ? ? ? ? ? ? if (skb->len < L2CAP_HDR_SIZE) { > ? ? ? ? ? ? ? ? ? ? ? ?BT_ERR("Frame is too short (len %d)", skb->len); > ? ? ? ? ? ? ? ? ? ? ? ?l2cap_conn_unreliable(conn, ECOMM); > ? ? ? ? ? ? ? ? ? ? ? ?goto drop; > --------------------------------------------- I have found in "BLUETOOTH SPECIFICATION Version 4.0 [Vol 3] page 36" "Note: Start Fragments always begin with the Basic L2CAP header of a PDU." So the statement above is correct. Regards, Andrei > >>> ? ? ? ? ? ? ? ?if (len == skb->len) { >>> ? ? ? ? ? ? ? ? ? ? ? ?/* Complete frame received */ >>> @@ -4688,6 +4691,14 @@ static int l2cap_recv_acldata(struct hci_conn >>> *hcon, struct sk_buff *skb, u16 fl >>> ? ? ? ? ? ? ? ? ? ? ? ?goto drop; >>> ? ? ? ? ? ? ? ?} >>> >>> + ? ? ? ? ? ? ? sk = l2cap_get_chan_by_scid(&conn->chan_list, cid); >> >> There are several issues here. >> >> l2cap_get_chan_by_scid() might return NULL, which will definitely happen >> with signaling or connectionless data frames. >> >> l2cap_get_chan_by_scid() also calls bh_lock_sock() if a channel is found, >> and I don't see that you added a matching call to bh_unlock_sock() if >> l2cap_get_chan_by_scid() returns a non-NULL value. > > My mistake here. What about following check: > > --------------------------------------- > @@ -3916,6 +3919,19 @@ static int l2cap_recv_acldata(struct hci_conn > *hcon, struct sk_buff *skb, > ? ? ? ? ? ? ? ? ? ? ? ?goto drop; > ? ? ? ? ? ? ? ?} > > + ? ? ? ? ? ? ? sk = l2cap_get_chan_by_scid(&conn->chan_list, cid); > + ? ? ? ? ? ? ? if (!sk) > + ? ? ? ? ? ? ? ? ? ? ? goto drop; > + > + ? ? ? ? ? ? ? if (l2cap_pi(sk)->imtu < len) { > + ? ? ? ? ? ? ? ? ? ? ? BT_ERR("Frame exceeding recv MTU (len %d, MTU %d)", > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? len, l2cap_pi(sk)->imtu); > + ? ? ? ? ? ? ? ? ? ? ? conn->rx_len = 0; /* needed? */ > + ? ? ? ? ? ? ? ? ? ? ? bh_unlock_sock(sk); > + ? ? ? ? ? ? ? ? ? ? ? goto drop; > + ? ? ? ? ? ? ? } else > + ? ? ? ? ? ? ? ? ? ? ? bh_unlock_sock(sk); > + > ? ? ? ? ? ? ? ?/* Allocate skb for the complete frame (with header) */ > ? ? ? ? ? ? ? ?conn->rx_skb = bt_skb_alloc(len, GFP_ATOMIC); > ? ? ? ? ? ? ? ?if (!conn->rx_skb) > > -------------------------------------- > >> >>> + ? ? ? ? ? ? ? if (l2cap_pi(sk)->imtu < len) { >>> + ? ? ? ? ? ? ? ? ? ? ? BT_ERR("Frame exceeding recv MTU (len %d, MTU >>> %d)", >>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? len, l2cap_pi(sk)->imtu); >>> + ? ? ? ? ? ? ? ? ? ? ? conn->rx_len = 0; /* needed? */ >>> + ? ? ? ? ? ? ? ? ? ? ? goto drop; >>> + ? ? ? ? ? ? ? } >>> + >>> ? ? ? ? ? ? ? ?/* Allocate skb for the complete frame (with header) */ >>> ? ? ? ? ? ? ? ?conn->rx_skb = bt_skb_alloc(len, GFP_ATOMIC); >>> ? ? ? ? ? ? ? ?if (!conn->rx_skb) >> >> In general, this approach adds an extra channel lookup and an extra >> lock/unlock on every ACL start frame. > > Yes this adds extra lock/unlock but only if the frame is not complete. There > shouldn't be many fragmented L2CAP packets. > >> ?Even if the MTU is known to be >> exceeded on with the start frame, no more than 64k is ever allocated (which >> shouldn't cause problems in itself). > > I think that for the mobile device this can be a problem since this > shall be contiguous > memory. > >> The root of the problem may be heap >> corruption due to a buffer overrun, which seems possible due to the crash >> deep in the memory allocator. > > Regards, > Andrei >