Date: Wed, 15 Sep 2010 15:42:35 -0300
From: "Gustavo F. Padovan"
To: Mat Martineau
Cc: linux-bluetooth@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, marcel@holtmann.org, davem@davemloft.net
Subject: Re: Possible regression with skb_clone() in 2.6.36
Message-ID: <20100915184235.GA31685@vigoh>
References: <1283988727-1456-1-git-send-email-padovan@profusion.mobi> <20100910194509.GC19693@vigoh>
In-Reply-To: <20100910194509.GC19693@vigoh>

Hi Mat,

* Gustavo F. Padovan [2010-09-10 16:45:09 -0300]:

> Hi Mat,
>
> * Mat Martineau [2010-09-10 09:53:31 -0700]:
>
> >
> > Gustavo -
> >
> > I'm not sure why the streaming code used to work, but this does not
> > look like an skb_clone() problem. Your patch to remove the
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of
> > this crash.
> >
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> >
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs into an Out Of Memory condition and crashes. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that by removing the
> > > skb_clone() call from l2cap_streaming_send() we work around the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > For now I have no idea how to fix this.
> > >
> >
> > This has to do with the sk->sk_wmem_alloc accounting that controls the
> > amount of write buffer space used on the socket.
> >
> > When the L2CAP streaming mode socket segments its data, it allocates
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()). Before
> > that allocation call returns, skb_set_owner_w() is called on the new
> > skb.
> > This adds to sk->sk_wmem_alloc and sets skb->destructor so that
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> >
> > When that skb is cloned, the clone is not "owned" by the write buffer.
> > The clone's destructor is set to NULL in __skb_clone(). The version
> > of l2cap_streaming_send() that runs out of memory is passing the
> > non-owned skb clone down to the HCI layer. The original skb (the one
> > that's "owned by w") is immediately freed, which adjusts
> > sk->sk_wmem_alloc back down, so the socket thinks it has unlimited write
> > buffer space. As a result, bt_skb_send_alloc() never blocks waiting
> > for buffer space (or returns EAGAIN for nonblocking writes) and the
> > HCI send queue keeps growing.
>
> If the problem is what you are saying, adding skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also, this just appeared in 2.6.36; it was
> working fine before, i.e., we have a regression here.

I've run some other tests and what you said also fixes the problem for
Streaming Mode. If I use skb_set_owner_w() on the cloned skb, everything
works fine. But we still have the problem for ERTM as I described: send()
blocks waiting for memory. The regression is still there. :(

>
> >
> > This isn't a problem for the ERTM sends, because the original skbs are
> > kept in the ERTM tx queue until they are acked. Once they're acked,
> > the write buffer space is freed and additional skbs can be allocated.
>
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM blocks in send() while trying to allocate memory. Then we are
> not able to receive any ack (everything stays queued in the socket backlog,
> sk->sk_backlog, as the sk is owned by the user) and ERTM stalls.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi
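The write-buffer accounting Mat describes can be sketched as a small userspace model. This is a hypothetical illustration, not kernel code: `model_sock`, `model_skb`, and the `model_*` helpers are invented names that only stand in for `struct sock`, `struct sk_buff`, `skb_set_owner_w()`, `__skb_clone()`, and `sock_alloc_send_skb()`, and the 4096-byte limit and 1024-byte PDU size are arbitrary assumptions. It shows that an unowned clone lets the original's charge be refunded on every send, while charging the clone (what Gustavo reports doing for Streaming Mode) keeps the queue bounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; these are NOT the kernel's structures. */
struct model_sock {
    int wmem_alloc; /* bytes charged to the socket, like sk->sk_wmem_alloc */
    int sndbuf;     /* write-buffer limit, like sk->sk_sndbuf */
};

struct model_skb {
    int truesize;
    struct model_sock *owner; /* non-NULL means "owned by w" (destructor set) */
};

/* Like skb_set_owner_w(): charge the buffer to the socket. */
static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk)
{
    skb->owner = sk;
    sk->wmem_alloc += skb->truesize;
}

/* Freeing an owned skb runs its "destructor" and uncharges the socket. */
static void model_free(struct model_skb *skb)
{
    if (skb->owner) {
        skb->owner->wmem_alloc -= skb->truesize;
        assert(skb->owner->wmem_alloc >= 0);
        skb->owner = NULL;
    }
}

/* Like __skb_clone(): the clone gets no owner and no destructor. */
static struct model_skb model_clone(const struct model_skb *skb)
{
    struct model_skb clone = { skb->truesize, NULL };
    return clone;
}

/* Succeeds only while under the limit, standing in for the allocator
 * blocking (or returning EAGAIN) when write space is exhausted. */
static bool model_alloc(struct model_sock *sk, int size, struct model_skb *out)
{
    if (sk->wmem_alloc >= sk->sndbuf)
        return false;
    out->truesize = size;
    out->owner = NULL;
    model_set_owner_w(out, sk);
    return true;
}

/* Streaming send: clone, hand the clone to the "HCI queue", free the
 * original. The queued clones are never freed here, modelling an HCI
 * queue that does not drain. If charge_clone is true, the clone is
 * charged like skb_set_owner_w(clone, sk). Returns PDUs queued before
 * allocation was refused. */
static int model_streaming_send(bool charge_clone, int attempts)
{
    struct model_sock sk = { 0, 4096 };
    int queued = 0;

    for (int i = 0; i < attempts; i++) {
        struct model_skb orig, clone;

        if (!model_alloc(&sk, 1024, &orig))
            break;
        clone = model_clone(&orig);
        if (charge_clone)
            model_set_owner_w(&clone, &sk);
        model_free(&orig); /* original freed immediately */
        queued++;
    }
    return queued;
}
```

For example, `model_streaming_send(false, 100)` queues all 100 PDUs because each send is refunded, while `model_streaming_send(true, 100)` stops after 4, once the charged clones fill the 4096-byte budget.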
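The ERTM stall Gustavo describes can be modelled the same way. Again this is a hypothetical sketch with invented names (`ertm_model`, `ertm_*`), not kernel code: the tx queue holds each PDU's charge until it is acked, and an ack that arrives while the sender owns the socket is parked in a backlog (like `sk_add_backlog()`) and only processed when ownership is dropped (as `release_sock()` does). The `release_while_waiting` flag exists only to illustrate why the backlog has to drain; it is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SNDBUF   4096 /* arbitrary write-buffer limit (assumption) */
#define MODEL_PDU_SIZE 1024 /* arbitrary PDU charge (assumption) */

/* Hypothetical model: wmem_alloc drops only when a PDU is acked. */
struct ertm_model {
    int wmem_alloc;     /* bytes held by unacked PDUs in the tx queue */
    int unacked;        /* PDUs waiting in the tx queue */
    int backlog_acks;   /* acks parked while the user owns the socket */
    bool owned_by_user; /* stands in for lock_sock() being held */
};

/* An ack frees one PDU, unless the socket is owned by the user, in which
 * case it is deferred to the backlog. */
static void ertm_receive_ack(struct ertm_model *m)
{
    if (m->owned_by_user) {
        m->backlog_acks++;
        return;
    }
    if (m->unacked > 0) {
        m->unacked--;
        m->wmem_alloc -= MODEL_PDU_SIZE;
        assert(m->wmem_alloc >= 0);
    }
}

/* Like release_sock(): drop ownership, then process deferred acks. */
static void ertm_release(struct ertm_model *m)
{
    m->owned_by_user = false;
    while (m->backlog_acks > 0) {
        m->backlog_acks--;
        ertm_receive_ack(m);
    }
}

/* Queue one PDU; false is where the real sender would block for space. */
static bool ertm_try_send(struct ertm_model *m)
{
    if (m->wmem_alloc >= MODEL_SNDBUF)
        return false;
    m->wmem_alloc += MODEL_PDU_SIZE;
    m->unacked++;
    return true;
}

/* Send `count` PDUs while owning the socket; the peer acks each PDU at
 * once. Returns how many were sent before the sender would block for
 * good. */
static int ertm_send_all(int count, bool release_while_waiting)
{
    struct ertm_model m = { 0, 0, 0, true };
    int sent = 0;

    while (sent < count) {
        if (!ertm_try_send(&m)) {
            if (!release_while_waiting || m.backlog_acks == 0)
                break;              /* stalled: acks stuck in the backlog */
            ertm_release(&m);       /* deferred acks free write space */
            m.owned_by_user = true; /* re-take ownership, like lock_sock() */
            continue;
        }
        sent++;
        ertm_receive_ack(&m);       /* deferred while we own the socket */
    }
    return sent;
}
```

With the assumed sizes, `ertm_send_all(10, false)` stalls after 4 PDUs even though every one of them was acked, whereas `ertm_send_all(10, true)` sends all 10.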