Return-Path:
Subject: Re: [RFC bluetooth-next 00/20] bluetooth: rework 6lowpan implementation
To: Luiz Augusto von Dentz
References: <20160711195044.25343-1-aar@pengutronix.de> <5b41b0e5-c017-20d9-c94f-11fb1ab2847d@pengutronix.de>
Cc: linux-wpan@vger.kernel.org, kernel@pengutronix.de, kaspar@schleiser.de, Jukka Rissanen, "linux-bluetooth@vger.kernel.org", Patrik Flykt
From: Alexander Aring
Message-ID: <9025f3ab-9b4d-5120-d078-35b6df5b8beb@pengutronix.de>
Date: Wed, 13 Jul 2016 11:12:08 +0200
MIME-Version: 1.0
In-Reply-To: <5b41b0e5-c017-20d9-c94f-11fb1ab2847d@pengutronix.de>
Content-Type: text/plain; charset=utf-8
Sender: linux-bluetooth-owner@vger.kernel.org
List-ID:

Hi,

On 07/12/2016 08:35 PM, Alexander Aring wrote:
>
> Hi,
>
> On 07/12/2016 04:51 PM, Luiz Augusto von Dentz wrote:
> ...
>>>
>>> HOW TO REPRODUCE:
>>>
>>> It's simple, do some high payloads which makes lot tx/rcv traffic on both sides:
>>>
>>> Node A:
>>>
>>> ping6 $IP_NODEB%6lo0 -s 60000
>>>
>>> Node B:
>>>
>>> ping6 ff02::1%6lo0
>>
>> Im not sure I understand what you are trying to do with a packet of
>> 60Kb, the l2cap_chan can most likely only do 1280 bytes and even with
>
> 60Kb is just some incredible high value to produce on both sides tx/rx
> data. The other side will normally send some "defragmentation failure"
> icmp messages.
>
> This value is just to produce the high traffic, not practice payload.
>
>> fragmentation this will consume the credits quicker than we can
>> receive more so it will eventually starting buffering until it gets
>> stuck. Note that it is probably still receiving credits but we
>
> This is exactly what I like to do. Produce a lot of data to kill L3
> layer, but this should not kill the L2 layer. Killing the L2 layer is
> what's happend here.
>
> If I do a lot of data on e.g. tcp traffic and the tcp connection will be
> closed -> that's fine for me but afterwards starting a new L3 connection
> should be still working. This is not the case here, I can kill L2 layer
> and nothing works anymore, because L2 running into deadlock.
>
>> probably have so much data buffered that all the credits are consumed
>> almost immediately after receiving, or you don't see -EAGAIN more than
>> once?
>>
>
> My example shows that we transmit some data on both nodes, that is
> important to check the l2cap_chan_send return type.
>
> When I do that above and running into the deadlock the IPv6 connection
> runs "NS" messages only, because it still can send on L3 layer but the
> L2 layer is killed -> I get -EAGAIN in l2cap_chan_send always on both
> nodes because tx_credits are on both nodes zero and will not be
> incremented again.
>
> So far I know these tx_credits will be incremented only if somebody
> sends something again, right? This will never be the case again and I am
> stucked in a deadlock which should not be there.
>
> to your question:
>
> "or you don't see -EAGAIN more than once"
>
> I see -EAGAIN always at calling l2cap_chan_send on both nodes. I think I
> can wait forever, after an year I still see "-EAGAIN" when calling
> l2cap_chan_send.
>
> Nevertheless, I will try to hit the deadlock (-EAGAIN) on both nodes so
> nobody transmits anything anymore. Then waiting six hours and look if
> still -EAGAIN is there on both nodes. I will report again here, after
> that time it should working again, but I don't believe that.
>

I waited about 6 hours; the deadlock was still there.

Node A:

transmit return value -11
send 6lo0 failed chan f617cc00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:68:c1:76
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:6b:3b:1f type 1 src: 5c:f3:70:68:c1:76 chan f617cc00
transmit return value -11
send 6lo0 failed chan f617cc00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:68:c1:76
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:6b:3b:1f type 1 src: 5c:f3:70:68:c1:76 chan f617cc00
transmit return value -11
send 6lo0 failed chan f617cc00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:68:c1:76
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:6b:3b:1f type 1 src: 5c:f3:70:68:c1:76 chan f617cc00
transmit return value -11
send 6lo0 failed chan f617cc00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:68:c1:76
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:6b:3b:1f type 1 src: 5c:f3:70:68:c1:76 chan f617cc00
transmit return value -11
send 6lo0 failed chan f617cc00

Node B:

xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:6b:3b:1f
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:68:c1:76 type 1 src: 5c:f3:70:6b:3b:1f chan f62aac00
transmit return value -11
send 6lo0 failed chan f62aac00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:6b:3b:1f
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:68:c1:76 type 1 src: 5c:f3:70:6b:3b:1f chan f62aac00
transmit return value -11
send 6lo0 failed chan f62aac00
xmit ndisc 6lo0 dst: ff:ff:ff:ff:ff:ff src: 5c:f3:70:6b:3b:1f
xmit 6lo0 starts multicasting
l2cap_chan_send 6lo0 dst: 5c:f3:70:68:c1:76 type 1 src: 5c:f3:70:6b:3b:1f chan f62aac00
transmit return value -11
send 6lo0 failed chan f62aac00

---

Both nodes simply show the same thing, just with different addresses.

I thought about your words again, and I suppose you were telling me that
running out of tx_credits and getting -EAGAIN is normal behaviour.
I agree, it also occurs earlier when I run:

ping6 $IP_NODEB%6lo0 -s 60000

which is normal. But what should not happen is that there is a tiny
window (race) in which the tx_credits on both sides reach zero and then
nobody transmits anything anymore. That's the tx deadlock which I hit
here.

The result of this deadlock is that the L2CAP chan connection is
effectively killed. I think it will work again when I create a new chan
(because of the new tx_credits). The connection isn't broken; I just
cannot transmit anything because no tx_credits are left.

- Alex
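
The stuck state described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the kernel's L2CAP code: the `Chan` class, `receive_credits()` and `drain()` are hypothetical names, and the model assumes, as the mail supposes, that a send with zero tx_credits fails with -EAGAIN and that credits only grow when the peer hands some back.

```python
import errno


class Chan:
    """Hypothetical stand-in for one end of a credit-based channel.

    This is not the kernel's struct l2cap_chan; it only tracks tx_credits.
    """

    def __init__(self, tx_credits):
        self.tx_credits = tx_credits

    def send(self):
        # Assumption taken from the mail: with zero credits the send
        # fails with -EAGAIN instead of transmitting anything.
        if self.tx_credits == 0:
            return -errno.EAGAIN
        self.tx_credits -= 1
        return 0

    def receive_credits(self, n):
        # Credits only grow when the peer explicitly hands some back.
        self.tx_credits += n


def drain(chan):
    """Send until the channel reports -EAGAIN; return the frames sent."""
    sent = 0
    while chan.send() == 0:
        sent += 1
    return sent


node_a, node_b = Chan(tx_credits=3), Chan(tx_credits=3)
drain(node_a)
drain(node_b)

# Both counters are zero and no credit grant is in flight: every further
# attempt fails, no matter how often it is retried.
stuck = [node_a.send() for _ in range(5)] + [node_b.send() for _ in range(5)]
print(all(r == -errno.EAGAIN for r in stuck))

# A single returned credit is enough to get node A moving again.
node_a.receive_credits(1)
print(node_a.send() == 0)
```

In this model the only way out of the all-zero state is an explicit credit grant, which matches the observation that waiting alone does not help. Whether the real channel stops granting credits for the same reason is not established here.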