2014-02-13 18:03:05

by Ivaylo Dimitrov

[permalink] [raw]
Subject: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Since next-21012014 I started to receive "Received disconnect from
x.x.x.x: 2: Bad packet length 3149594624" errors when trying to
establish ssh session over wifi to my N900, running Maemo 5. I bisected
and it turned out that commit dce5c9e35bc4085bd33eccdb8c9ec5a643507a14
"ARM: 7928/1: kconfig: select HAVE_EFFICIENT_UNALIGNED_ACCESS for CPUv6+
&& MMU" is the first bad. There is nothing related in either dmesg or
syslog on the device. Reverting that commit on 3.14-rc1 make those
errors disappear.

As my knowledge on how to deal with that stops here, advises on what to
do next to find the real problem are appreciated.

Ivo


2014-02-13 18:21:49

by Will Deacon

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Hi Ivo,

Thanks for bisecting this.

On Thu, Feb 13, 2014 at 06:02:57PM +0000, Ivaylo Dimitrov wrote:
> Since next-21012014 I started to receive "Received disconnect from
> x.x.x.x: 2: Bad packet length 3149594624" errors when trying to
> establish ssh session over wifi to my N900, running Maemo 5. I bisected
> and it turned out that commit dce5c9e35bc4085bd33eccdb8c9ec5a643507a14
> "ARM: 7928/1: kconfig: select HAVE_EFFICIENT_UNALIGNED_ACCESS for CPUv6+
> && MMU" is the first bad. There is nothing related in either dmesg or
> syslog on the device. Reverting that commit on 3.14-rc1 make those
> errors disappear.

That's certainly unexpected. The n900 has (iirc) a Cortex-A8, which as an
ARMv7 core, can perform unaligned accesses to normal, cacheable memory in
hardware.

Can you provide your .config and/or any information about your network chip
please? There's a chance that the driver is doing something odd.

Cheers,

Will

2014-02-13 18:43:23

by Ivaylo Dimitrov

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624



On 13.02.2014 20:21, Will Deacon wrote:
>
> That's certainly unexpected. The n900 has (iirc) a Cortex-A8, which as an
> ARMv7 core, can perform unaligned accesses to normal, cacheable memory in
> hardware.
>

Yep, Cortex-A8 and it has no problem to do unaligned memory accesses
AFAIK. I suspect it is a driver issue, not CPU.

> Can you provide your .config and/or any information about your network chip
> please? There's a chance that the driver is doing something odd.

The chip is TI wl1251, you can find the config file here(actually this
is the tree I am using to boot 3.14-rc1 on N900) -
https://gitorious.org/linux-n900/freemangordons-linux-n900/source/1434dbd7fbc5ec257b6cd6c547689b79177d1937:arch/arm/configs/rx51_defconfig

Thanks,
Ivo

2014-02-13 19:29:48

by Will Deacon

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

On Thu, Feb 13, 2014 at 06:43:15PM +0000, Ivaylo Dimitrov wrote:
>
>
> On 13.02.2014 20:21, Will Deacon wrote:
> >
> > That's certainly unexpected. The n900 has (iirc) a Cortex-A8, which as an
> > ARMv7 core, can perform unaligned accesses to normal, cacheable memory in
> > hardware.
> >
>
> Yep, Cortex-A8 and it has no problem to do unaligned memory accesses
> AFAIK. I suspect it is a driver issue, not CPU.
>
> > Can you provide your .config and/or any information about your network chip
> > please? There's a chance that the driver is doing something odd.
>
> The chip is TI wl1251, you can find the config file here(actually this
> is the tree I am using to boot 3.14-rc1 on N900) -
> https://gitorious.org/linux-n900/freemangordons-linux-n900/source/1434dbd7fbc5ec257b6cd6c547689b79177d1937:arch/arm/configs/rx51_defconfig

Ok, so based on that config I think we can narrow down the unaligned use to
the following files:


# Crypto
crypto/memneq.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
crypto/memneq.c:#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
crypto/memneq.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
crypto/memneq.c:#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */

# Network
include/linux/etherdevice.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
include/linux/etherdevice.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
include/linux/etherdevice.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
include/linux/etherdevice.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
include/linux/etherdevice.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
net/mac80211/rx.c:#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
net/ipv4/netfilter/arp_tables.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS

# Probably not relevant
kernel/printk/printk.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
kernel/taskstats.c:#if defined(CONFIG_64BIT) && !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
lib/lz4/lz4defs.h:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) \
lib/lz4/lz4defs.h:#else /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
lib/lzo/lzo1x_compress.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ64)
lib/lzo/lzo1x_compress.c:#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ32)
lib/lzo/lzo1x_decompress_safe.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
lib/lzo/lzo1x_decompress_safe.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
lib/lzo/lzo1x_decompress_safe.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
lib/strncpy_from_user.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
lib/zlib_inflate/inffast.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS


Can you try hacking crypto/memneq.c so that it doesn't use
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS please? That would at least point the
finger at net/mac80211/rx.c or similar.

Cheers,

Will

2014-02-14 16:12:50

by Ivaylo Dimitrov

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624



On 13.02.2014 21:29, Will Deacon wrote:
>
> Can you try hacking crypto/memneq.c so that it doesn't use
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS please? That would at least point the
> finger at net/mac80211/rx.c or similar.
>
> Cheers,
>
> Will
>

Well, I am lazy so I hacked net/mac80211/rx.c first:

index c24ca0d..6839c77 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1963,7 +1963,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
}
}

-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+//#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
if (skb) {
/* 'align' will only take the values 0 or 2 here since all
* frames are required to be aligned to 2-byte boundaries
@@ -1987,7 +1987,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
}
}
}
-#endif
+//#endif

if (skb) {
/* deliver to local stack */


and that seems to fix the problem.

I am not sure whom I should forward the problem.

Thanks,
Ivo

2014-02-14 16:25:47

by Will Deacon

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

On Fri, Feb 14, 2014 at 04:12:44PM +0000, Ivaylo Dimitrov wrote:
> On 13.02.2014 21:29, Will Deacon wrote:
> >
> > Can you try hacking crypto/memneq.c so that it doesn't use
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS please? That would at least point the
> > finger at net/mac80211/rx.c or similar.
> >
>
> Well, I am lazy so I hacked net/mac80211/rx.c first:

No problem, thanks for having a go.

> index c24ca0d..6839c77 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1963,7 +1963,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
> }
> }
>
> -#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +//#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> if (skb) {
> /* 'align' will only take the values 0 or 2 here since all
> * frames are required to be aligned to 2-byte boundaries
> @@ -1987,7 +1987,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
> }
> }
> }
> -#endif
> +//#endif
>
> if (skb) {
> /* deliver to local stack */
>
>
> and that seems to fix the problem.
>
> I am not sure whom I should forward the problem.

Well, we probably need a bit more to go on, because I doubt that this code
is to blame. More likely, the issue is in the caller.
Looking at drivers/net/wireless/ti/wl1251/rx.c:182

/* The actual length doesn't include the target's alignment */
skb->len = desc->length - PLCP_HEADER_LENGTH;

fc = (u16 *)skb->data;

if ((*fc & IEEE80211_FCTL_STYPE) == IEEE80211_STYPE_BEACON)
beacon = 1;

wl1251_rx_status(wl, desc, &status, beacon);

wl1251_debug(DEBUG_RX, "rx skb 0x%p: %d B %s", skb, skb->len,
beacon ? "beacon" : "");

memcpy(IEEE80211_SKB_RXCB(skb), &status, sizeof(status));
ieee80211_rx_ni(wl->hw, skb);

I wonder whether that first line (with the comment about alignment) is
assuming some behaviour from the mac80211 layer.

You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
above.

Adding the original author (I think) and John Linville, since I'm well out
of my depth in this code!

Will

2014-02-14 17:11:34

by Ivaylo Dimitrov

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624



On 14.02.2014 18:24, Will Deacon wrote:
>
> You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> above.

This seems to fix the issue too, but I am afraid there is more to be
done, as I see

Feb 14 18:33:46 Nokia-N900 kernel: [ 88.599853] wlan0: authenticate
with 00:23:cd:17:86:d0
Feb 14 18:33:46 Nokia-N900 kernel: [ 88.781860] wlan0: send auth to
00:23:cd:17:86:d0 (try 1/3)
Feb 14 18:33:46 Nokia-N900 kernel: [ 88.799804] wlan0: authenticated
Feb 14 18:33:46 Nokia-N900 kernel: [ 88.806243] wlan0: associating
with AP with corrupt probe response
Feb 14 18:33:46 Nokia-N900 kernel: [ 88.814544] wlan0: associate with
00:23:cd:17:86:d0 (try 1/3)
Feb 14 18:33:46 Nokia-N900 kernel: [ 88.826416] wlan0: RX AssocResp
from 00:23:cd:17:86:d0 (capab=0x431 status=0 aid=2)

spit in dmesg log. AFAIR this is the first time I see "associating with
AP with corrupt probe response" in dmesg, no matter of the kernel
version. Despite of that error message, wlan seems to work as it should,
so far. I guess we need the driver's author saying on the issue.

Thanks,
Ivo

2014-02-23 19:06:44

by Ivaylo Dimitrov

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624



On 14.02.2014 19:11, Ivaylo Dimitrov wrote:
>
>
> On 14.02.2014 18:24, Will Deacon wrote:
>>
>> You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
>> commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
>> above.
>
> This seems to fix the issue too, but I am afraid there is more to be
> done, as I see
>
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.599853] wlan0: authenticate
> with 00:23:cd:17:86:d0
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.781860] wlan0: send auth to
> 00:23:cd:17:86:d0 (try 1/3)
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.799804] wlan0: authenticated
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.806243] wlan0: associating
> with AP with corrupt probe response
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.814544] wlan0: associate with
> 00:23:cd:17:86:d0 (try 1/3)
> Feb 14 18:33:46 Nokia-N900 kernel: [ 88.826416] wlan0: RX AssocResp
> from 00:23:cd:17:86:d0 (capab=0x431 status=0 aid=2)
>
> spit in dmesg log. AFAIR this is the first time I see "associating with
> AP with corrupt probe response" in dmesg, no matter of the kernel
> version. Despite of that error message, wlan seems to work as it should,
> so far. I guess we need the driver's author saying on the issue.
>
> Thanks,
> Ivo

ping?

2014-02-24 16:53:49

by Kalle Valo

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Will Deacon <[email protected]> writes:
> Well, we probably need a bit more to go on, because I doubt that this code
> is to blame. More likely, the issue is in the caller.
> Looking at drivers/net/wireless/ti/wl1251/rx.c:182
>
> /* The actual length doesn't include the target's alignment */
> skb->len = desc->length - PLCP_HEADER_LENGTH;
>
> fc = (u16 *)skb->data;
>
> if ((*fc & IEEE80211_FCTL_STYPE) == IEEE80211_STYPE_BEACON)
> beacon = 1;
>
> wl1251_rx_status(wl, desc, &status, beacon);
>
> wl1251_debug(DEBUG_RX, "rx skb 0x%p: %d B %s", skb, skb->len,
> beacon ? "beacon" : "");
>
> memcpy(IEEE80211_SKB_RXCB(skb), &status, sizeof(status));
> ieee80211_rx_ni(wl->hw, skb);
>
> I wonder whether that first line (with the comment about alignment) is
> assuming some behaviour from the mac80211 layer.
>
> You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> above.
>
> Adding the original author (I think) and John Linville, since I'm well out
> of my depth in this code!

I haven't touched wl1251 for years. You should send questions to
linux-wireless list and CC people who have worked on it recently.

--
Kalle Valo

2014-02-25 07:56:20

by Ivaylo Dimitrov

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Hi,

On 14.02.2014 18:24, Will Deacon wrote:
> You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> above.
>

the following patch

diff --git a/drivers/net/wireless/ti/wl1251/rx.c
b/drivers/net/wireless/ti/wl1251/rx.c
index 123c4bb..cde0eaf 100644
--- a/drivers/net/wireless/ti/wl1251/rx.c
+++ b/drivers/net/wireless/ti/wl1251/rx.c
@@ -180,7 +180,7 @@ static void wl1251_rx_body(struct wl1251 *wl,
wl1251_mem_read(wl, rx_packet_ring_addr, rx_buffer, length);

/* The actual length doesn't include the target's alignment */
- skb->len = desc->length - PLCP_HEADER_LENGTH;
+ skb_trim(skb, desc->length - PLCP_HEADER_LENGTH);

fc = (u16 *)skb->data;

seems to fix the issue, including those "corrupt probe response"
messages in dmesg log (I took that 'skb_trim' from the original Nokia
kernel). Will send a properly formatted patch shortly.

Thanks,
Ivo

2014-02-25 10:44:48

by Pavel Machek

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Hi!

> >>You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> >>commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> >>above.
> >
> >This seems to fix the issue too, but I am afraid there is more to be
> >done, as I see
> >
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.599853] wlan0: authenticate
> >with 00:23:cd:17:86:d0
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.781860] wlan0: send auth to
> >00:23:cd:17:86:d0 (try 1/3)
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.799804] wlan0: authenticated
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.806243] wlan0: associating
> >with AP with corrupt probe response
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.814544] wlan0: associate with
> >00:23:cd:17:86:d0 (try 1/3)
> >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.826416] wlan0: RX AssocResp
> >from 00:23:cd:17:86:d0 (capab=0x431 status=0 aid=2)
> >
> >spit in dmesg log. AFAIR this is the first time I see "associating with
> >AP with corrupt probe response" in dmesg, no matter of the kernel
> >version. Despite of that error message, wlan seems to work as it should,
> >so far. I guess we need the driver's author saying on the issue.
>
> ping?

If Will does not respond, we should simply revert
dce5c9e35bc4085bd33eccdb8c9ec5a643507a14 . Regressions are not
acceptable.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-02-25 10:57:06

by Will Deacon

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

On Tue, Feb 25, 2014 at 10:44:44AM +0000, Pavel Machek wrote:
> > >>You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> > >>commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> > >>above.
> > >
> > >This seems to fix the issue too, but I am afraid there is more to be
> > >done, as I see
> > >
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.599853] wlan0: authenticate
> > >with 00:23:cd:17:86:d0
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.781860] wlan0: send auth to
> > >00:23:cd:17:86:d0 (try 1/3)
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.799804] wlan0: authenticated
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.806243] wlan0: associating
> > >with AP with corrupt probe response
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.814544] wlan0: associate with
> > >00:23:cd:17:86:d0 (try 1/3)
> > >Feb 14 18:33:46 Nokia-N900 kernel: [ 88.826416] wlan0: RX AssocResp
> > >from 00:23:cd:17:86:d0 (capab=0x431 status=0 aid=2)
> > >
> > >spit in dmesg log. AFAIR this is the first time I see "associating with
> > >AP with corrupt probe response" in dmesg, no matter of the kernel
> > >version. Despite of that error message, wlan seems to work as it should,
> > >so far. I guess we need the driver's author saying on the issue.
> >
> > ping?
>
> If Will does not respond, we should simply revert
> dce5c9e35bc4085bd33eccdb8c9ec5a643507a14 . Regressions are not
> acceptable.

Seriously Pavel, give me a break. I already responded pretty quickly to this
bug report, took a look at where the issue could be and proposed a fix
(which Ivaylo tested successfully). Now, wireless drivers are *waay* outside
my area of expertise, so I added some other folks to CC in the hope that
they could help turn my fix into something upstreamable. Meanwhile, Ivaylo
looks like he has something better than what I came up with already.

If you care about this driver, why don't you help us fix it rather than
making unhelpful suggestions from the sidelines?

Will

2014-02-25 10:58:38

by Will Deacon

[permalink] [raw]
Subject: Re: [BISECTED] ssh - Received disconnect from x.x.x.x: 2: Bad packet length 3149594624

Hi Ivaylo,

On Tue, Feb 25, 2014 at 07:56:11AM +0000, Ivaylo Dimitrov wrote:
> On 14.02.2014 18:24, Will Deacon wrote:
> > You could try putting back the UNALIGNED_ACCESS in net/mac80211/rx.c and
> > commenting out the skb->len = desc->length - PLCP_HEADER_LENGTH; line
> > above.
> >
>
> the following patch
>
> diff --git a/drivers/net/wireless/ti/wl1251/rx.c
> b/drivers/net/wireless/ti/wl1251/rx.c
> index 123c4bb..cde0eaf 100644
> --- a/drivers/net/wireless/ti/wl1251/rx.c
> +++ b/drivers/net/wireless/ti/wl1251/rx.c
> @@ -180,7 +180,7 @@ static void wl1251_rx_body(struct wl1251 *wl,
> wl1251_mem_read(wl, rx_packet_ring_addr, rx_buffer, length);
>
> /* The actual length doesn't include the target's alignment */
> - skb->len = desc->length - PLCP_HEADER_LENGTH;
> + skb_trim(skb, desc->length - PLCP_HEADER_LENGTH);
>
> fc = (u16 *)skb->data;
>
> seems to fix the issue, including those "corrupt probe response"
> messages in dmesg log (I took that 'skb_trim' from the original Nokia
> kernel). Will send a properly formatted patch shortly.

Good idea -- don't forget to CC linux-wireless, as suggested by Kalle.

Will