2012-09-14 13:57:26

by Artem Bityutskiy

[permalink] [raw]
Subject: regression: tethering fails in 3.5 with iwlwifi

Hi Johannes,

We are using iwlwifi driver in Tizen IVI. We were using kernel v3.4 and
wifi tethering worked fine. But after upgrading to v3.5 it stopped
working.

I am trying to bisect this already many hours - very difficult, because
tethering did not work in kernels around v3.4-rcX for some different
reasons, so just 'git bisect' does not work.

Anyway, what I found out is:

bad: cb62ab7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
good: 31ed8e6 Merge branch 'dentry-cleanups' (dcache access cleanups and optimizations)

I spent a lot of time trying to bisect - tough, no luck. Then I the
following trick.

I reset to the last good commit: 31ed8e6

Now I know that Dave's tree broke tethering (merge cb62ab7 by Linus).

I start _partioally_ merge Dave's tree into 31ed8e6.

I found out this way the following:

bad: 816a785 David S. Miller Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
good: 011e3c6 David S. Miller Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Note, the second (good) line means that if you merge 011e3c6 to 31ed8e6
like this:

git reset --hard 31ed8e6
git merge 011e3c6

tethering works fine. The merge has no conflicts.

But if you merge 816a785 into 31ed8e6 like this:

git reset --hard 31ed8e6
git merge 011e3c6

then tethering does not work. There are non-trivial merge conflicts
though. Note also, that here I had to apply

bb3e10f mac80211: check IEEE80211_HW_QUEUE_CONTROL in ieee80211_check_queues()

because without this patch there was an WARN() and nothing worked at
all.

This all means that one of the following broke tehering:

1. David's merge of the wireless tree which contains a lot of your
iwlwifi changes.
2. The conflicts resolution.

How I resolved conflicts? I actually did the same David did it his
merge:

0d6c4a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

This is maximum I can do without considerable amount of effort. Any idea
what would the bug be?

Could you also take a look at the conflict resolutions for iwlwifi in
0d6c4a2?

Thanks a lot!

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-17 15:17:08

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Mon, 2012-09-17 at 17:15 +0200, Eric Dumazet wrote:
> Hi
>
> I would say some part of the stack assumes a header (I dont know wich
> one ?) is pulled in skb->head/data, and thats a bug.
>
> Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
> bytes in skb->data
>
> It would help if you describe the problem, because I have no history (
> I am not linux-wireless subscriber).

No worries, I'll take care of it, I've just not gotten to it yet and for
that I apologize. I'll either look at it later today or tomorrow.

johannes


2012-09-18 13:15:16

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Tue, 2012-09-18 at 13:45 +0200, Johannes Berg wrote:
> On Tue, 2012-09-18 at 10:44 +0200, Johannes Berg wrote:
>
> > Thanks. I just got that far as well, but I hadn't set up a DHCP
> > server ... I'll debug further, I just confirmed that association still
> > works.
>
> Ok I can't reproduce the problem. You're going to have to be more
> specific about what connman sets up, I think.
>
> Here's what I did:
>
> * brctl addbr br0
> * ip addr add ... br0
> * ip link set br0 up
> * start dnsmasq on br0
> * start wpa_supplicant on wlan0, with -Dnl80211 -bbr0 -iwlan0 -c../cfg
> * brctl addif br0 wlan0
>
> Maybe this isn't exactly what connman does, but it works for me with
> kernel 3.5.4, a 6230 Intel device and 1.0 wpa_supplicant.
>
> Can you capture exactly what connman does?

I can with some amount of efforts - I'd need to read the code. If you
would be kind / enthusiastic enough to try this with Tizen images, that
would be a lot easier for me.

But let me CC Samuel who did tethering support for connman. Samuel, if
you wish to provide some input, here is the link to this discussion:

http://thread.gmane.org/gmane.linux.kernel.wireless.general/97389

> Also, are you sure that reverting just the carrier state patch isn't
> sufficient? It seems to me that the carrier state patch might confuse
> the bridge code, but I really see no reason Eric's change should have
> any impact at all.

Yes, 100% sure. I spent a lot of time bisecting this, and found out that
in my setup tethering starts working only if I revert both.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 15:32:26

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:31 +0200, Eric Dumazet wrote:

> Artem, could you send one capture of one such DHCP big packet ?
>
> tcpdump -p -n -s 2000 -i wlanX -X
>

You also can add the -v flag



2012-09-20 15:27:17

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:22 +0200, Johannes Berg wrote:
> On Thu, 2012-09-20 at 17:18 +0200, Eric Dumazet wrote:
>
> > > Note I think the failing packets are dhcp packets, and connman seems to
> > > use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> > > to the right device. I'd be quite surprised though if UDP code had
> > > issues with paged Rx??
>
> > You told me : socket(PF_PACKET, SOCK_RAW, htons(ETH_P_EAPOL))
> >
> > So I looked at raw code.
>
> and even found an issue. Artem wasn't really sure, but looking at his
> information again it seems that the EAPOL packets (those on the raw
> sockets) do go through, and then DHCP fails. In the printk he did for
> me, I can see that the EAPOL packets are small enough to get pulled in
> completely in iwlwifi, and the bigger DHCP packets (~600 bytes) only go
> through if we linearize them.

OK, its a bug in UDP, I'll send a patch asap.





2012-09-18 08:43:31

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Tue, 2012-09-18 at 11:40 +0300, Artem Bityutskiy wrote:
> On Tue, 2012-09-18 at 09:09 +0200, Johannes Berg wrote:
> > Hm. This is curious. *How* does AP mode fail? You said earlier stations
> > failed to connect, but can they actually get associated? I'm curious
> > because while I could see the carrier state thing causing issues (on old
> > hostapd versions that we didn't test), I'm not sure I see how the skb
> > payload makes a difference since mac80211 will linearize management
> > frames (so association should work) and data frames don't matter (so
> > authorization should work) ... unless there's a bug somewhere in
> > mac80211 where it forgets to pskb_may_pull(), but I don't see any such
> > place right now.
>
> I am not sure what happens. In the failure case the high-level symptom
> (in NetwokManager or in Windows) is that "failed to set IP address".
> I've captured 2 tcpdumps - for the working (those 2 patches reverted)
> and non-working cases (vanilla 3.5.4). I'll send them to you privately.

Thanks. I just got that far as well, but I hadn't set up a DHCP
server ... I'll debug further, I just confirmed that association still
works.

johannes


2012-09-20 13:22:08

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:22 +0300, Artem Bityutskiy wrote:

>
> OK, I've tried almost this (see below) and it solves my issue:
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 965e6ec..7f079d0 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1798,9 +1798,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>
> if (skb) {
> /* deliver to local stack */
> - skb->protocol = eth_type_trans(skb, dev);
> - memset(skb->cb, 0, sizeof(skb->cb));
> - netif_receive_skb(skb);
> + if (pskb_may_pull(skb, 40)) {
> + skb->protocol = eth_type_trans(skb, dev);
> + memset(skb->cb, 0, sizeof(skb->cb));
> + netif_receive_skb(skb);
> + } else {
> + kfree_skb(skb);
> + }
> }
> }
>

OK but you cant do that, or small frames will be dropped.

Anyway its a hack, we should find the buggy layer.

You could use dropwatch (drop_monitor) to check where frame is dropped.

modprobe drop_monitor
dropwatch -l kas




2012-09-20 12:47:13

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:45 +0200, Eric Dumazet wrote:
> I guess you only need to make sure 14 bytes of ethernet header are
> available before eth_type_trans(skb, dev);
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 61c621e..ffe5f84 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>
> if (skb) {
> /* deliver to local stack */
> - skb->protocol = eth_type_trans(skb, dev);
> - memset(skb->cb, 0, sizeof(skb->cb));
> - netif_receive_skb(skb);
> + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
> + skb->protocol = eth_type_trans(skb, dev);
> + memset(skb->cb, 0, sizeof(skb->cb));
> + netif_receive_skb(skb);
> + } else {
> + kfree_skb(skb);
> + }
> }
> }

Yeah I was looking at the same code just now. However, we had actually
inserted the skb_linearize() *after* eth_type_trans(), so I'm confused.
Maybe it still works, more or less by accident?

johannes


2012-09-20 14:25:54

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:22 +0300, Artem Bityutskiy wrote:
> On Thu, 2012-09-20 at 15:04 +0200, Eric Dumazet wrote:
> > Try to pull 40 bytes : Thats OK for tcp performance, because 40 bytes
> > is the minimum size of IP+TCP headers
> >
> > pskb_may_pull(skb, 40);
>
> OK, I've tried almost this (see below) and it solves my issue:
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 965e6ec..7f079d0 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1798,9 +1798,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>
> if (skb) {
> /* deliver to local stack */
> - skb->protocol = eth_type_trans(skb, dev);
> - memset(skb->cb, 0, sizeof(skb->cb));
> - netif_receive_skb(skb);
> + if (pskb_may_pull(skb, 40)) {
> + skb->protocol = eth_type_trans(skb, dev);
> + memset(skb->cb, 0, sizeof(skb->cb));
> + netif_receive_skb(skb);
> + } else {
> + kfree_skb(skb);
> + }
> }
> }
>

Please remove this hack and try the following bugfix in raw handler

icmp_filter() should not modify skb, or else its caller should not
assume ip_hdr() is unchanged.

net/ipv4/raw.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index f242578..3fa8c96 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -128,25 +128,30 @@ found:
}

/*
- * 0 - deliver
- * 1 - block
+ * false - deliver
+ * true - block
*/
-static __inline__ int icmp_filter(struct sock *sk, struct sk_buff *skb)
+static bool icmp_filter(struct sock *sk, const struct sk_buff *skb)
{
- int type;
-
- if (!pskb_may_pull(skb, sizeof(struct icmphdr)))
- return 1;
-
- type = icmp_hdr(skb)->type;
- if (type < 32) {
+ __u8 _type;
+ const __u8 *type;
+
+ type = skb_header_pointer(skb,
+ skb_transport_offset(skb) +
+ offsetof(struct icmphdr, type),
+ sizeof(_type),
+ &_type);
+ if (!type)
+ return true;
+
+ if (*type < 32) {
__u32 data = raw_sk(sk)->filter.data;

- return ((1 << type) & data) != 0;
+ return ((1U << *type) & data) != 0;
}

/* Do not block unknown ICMP types */
- return 0;
+ return false;
}

/* IP input processing comes here for RAW socket delivery.



2012-09-20 15:04:02

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:56 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 17:37 +0300, Artem Bityutskiy wrote:
> > On Thu, 2012-09-20 at 16:25 +0200, Eric Dumazet wrote:
> > > Please remove this hack and try the following bugfix in raw handler
> > >
> > > icmp_filter() should not modify skb, or else its caller should not
> > > assume ip_hdr() is unchanged.
> >
> > Did not help :(
> >
>
> What gives (after some failures)
>
> cat /proc/net/raw

-sh-4.1# cat /proc/net/raw
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops


--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 13:15:50

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 15:04 +0200, Eric Dumazet wrote:
> Try to pull 40 bytes : Thats OK for tcp performance, because 40 bytes
> is the minimum size of IP+TCP headers
>
> pskb_may_pull(skb, 40);
>
> (instead of your skb_linearize(skb);)

Oh well, I just realize all these mails were on netdev.

Sorry for the noise.



2012-09-20 14:56:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:37 +0300, Artem Bityutskiy wrote:
> On Thu, 2012-09-20 at 16:25 +0200, Eric Dumazet wrote:
> > Please remove this hack and try the following bugfix in raw handler
> >
> > icmp_filter() should not modify skb, or else its caller should not
> > assume ip_hdr() is unchanged.
>
> Did not help :(
>

What gives (after some failures)

cat /proc/net/raw



2012-09-20 14:33:11

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:25 +0200, Eric Dumazet wrote:
> Please remove this hack and try the following bugfix in raw handler
>
> icmp_filter() should not modify skb, or else its caller should not
> assume ip_hdr() is unchanged.

Did not help :(

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 12:40:25

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

OK, but which netif_receive_skb(), as I count 5 occurrences in
net/mac80211/rx.c ?

Instead of skb_linearize() calls

you could try several values of XXX in

pskb_may_pull(skb, XXX)

So that you make sure XXX bytes are available in skb->head, and not
the whole frame.

One you know the limit for XXX, it might give a clue where a
pskb_may_pull(skb, XXX) is missing.

On Thu, Sep 20, 2012 at 2:35 PM, Johannes Berg
<[email protected]> wrote:
> Hi,
>
>> > 56138f5 iwlwifi: dont pull too much payload in skb head
>> > 3edaf3e mac80211: manage AP netdev carrier state
>>
>> The second patch (AP carrier state) actually exposed a connman issue
>> which I fixed and submitted a connman patch:
>> http://lists.connman.net/pipermail/connman/2012-September/011232.html
>>
>> However, Eric's patch still causes tethering problems to me.
>
>
> Let me recap a bit. Artem is using connman, which sets up the wifi
> interface as part of a bridge. It runs wpa_supplicant to create an AP
> (only AP and mesh mode interfaces can be bridged anyway).
>
>
> Eric, you said:
>
>> I would say some part of the stack assumes a header (I dont know wich
>> one ?) is pulled in skb->head/data, and thats a bug.
>>
>> Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
>> bytes in skb->data
>
> I thought we'd figure out which part of the stack assumes a header, so I
> asked Artem to test a one-line patch which adds "skb_linearize()" before
> "netif_receive_skb()" in mac80211. This makes it work, but I'm not sure
> where after that the bad assumption might be.
>
> johannes
>

2012-09-14 16:34:14

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Fri, 2012-09-14 at 17:02 +0300, Artem Bityutskiy wrote:
> Hi Johannes,
>
> We are using iwlwifi driver in Tizen IVI. We were using kernel v3.4 and
> wifi tethering worked fine. But after upgrading to v3.5 it stopped
> working.
>
> I am trying to bisect this already many hours - very difficult, because
> tethering did not work in kernels around v3.4-rcX for some different
> reasons, so just 'git bisect' does not work.

OK, my latest finding is:

bad: 3edaf3e Johannes Berg mac80211: manage AP netdev carrier state
good: fe40cb6 Ashok Nagarajan mac80211: Check basic rates when peering

However, just reverting in 3.5 does not solve the issue. I am out of
ideas how to proceed.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 12:54:01

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:45 +0200, Eric Dumazet wrote:
> I guess you only need to make sure 14 bytes of ethernet header are
> available before eth_type_trans(skb, dev);
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 61c621e..ffe5f84 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>
> if (skb) {
> /* deliver to local stack */
> - skb->protocol = eth_type_trans(skb, dev);
> - memset(skb->cb, 0, sizeof(skb->cb));
> - netif_receive_skb(skb);
> + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
> + skb->protocol = eth_type_trans(skb, dev);
> + memset(skb->cb, 0, sizeof(skb->cb));
> + netif_receive_skb(skb);
> + } else {
> + kfree_skb(skb);
> + }
> }
> }

Does not help, this one does:

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 61c621e..6888586 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1797,6 +1797,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
/* deliver to local stack */
skb->protocol = eth_type_trans(skb, dev);
memset(skb->cb, 0, sizeof(skb->cb));
+ skb_linearize(skb);
netif_receive_skb(skb);
}

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 15:22:21

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:18 +0200, Eric Dumazet wrote:

> > Note I think the failing packets are dhcp packets, and connman seems to
> > use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> > to the right device. I'd be quite surprised though if UDP code had
> > issues with paged Rx??

> You told me : socket(PF_PACKET, SOCK_RAW, htons(ETH_P_EAPOL))
>
> So I looked at raw code.

and even found an issue. Artem wasn't really sure, but looking at his
information again it seems that the EAPOL packets (those on the raw
sockets) do go through, and then DHCP fails. In the printk he did for
me, I can see that the EAPOL packets are small enough to get pulled in
completely in iwlwifi, and the bigger DHCP packets (~600 bytes) only go
through if we linearize them.

johannes


2012-09-20 12:46:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Or its the lines with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ?

What arch is it ?


On Thu, Sep 20, 2012 at 2:45 PM, Eric Dumazet <[email protected]> wrote:
> I guess you only need to make sure 14 bytes of ethernet header are
> available before eth_type_trans(skb, dev);
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 61c621e..ffe5f84 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>
> if (skb) {
> /* deliver to local stack */
> - skb->protocol = eth_type_trans(skb, dev);
> - memset(skb->cb, 0, sizeof(skb->cb));
> - netif_receive_skb(skb);
> + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
> + skb->protocol = eth_type_trans(skb, dev);
> + memset(skb->cb, 0, sizeof(skb->cb));
> + netif_receive_skb(skb);
> + } else {
> + kfree_skb(skb);
> + }
> }
> }
>
>
> On Thu, Sep 20, 2012 at 2:40 PM, Eric Dumazet <[email protected]> wrote:
>> OK, but which netif_receive_skb(), as I count 5 occurrences in
>> net/mac80211/rx.c ?
>>
>> Instead of skb_linearize() calls
>>
>> you could try several values of XXX in
>>
>> pskb_may_pull(skb, XXX)
>>
>> So that you make sure XXX bytes are available in skb->head, and not
>> the whole frame.
>>
>> One you know the limit for XXX, it might give a clue where a
>> pskb_may_pull(skb, XXX) is missing.
>>
>> On Thu, Sep 20, 2012 at 2:35 PM, Johannes Berg
>> <[email protected]> wrote:
>>> Hi,
>>>
>>>> > 56138f5 iwlwifi: dont pull too much payload in skb head
>>>> > 3edaf3e mac80211: manage AP netdev carrier state
>>>>
>>>> The second patch (AP carrier state) actually exposed a connman issue
>>>> which I fixed and submitted a connman patch:
>>>> http://lists.connman.net/pipermail/connman/2012-September/011232.html
>>>>
>>>> However, Eric's patch still causes tethering problems to me.
>>>
>>>
>>> Let me recap a bit. Artem is using connman, which sets up the wifi
>>> interface as part of a bridge. It runs wpa_supplicant to create an AP
>>> (only AP and mesh mode interfaces can be bridged anyway).
>>>
>>>
>>> Eric, you said:
>>>
>>>> I would say some part of the stack assumes a header (I dont know wich
>>>> one ?) is pulled in skb->head/data, and thats a bug.
>>>>
>>>> Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
>>>> bytes in skb->data
>>>
>>> I thought we'd figure out which part of the stack assumes a header, so I
>>> asked Artem to test a one-line patch which adds "skb_linearize()" before
>>> "netif_receive_skb()" in mac80211. This makes it work, but I'm not sure
>>> where after that the bad assumption might be.
>>>
>>> johannes
>>>

2012-09-20 13:04:51

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Try to pull 40 bytes : Thats OK for tcp performance, because 40 bytes
is the minimum size of IP+TCP headers

pskb_may_pull(skb, 40);

(instead of your skb_linearize(skb);)


On Thu, Sep 20, 2012 at 2:58 PM, Artem Bityutskiy
<[email protected]> wrote:
> On Thu, 2012-09-20 at 14:45 +0200, Eric Dumazet wrote:
>> I guess you only need to make sure 14 bytes of ethernet header are
>> available before eth_type_trans(skb, dev);
>>
>> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
>> index 61c621e..ffe5f84 100644
>> --- a/net/mac80211/rx.c
>> +++ b/net/mac80211/rx.c
>> @@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>>
>> if (skb) {
>> /* deliver to local stack */
>> - skb->protocol = eth_type_trans(skb, dev);
>> - memset(skb->cb, 0, sizeof(skb->cb));
>> - netif_receive_skb(skb);
>> + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
>> + skb->protocol = eth_type_trans(skb, dev);
>> + memset(skb->cb, 0, sizeof(skb->cb));
>> + netif_receive_skb(skb);
>> + } else {
>> + kfree_skb(skb);
>> + }
>> }
>> }
>
> Does not help, this one does:
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 61c621e..6888586 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1797,6 +1797,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
> /* deliver to local stack */
> skb->protocol = eth_type_trans(skb, dev);
> memset(skb->cb, 0, sizeof(skb->cb));
> + skb_linearize(skb);
> netif_receive_skb(skb);
> }
>
> --
> Best Regards,
> Artem Bityutskiy

2012-09-20 12:48:17

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Or its a buggy protocol ?

IP/UDP/TCP definitely works, but maybe another protocol assumes its
header is in skb->head


On Thu, Sep 20, 2012 at 2:47 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2012-09-20 at 14:45 +0200, Eric Dumazet wrote:
>> I guess you only need to make sure 14 bytes of ethernet header are
>> available before eth_type_trans(skb, dev);
>>
>> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
>> index 61c621e..ffe5f84 100644
>> --- a/net/mac80211/rx.c
>> +++ b/net/mac80211/rx.c
>> @@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
>>
>> if (skb) {
>> /* deliver to local stack */
>> - skb->protocol = eth_type_trans(skb, dev);
>> - memset(skb->cb, 0, sizeof(skb->cb));
>> - netif_receive_skb(skb);
>> + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
>> + skb->protocol = eth_type_trans(skb, dev);
>> + memset(skb->cb, 0, sizeof(skb->cb));
>> + netif_receive_skb(skb);
>> + } else {
>> + kfree_skb(skb);
>> + }
>> }
>> }
>
> Yeah I was looking at the same code just now. However, we had actually
> inserted the skb_linearize() *after* eth_type_trans(), so I'm confused.
> Maybe it still works, more or less by accident?
>
> johannes
>

2012-09-20 12:45:02

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

I guess you only need to make sure 14 bytes of ethernet header are
available before eth_type_trans(skb, dev);

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 61c621e..ffe5f84 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1795,9 +1795,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)

if (skb) {
/* deliver to local stack */
- skb->protocol = eth_type_trans(skb, dev);
- memset(skb->cb, 0, sizeof(skb->cb));
- netif_receive_skb(skb);
+ if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
+ skb->protocol = eth_type_trans(skb, dev);
+ memset(skb->cb, 0, sizeof(skb->cb));
+ netif_receive_skb(skb);
+ } else {
+ kfree_skb(skb);
+ }
}
}


On Thu, Sep 20, 2012 at 2:40 PM, Eric Dumazet <[email protected]> wrote:
> OK, but which netif_receive_skb(), as I count 5 occurrences in
> net/mac80211/rx.c ?
>
> Instead of skb_linearize() calls
>
> you could try several values of XXX in
>
> pskb_may_pull(skb, XXX)
>
> So that you make sure XXX bytes are available in skb->head, and not
> the whole frame.
>
> One you know the limit for XXX, it might give a clue where a
> pskb_may_pull(skb, XXX) is missing.
>
> On Thu, Sep 20, 2012 at 2:35 PM, Johannes Berg
> <[email protected]> wrote:
>> Hi,
>>
>>> > 56138f5 iwlwifi: dont pull too much payload in skb head
>>> > 3edaf3e mac80211: manage AP netdev carrier state
>>>
>>> The second patch (AP carrier state) actually exposed a connman issue
>>> which I fixed and submitted a connman patch:
>>> http://lists.connman.net/pipermail/connman/2012-September/011232.html
>>>
>>> However, Eric's patch still causes tethering problems to me.
>>
>>
>> Let me recap a bit. Artem is using connman, which sets up the wifi
>> interface as part of a bridge. It runs wpa_supplicant to create an AP
>> (only AP and mesh mode interfaces can be bridged anyway).
>>
>>
>> Eric, you said:
>>
>>> I would say some part of the stack assumes a header (I dont know wich
>>> one ?) is pulled in skb->head/data, and thats a bug.
>>>
>>> Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
>>> bytes in skb->data
>>
>> I thought we'd figure out which part of the stack assumes a header, so I
>> asked Artem to test a one-line patch which adds "skb_linearize()" before
>> "netif_receive_skb()" in mac80211. This makes it work, but I'm not sure
>> where after that the bad assumption might be.
>>
>> johannes
>>

2012-09-21 17:24:48

by David Miller

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

From: Eric Dumazet <[email protected]>
Date: Thu, 20 Sep 2012 16:25:49 +0200

> Please remove this hack and try the following bugfix in raw handler
>
> icmp_filter() should not modify skb, or else its caller should not
> assume ip_hdr() is unchanged.

Right, good catch.

Please submit this fix formally Eric, thanks a lot.

2012-09-20 15:31:12

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:27 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 17:22 +0200, Johannes Berg wrote:
> > On Thu, 2012-09-20 at 17:18 +0200, Eric Dumazet wrote:
> >
> > > > Note I think the failing packets are dhcp packets, and connman seems to
> > > > use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> > > > to the right device. I'd be quite surprised though if UDP code had
> > > > issues with paged Rx??
> >
> > > You told me : socket(PF_PACKET, SOCK_RAW, htons(ETH_P_EAPOL))
> > >
> > > So I looked at raw code.
> >
> > and even found an issue. Artem wasn't really sure, but looking at his
> > information again it seems that the EAPOL packets (those on the raw
> > sockets) do go through, and then DHCP fails. In the printk he did for
> > me, I can see that the EAPOL packets are small enough to get pulled in
> > completely in iwlwifi, and the bigger DHCP packets (~600 bytes) only go
> > through if we linearize them.
>
> OK, its a bug in UDP, I'll send a patch asap.

Oh well, false alarm.

Artem, could you send one capture of one such DHCP big packet ?

tcpdump -p -n -s 2000 -i wlanX -X



2012-09-20 13:34:41

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 15:22 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 16:22 +0300, Artem Bityutskiy wrote:
>
> >
> > OK, I've tried almost this (see below) and it solves my issue:
> >
> > diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> > index 965e6ec..7f079d0 100644
> > --- a/net/mac80211/rx.c
> > +++ b/net/mac80211/rx.c
> > @@ -1798,9 +1798,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
> >
> > if (skb) {
> > /* deliver to local stack */
> > - skb->protocol = eth_type_trans(skb, dev);
> > - memset(skb->cb, 0, sizeof(skb->cb));
> > - netif_receive_skb(skb);
> > + if (pskb_may_pull(skb, 40)) {
> > + skb->protocol = eth_type_trans(skb, dev);
> > + memset(skb->cb, 0, sizeof(skb->cb));
> > + netif_receive_skb(skb);
> > + } else {
> > + kfree_skb(skb);
> > + }
> > }
> > }
> >
>
> OK but you cant do that, or small frames will be dropped.
>
> Anyway its a hack, we should find the buggy layer.
>
> You could use dropwatch (drop_monitor) to check where frame is dropped.
>
> modprobe drop_monitor
> dropwatch -l kas

I do not have dropwatch in Tizen, I need to find the sources and compile
the user-space part. Any hint where to download it?

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 15:04:47

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:56 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 17:37 +0300, Artem Bityutskiy wrote:
> > On Thu, 2012-09-20 at 16:25 +0200, Eric Dumazet wrote:
> > > Please remove this hack and try the following bugfix in raw handler
> > >
> > > icmp_filter() should not modify skb, or else its caller should not
> > > assume ip_hdr() is unchanged.
> >
> > Did not help :(
> >
>
> What gives (after some failures)
>
> cat /proc/net/raw

Note I think the failing packets are dhcp packets, and connman seems to
use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
to the right device. I'd be quite surprised though if UDP code had
issues with paged Rx??

johannes


2012-09-20 12:34:50

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Hi,

> > 56138f5 iwlwifi: dont pull too much payload in skb head
> > 3edaf3e mac80211: manage AP netdev carrier state
>
> The second patch (AP carrier state) actually exposed a connman issue
> which I fixed and submitted a connman patch:
> http://lists.connman.net/pipermail/connman/2012-September/011232.html
>
> However, Eric's patch still causes tethering problems to me.


Let me recap a bit. Artem is using connman, which sets up the wifi
interface as part of a bridge. It runs wpa_supplicant to create an AP
(only AP and mesh mode interfaces can be bridged anyway).


Eric, you said:

> I would say some part of the stack assumes a header (I dont know wich
> one ?) is pulled in skb->head/data, and thats a bug.
>
> Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
> bytes in skb->data

I thought we'd figure out which part of the stack assumes a header, so I
asked Artem to test a one-line patch which adds "skb_linearize()" before
"netif_receive_skb()" in mac80211. This makes it work, but I'm not sure
where after that the bad assumption might be.

johannes


2012-09-20 15:19:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:05 +0200, Johannes Berg wrote:

> Note I think the failing packets are dhcp packets, and connman seems to
> use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> to the right device. I'd be quite surprised though if UDP code had
> issues with paged Rx??
>

It could be a checksum issue ?

"netstat -s" could give a clue...



2012-09-20 15:18:20

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:05 +0200, Johannes Berg wrote:
> On Thu, 2012-09-20 at 16:56 +0200, Eric Dumazet wrote:
> > On Thu, 2012-09-20 at 17:37 +0300, Artem Bityutskiy wrote:
> > > On Thu, 2012-09-20 at 16:25 +0200, Eric Dumazet wrote:
> > > > Please remove this hack and try the following bugfix in raw handler
> > > >
> > > > icmp_filter() should not modify skb, or else its caller should not
> > > > assume ip_hdr() is unchanged.
> > >
> > > Did not help :(
> > >
> >
> > What gives (after some failures)
> >
> > cat /proc/net/raw
>
> Note I think the failing packets are dhcp packets, and connman seems to
> use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> to the right device. I'd be quite surprised though if UDP code had
> issues with paged Rx??
>
> johannes
>

You told me : socket(PF_PACKET, SOCK_RAW, htons(ETH_P_EAPOL))

So I looked at raw code.





2012-09-18 11:45:16

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Tue, 2012-09-18 at 10:44 +0200, Johannes Berg wrote:

> Thanks. I just got that far as well, but I hadn't set up a DHCP
> server ... I'll debug further, I just confirmed that association still
> works.

Ok I can't reproduce the problem. You're going to have to be more
specific about what connman sets up, I think.

Here's what I did:

* brctl addbr br0
* ip addr add ... br0
* ip link set br0 up
* start dnsmasq on br0
* start wpa_supplicant on wlan0, with -Dnl80211 -bbr0 -iwlan0 -c../cfg
* brctl addif br0 wlan0

Maybe this isn't exactly what connman does, but it works for me with
kernel 3.5.4, a 6230 Intel device and 1.0 wpa_supplicant.

Can you capture exactly what connman does?

Also, are you sure that reverting just the carrier state patch isn't
sufficient? It seems to me that the carrier state patch might confuse
the bridge code, but I really see no reason Eric's change should have
any impact at all.

johannes


2012-09-20 12:51:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

So its using a RAW socket ?

On Thu, Sep 20, 2012 at 2:50 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2012-09-20 at 14:48 +0200, Eric Dumazet wrote:
>> Or its a buggy protocol ?
>>
>> IP/UDP/TCP definitely works, but maybe another protocol assumes its
>> header is in skb->head
>
> It could be, I think it failed either on EAPOL (which is just userspace
> registering a protocol socket) or DHCP (same?)
>
> johannes
>

2012-09-17 15:15:42

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Hi

I would say some part of the stack assumes a header (I dont know wich
one ?) is pulled in skb->head/data, and thats a bug.

Always use pskb_may_pull(skb, XXX) to make sure you can access XXX
bytes in skb->data

It would help if you describe the problem, because I have no history (
I am not linux-wireless subscriber).

On Mon, Sep 17, 2012 at 4:41 PM, Artem Bityutskiy
<[email protected]> wrote:
> On Fri, 2012-09-14 at 19:39 +0300, Artem Bityutskiy wrote:
>> On Fri, 2012-09-14 at 17:02 +0300, Artem Bityutskiy wrote:
>> > Hi Johannes,
>> >
>> > We are using iwlwifi driver in Tizen IVI. We were using kernel v3.4 and
>> > wifi tethering worked fine. But after upgrading to v3.5 it stopped
>> > working.
>> >
>> > I am trying to bisect this already many hours - very difficult, because
>> > tethering did not work in kernels around v3.4-rcX for some different
>> > reasons, so just 'git bisect' does not work.
>>
>> OK, my latest finding is:
>>
>> bad: 3edaf3e Johannes Berg mac80211: manage AP netdev carrier state
>> good: fe40cb6 Ashok Nagarajan mac80211: Check basic rates when peering
>>
>> However, just reverting in 3.5 does not solve the issue. I am out of
>> ideas how to proceed.
>
> OK, finally I got it. After 3 days of hardcore intelligent bisecting
> I've found out that tethering in 3.5 works for me if I revert
> these 2
> patches:
>
> 56138f5 iwlwifi: dont pull too much payload in skb head
> 3edaf3e mac80211: manage AP netdev carrier state
>
> Johannes, Eric?
>
> --
> Best Regards,
> Artem Bityutskiy

2012-09-20 12:49:49

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:46 +0200, Eric Dumazet wrote:
> Or its the lines with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ?
>
> What arch is it ?

x86

johannes


2012-09-20 12:51:09

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:47 +0200, Johannes Berg wrote:

> > if (skb) {
> > /* deliver to local stack */
> > - skb->protocol = eth_type_trans(skb, dev);
> > - memset(skb->cb, 0, sizeof(skb->cb));
> > - netif_receive_skb(skb);
> > + if (pskb_may_pull(skb, sizeof(struct ethhdr))) {
> > + skb->protocol = eth_type_trans(skb, dev);
> > + memset(skb->cb, 0, sizeof(skb->cb));
> > + netif_receive_skb(skb);
> > + } else {
> > + kfree_skb(skb);
> > + }
> > }
> > }
>
> Yeah I was looking at the same code just now. However, we had actually
> inserted the skb_linearize() *after* eth_type_trans(), so I'm confused.

Ok actually, by the time we get here the ethernet header must be in the
skb head because we construct it from the 802.11 and llc/snap header.

johannes


2012-09-20 12:50:14

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:48 +0200, Eric Dumazet wrote:
> Or its a buggy protocol ?
>
> IP/UDP/TCP definitely works, but maybe another protocol assumes its
> header is in skb->head

It could be, I think it failed either on EAPOL (which is just userspace
registering a protocol socket) or DHCP (same?)

johannes


2012-09-20 12:46:29

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:40 +0200, Eric Dumazet wrote:
> OK, but which netif_receive_skb(), as I count 5 occurrences in
> net/mac80211/rx.c ?

The one you modified in the other email :-)

> Instead of skb_linearize() calls
>
> you could try several values of XXX in
>
> pskb_may_pull(skb, XXX)
>
> So that you make sure XXX bytes are available in skb->head, and not
> the whole frame.
>
> One you know the limit for XXX, it might give a clue where a
> pskb_may_pull(skb, XXX) is missing.

Good idea.

johannes


2012-09-17 14:36:14

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Fri, 2012-09-14 at 19:39 +0300, Artem Bityutskiy wrote:
> On Fri, 2012-09-14 at 17:02 +0300, Artem Bityutskiy wrote:
> > Hi Johannes,
> >
> > We are using iwlwifi driver in Tizen IVI. We were using kernel v3.4 and
> > wifi tethering worked fine. But after upgrading to v3.5 it stopped
> > working.
> >
> > I am trying to bisect this already many hours - very difficult, because
> > tethering did not work in kernels around v3.4-rcX for some different
> > reasons, so just 'git bisect' does not work.
>
> OK, my latest finding is:
>
> bad: 3edaf3e Johannes Berg mac80211: manage AP netdev carrier state
> good: fe40cb6 Ashok Nagarajan mac80211: Check basic rates when peering
>
> However, just reverting in 3.5 does not solve the issue. I am out of
> ideas how to proceed.

OK, finally I got it. After 3 days of hardcore intelligent bisecting
I've found out that tethering in 3.5 works for me if I revert
these 2
patches:

56138f5 iwlwifi: dont pull too much payload in skb head
3edaf3e mac80211: manage AP netdev carrier state

Johannes, Eric?

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 13:41:36

by Eric Dumazet

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 16:39 +0300, Artem Bityutskiy wrote:
> On Thu, 2012-09-20 at 15:22 +0200, Eric Dumazet wrote:

> > OK but you cant do that, or small frames will be dropped.
> >
> > Anyway its a hack, we should find the buggy layer.
> >
> > You could use dropwatch (drop_monitor) to check where frame is dropped.
> >
> > modprobe drop_monitor
> > dropwatch -l kas
>
> I do not have dropwatch in Tizen, I need to find the sources and compile
> the user-space part. Any hint where to download it?
>

https://fedorahosted.org/dropwatch/



2012-09-20 12:53:45

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 14:51 +0200, Eric Dumazet wrote:
> So its using a RAW socket ?

Yes:

socket(PF_PACKET, SOCK_RAW, htons(ETH_P_EAPOL))

johannes

> On Thu, Sep 20, 2012 at 2:50 PM, Johannes Berg
> <[email protected]> wrote:
> > On Thu, 2012-09-20 at 14:48 +0200, Eric Dumazet wrote:
> >> Or its a buggy protocol ?
> >>
> >> IP/UDP/TCP definitely works, but maybe another protocol assumes its
> >> header is in skb->head
> >
> > It could be, I think it failed either on EAPOL (which is just userspace
> > registering a protocol socket) or DHCP (same?)
> >
> > johannes
> >
>



2012-09-20 13:17:35

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 15:04 +0200, Eric Dumazet wrote:
> Try to pull 40 bytes : Thats OK for tcp performance, because 40 bytes
> is the minimum size of IP+TCP headers
>
> pskb_may_pull(skb, 40);

OK, I've tried almost this (see below) and it solves my issue:

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 965e6ec..7f079d0 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1798,9 +1798,13 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)

if (skb) {
/* deliver to local stack */
- skb->protocol = eth_type_trans(skb, dev);
- memset(skb->cb, 0, sizeof(skb->cb));
- netif_receive_skb(skb);
+ if (pskb_may_pull(skb, 40)) {
+ skb->protocol = eth_type_trans(skb, dev);
+ memset(skb->cb, 0, sizeof(skb->cb));
+ netif_receive_skb(skb);
+ } else {
+ kfree_skb(skb);
+ }
}
}

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 15:25:53

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2012-09-20 at 17:19 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 17:05 +0200, Johannes Berg wrote:
>
> > Note I think the failing packets are dhcp packets, and connman seems to
> > use an AF_INET, SOCK_DGRAM, IPPROTO_UDP socket for those, and binds it
> > to the right device. I'd be quite surprised though if UDP code had
> > issues with paged Rx??
> >
>
> It could be a checksum issue ?

Hmm, would paged RX make a difference there? But I guess it could be,
our packets are all CHECKSUM_NONE.

johannes


2012-09-18 08:35:46

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Tue, 2012-09-18 at 09:09 +0200, Johannes Berg wrote:
> Hm. This is curious. *How* does AP mode fail? You said earlier stations
> failed to connect, but can they actually get associated? I'm curious
> because while I could see the carrier state thing causing issues (on old
> hostapd versions that we didn't test), I'm not sure I see how the skb
> payload makes a difference since mac80211 will linearize management
> frames (so association should work) and data frames don't matter (so
> authorization should work) ... unless there's a bug somewhere in
> mac80211 where it forgets to pskb_may_pull(), but I don't see any such
> place right now.

I am not sure what happens. In the failure case the high-level symptom
(in NetwokManager or in Windows) is that "failed to set IP address".
I've captured 2 tcpdumps - for the working (those 2 patches reverted)
and non-working cases (vanilla 3.5.4). I'll send them to you privately.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-20 12:01:34

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

Hi, some updates.

On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:
> OK, finally I got it. After 3 days of hardcore intelligent bisecting
> I've found out that tethering in 3.5 works for me if I revert
> these 2
> patches:
>
> 56138f5 iwlwifi: dont pull too much payload in skb head
> 3edaf3e mac80211: manage AP netdev carrier state

The second patch (AP carrier state) actually exposed a connman issue
which I fixed and submitted a connman patch:
http://lists.connman.net/pipermail/connman/2012-September/011232.html

However, Eric's patch still causes tethering problems to me.

--
Best Regards,
Artem Bityutskiy

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-09-18 07:09:23

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:


> OK, finally I got it. After 3 days of hardcore intelligent bisecting
> I've found out that tethering in 3.5 works for me if I revert
> these 2
> patches:
>
> 56138f5 iwlwifi: dont pull too much payload in skb head
> 3edaf3e mac80211: manage AP netdev carrier state

Hm. This is curious. *How* does AP mode fail? You said earlier stations
failed to connect, but can they actually get associated? I'm curious
because while I could see the carrier state thing causing issues (on old
hostapd versions that we didn't test), I'm not sure I see how the skb
payload makes a difference since mac80211 will linearize management
frames (so association should work) and data frames don't matter (so
authorization should work) ... unless there's a bug somewhere in
mac80211 where it forgets to pskb_may_pull(), but I don't see any such
place right now.

johannes


2013-03-23 15:46:03

by Luis Henriques

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, Mar 21, 2013 at 08:34:11PM +0100, Johannes Berg wrote:
> On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:
>
> > OK, finally I got it. After 3 days of hardcore intelligent bisecting
> > I've found out that tethering in 3.5 works for me if I revert
> > these 2
> > patches:
> >
> > 56138f5 iwlwifi: dont pull too much payload in skb head
>
> I got back to this for a customer running 3.5, and after many failed
> attempts realized that you have to have iptables for this problem to
> actually happen. I reverse-bisected that the *fix* is
> 6caab7b0544e83e6c160b5e80f5a4a7dd69545c7, in 3.7.
>
> Is there still any stable kernel 3.5/3.6 (or possibly before, though for
> iwlwifi before doesn't matter) that this should be applied to?

The 3.5.y.z extended stable tree already contains this commit as well.

Cheers,
--
Luis

2013-03-21 19:34:21

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:

> OK, finally I got it. After 3 days of hardcore intelligent bisecting
> I've found out that tethering in 3.5 works for me if I revert
> these 2
> patches:
>
> 56138f5 iwlwifi: dont pull too much payload in skb head

I got back to this for a customer running 3.5, and after many failed
attempts realized that you have to have iptables for this problem to
actually happen. I reverse-bisected that the *fix* is
6caab7b0544e83e6c160b5e80f5a4a7dd69545c7, in 3.7.

Is there still any stable kernel 3.5/3.6 (or possibly before, though for
iwlwifi before doesn't matter) that this should be applied to?

johannes


2013-03-22 07:00:44

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

On Thu, 2013-03-21 at 20:34 +0100, Johannes Berg wrote:
> On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:
>
> > OK, finally I got it. After 3 days of hardcore intelligent bisecting
> > I've found out that tethering in 3.5 works for me if I revert
> > these 2
> > patches:
> >
> > 56138f5 iwlwifi: dont pull too much payload in skb head
>
> I got back to this for a customer running 3.5, and after many failed
> attempts realized that you have to have iptables for this problem to
> actually happen. I reverse-bisected that the *fix* is
> 6caab7b0544e83e6c160b5e80f5a4a7dd69545c7, in 3.7.
>
> Is there still any stable kernel 3.5/3.6 (or possibly before, though for
> iwlwifi before doesn't matter) that this should be applied to?

Hi Johannes,

thanks for remembering about this. We've switched to the 3.8 kernel
recently, and we are pulling stable kernels in regularly. Now we are
at 3.8.3.

--
Best Regards,
Artem Bityutskiy


2013-03-21 21:24:15

by David Miller

[permalink] [raw]
Subject: Re: regression: tethering fails in 3.5 with iwlwifi

From: Johannes Berg <[email protected]>
Date: Thu, 21 Mar 2013 20:34:11 +0100

> On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:
>
>> OK, finally I got it. After 3 days of hardcore intelligent bisecting
>> I've found out that tethering in 3.5 works for me if I revert
>> these 2
>> patches:
>>
>> 56138f5 iwlwifi: dont pull too much payload in skb head
>
> I got back to this for a customer running 3.5, and after many failed
> attempts realized that you have to have iptables for this problem to
> actually happen. I reverse-bisected that the *fix* is
> 6caab7b0544e83e6c160b5e80f5a4a7dd69545c7, in 3.7.
>
> Is there still any stable kernel 3.5/3.6 (or possibly before, though for
> iwlwifi before doesn't matter) that this should be applied to?

I've checked all of 3.0.x, 3.2.x, 3.4.x, and 3.8.x They all have this
patch applied.

Those are the official stable kernels, everything else is outside of my
realm.