2010-06-13 20:15:14

by Chris Clayton

Subject: Noticeable slow-down in 2.6.35-rc3

Hi,

Please cc me on any reply because I'm not subscribed to linux-kernel
or linux-net.

I've noticed a slowdown in 2.6.35-rc3. It shows up in a few places:

1. When my desktop (KDE 3.5.10) is starting up, the "Initialising
system services" phase takes about 45 seconds as opposed to the normal
4 or 5 seconds. Similarly, whilst the basic KDE panel draws as
normal, the icons and other gadgets that it normally contains take
about 15 seconds to appear.

2. In firefox (3.6.3), there is a short (a second or two), but
noticeable, delay when a menu or sub-menu label is clicked on before
the {sub-,}menu appears. Normally the response is almost instant.

There are some similarities with Gene Heskett's report at
http://marc.info/?l=linux-kernel&m=127635846208957

I've bisected it and arrived at:

597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
Author: John Fastabend <[email protected]>
Date: Thu Jun 3 09:30:11 2010 +0000

net: deliver skbs on inactive slaves to exact matches

Currently, the accelerated receive path for VLAN's will
drop packets if the real device is an inactive slave and
is not one of the special pkts tested for in
skb_bond_should_drop(). This behavior is different then
the non-accelerated path and for pkts over a bonded vlan.

For example,

vlanx -> bond0 -> ethx

will be dropped in the vlan path and not delivered to any
packet handlers at all. However,

bond0 -> vlanx -> ethx

and

bond0 -> ethx

will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_recv_skb() doesn't drop frames but only
delivers them to exact matches.

This patch adds a sk_buff flag which is used for tagging
skbs that would previously been dropped and allows the
skb to continue to skb_netif_recv(). Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly. This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped out right in the vlan path.

I have tested the following 4 configurations in failover modes
and load balancing modes.

# bond0 -> ethx

# vlanx -> bond0 -> ethx

# bond0 -> vlanx -> ethx

# bond0 -> ethx
            |
  vlanx -> --

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

:040000 040000 f272ab5b895c46b3166d321a2da759c2a6e08ae0
467d28aad962f3506bc8820241d7417fb93e507f M include
:040000 040000 b4c5eb03a781b5ca016459ae19ebe2175d119eda
9c0ce9f12b43aecd9fee9ed816e11841b7b81fd8 M net
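
The delivery rule that the commit message describes can be modelled in a few lines. This is only an illustrative sketch in Python, not the kernel code: `Skb`, `Handler` and `deliver` are made-up names, the handler names in the usage lines are hypothetical, and a handler with `dev=None` stands in for a wildcard handler that accepts frames from any device.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Skb:
    dev: str                        # device the frame arrived on
    deliver_no_wcard: bool = False  # tagged: frame came from an inactive slave


@dataclass
class Handler:
    name: str
    dev: Optional[str] = None       # None models a wildcard (any-device) handler


def deliver(skb: Skb, handlers: List[Handler]) -> List[str]:
    """Return the names of the handlers that would see this skb."""
    seen = []
    for h in handlers:
        if h.dev is None:
            # Wildcard handlers are skipped when the skb is tagged.
            if not skb.deliver_no_wcard:
                seen.append(h.name)
        elif h.dev == skb.dev:
            # Exact-match handlers always get the frame.
            seen.append(h.name)
    return seen


handlers = [Handler("any-dev"), Handler("exact-eth1", dev="eth1")]
print(deliver(Skb("eth1"), handlers))
print(deliver(Skb("eth1", deliver_no_wcard=True), handlers))
```

In this model an untagged frame reaches both handlers, while a tagged frame reaches only the handler bound to the exact device.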

The bisect log:

# bad: [7e27d6e778cd87b6f2415515d7127eba53fe5d02] Linux 2.6.35-rc3
# good: [e44a21b7268a022c7749f521c06214145bd161e4] Linux 2.6.35-rc2
git bisect start 'v2.6.35-rc3' 'v2.6.35-rc2'
# good: [6db40cf047a8723095caf79f5569d21b388d7b31] pipe: fix check in "set size" fcntl
git bisect good 6db40cf047a8723095caf79f5569d21b388d7b31
# good: [63c70a0d7b59bac08bd14cd24c36f76aafc25de6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
git bisect good 63c70a0d7b59bac08bd14cd24c36f76aafc25de6
# good: [6f902af400b2499c80865c62a06fbbd15cf804fd] Btrfs: The file argument for fsync() is never null
git bisect good 6f902af400b2499c80865c62a06fbbd15cf804fd
# good: [7ae1277a5202109a31d8f81ac99d4a53278dab84] Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
git bisect good 7ae1277a5202109a31d8f81ac99d4a53278dab84
# good: [00d9d6a185de89edc0649ca4ead58f0283dfcbac] ipv6: fix ICMP6_MIB_OUTERRORS
git bisect good 00d9d6a185de89edc0649ca4ead58f0283dfcbac
# bad: [349124a00754129a5f1e43efa84733e364bf3749] net8139: fix a race at the end of NAPI
git bisect bad 349124a00754129a5f1e43efa84733e364bf3749
# bad: [ae638c47dc040b8def16d05dc6acdd527628f231] pkt_sched: gen_estimator: add a new lock
git bisect bad ae638c47dc040b8def16d05dc6acdd527628f231
# bad: [597a264b1a9c7e36d1728f677c66c5c1f7e3b837] net: deliver skbs on inactive slaves to exact matches
git bisect bad 597a264b1a9c7e36d1728f677c66c5c1f7e3b837

Reversing the identified patch gives a kernel without the slowdowns.

bzip'd .config is attached.

Happy to test fixes or provide additional diagnostics, but for the
latter I'll need clear instructions - I'm not that familiar with the
net tools.

Chris
--
The more I see, the more I know. The more I know, the less I
understand. Changing Man - Paul Weller


Attachments:
config-2.6.35-rc3.bz2 (12.96 kB)

2010-06-13 20:41:33

by François Valenduc

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On 13/06/10 22:15, Chris Clayton wrote:
> <snip>
>
> I've bisected it and arrived at:
>
> 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
> commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
> Author: John Fastabend <[email protected]>
> Date: Thu Jun 3 09:30:11 2010 +0000
>
> net: deliver skbs on inactive slaves to exact matches
>
> <snip>

This commit also makes nfsd hang at startup on my computer (see
https://bugzilla.kernel.org/show_bug.cgi?id=16195). This problem doesn't
occur if it's reverted.

François Valenduc

2010-06-13 20:58:29

by Chris Clayton

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On Sunday 13 June 2010 21:35:25 François Valenduc wrote:
> On 13/06/10 22:15, Chris Clayton wrote:
<snip>
> >
> > I've bisected it and arrived at:
> >
> > 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
> > commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
> > Author: John Fastabend <[email protected]>
> > Date: Thu Jun 3 09:30:11 2010 +0000
> >

<snip>

> This commit also makes nfsd hang at startup on my computer (see
> https://bugzilla.kernel.org/show_bug.cgi?id=16195). This problem doesn't
> occur if it's reverted.
>

I've just found John Fastabend's easy and fast fix
(http://marc.info/?l=linux-kernel&m=127646140827821) and am about to apply it
and test the new kernel. Back soon!

> François Valenduc




2010-06-13 21:11:29

by Chris Clayton

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On Sunday 13 June 2010 21:58:09 Chris Clayton wrote:
> On Sunday 13 June 2010 21:35:25 François Valenduc wrote:
> > On 13/06/10 22:15, Chris Clayton wrote:
>
> <snip>
>
> > > I've bisected it and arrived at:
> > >
> > > 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
> > > commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
> > > Author: John Fastabend <[email protected]>
> > > Date: Thu Jun 3 09:30:11 2010 +0000
>
> <snip>
>
> > This commit also makes nfsd hang at startup on my computer (see
> > https://bugzilla.kernel.org/show_bug.cgi?id=16195). This problem doesn't
> > occur if it's reverted.
>
> I've just found John Fastabend's easy and fast fix
> (http://marc.info/?l=linux-kernel&m=127646140827821) and am about to apply
> it and test the new kernel. Back soon!
>

Yes, that's fixed the problem I reported.

Thanks.

Chris

> > François Valenduc




2010-06-13 22:13:13

by Gene Heskett

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On Sunday 13 June 2010, Chris Clayton wrote:
>On Sunday 13 June 2010 21:58:09 Chris Clayton wrote:
>> On Sunday 13 June 2010 21:35:25 François Valenduc wrote:
>> > On 13/06/10 22:15, Chris Clayton wrote:
>>
>> <snip>
>>
>> > > I've bisected it and arrived at:
>> > >
>> > > 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
>> > > commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
>> > > Author: John Fastabend <[email protected]>
>> > > Date: Thu Jun 3 09:30:11 2010 +0000
>>
>> <snip>
>>
>> > This commit also makes nfsd hang at startup on my computer (see
>> > https://bugzilla.kernel.org/show_bug.cgi?id=16195). This problem
>> > doesn't occur if it's reverted.
>>
>> I've just found John Fastabend's easy and fast fix
>> (http://marc.info/?l=linux-kernel&m=127646140827821) and am about to
>> apply it and test the new kernel. Back soon!
>
>Yes, that's fixed the problem I reported.
>
Unfortunately, this patch will not apply to my src tree for 2.6.35-rc3,
even after I fixed the unwanted line wrap in the first active line.

I get:

[root@coyote linux-2.6.35-rc3]# patch -p1 <../2.6.35-rc3-test.patch
patching file net/core/skbuff.c
Hunk #1 FAILED at 532.
1 out of 1 hunk FAILED -- saving rejects to file net/core/skbuff.c.rej

Did my grabbing it with swiftfox damage it even further?
------------------------------------------------------------------
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f07e74..bcf2fa3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
new-&gt;ip_summed = old-&gt;ip_summed;
skb_copy_queue_mapping(new, old);
new-&gt;priority = old-&gt;priority;
+ new-&gt;deliver_no_wcard = old-&gt;deliver_no_wcard;
#if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
new-&gt;ipvs_property = old-&gt;ipvs_property;
#endif
---------------------------------------------------------------

That line 532 is not at line 532 in my src tree, it's at line 516 here.

I have seen this sort of thing before when using bz2 src files, but this is
all from gzipped stuffs. ???

Thanks for any hints.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Depends on how you define "always". :-)
-- Larry Wall in <[email protected]>

2010-06-14 12:55:54

by walt

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On 06/13/2010 03:13 PM, Gene Heskett wrote:

> Unfortunately, this patch will not apply to my src tree for 2.6.35-rc3,
> even after I fixed the unwanted line wrap in the first active line.
>
> I get:
>
> [root@coyote linux-2.6.35-rc3]# patch -p1<../2.6.35-rc3-test.patch
> patching file net/core/skbuff.c
> Hunk #1 FAILED at 532.
> 1 out of 1 hunk FAILED -- saving rejects to file net/core/skbuff.c.rej
>
> Did my grabbing it with swiftfox damage it even further?
> ------------------------------------------------------------------
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9f07e74..bcf2fa3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
> new-&gt;ip_summed = old-&gt;ip_summed;

I don't know what caused the corruption, but it's a systematic error.
Throughout the patch, the dereference symbol -> has been changed to
-& so it's not surprising that the patch won't apply. I've never seen
that particular problem before.
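
This kind of damage is HTML entity escaping, and it can be undone mechanically. A minimal sketch, assuming the only damage is entity escaping (`&gt;`, `&lt;`, `&amp;`); `unescape_patch` is a made-up helper name, while `html.unescape` is the Python standard library call. Tabs that a web page turned into spaces would not be restored by this, so `patch -l` (ignore whitespace) may still be needed afterwards.

```python
import html


def unescape_patch(text: str) -> str:
    """Undo HTML entity escaping in patch text copied from a web page."""
    return html.unescape(text)


damaged = "+ new-&gt;deliver_no_wcard = old-&gt;deliver_no_wcard;"
print(unescape_patch(damaged))
# prints: + new->deliver_no_wcard = old->deliver_no_wcard;
```

To repair a saved file, read it, pass the contents through `unescape_patch`, and write the result to a new file before running `patch -p1` on it. Note that any text in the patch that legitimately looks like an entity would be converted as well.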

I don't think the line numbers (e.g. 532) are responsible.

2010-06-14 15:31:56

by Gene Heskett

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On Monday 14 June 2010, walt wrote:
>On 06/13/2010 03:13 PM, Gene Heskett wrote:
>> Unfortunately, this patch will not apply to my src tree for 2.6.35-rc3,
>> even after I fixed the unwanted line wrap in the first active line.
>>
>> I get:
>>
>> [root@coyote linux-2.6.35-rc3]# patch -p1<../2.6.35-rc3-test.patch
>> patching file net/core/skbuff.c
>> Hunk #1 FAILED at 532.
>> 1 out of 1 hunk FAILED -- saving rejects to file net/core/skbuff.c.rej
>>
>> Did my grabbing it with swiftfox damage it even further?
>> ------------------------------------------------------------------
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 9f07e74..bcf2fa3 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new,
>> const struct sk_buff *old) new-&gt;ip_summed = old-&gt;ip_summed;
>
>I don't know what caused the corruption, but it's a systematic error.
>Throughout the patch, the dereference symbol -> has been changed to
>-& so it's not surprising that the patch won't apply. I've never seen
>that particular problem before.
>
Neither have I, but that looks to be fixable, so I will give it a shot later
today. Thank you very much.

>I don't think the line numbers (e.g. 532) are responsible.

Neither did I.
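
On the line-number point: patch(1) does not need the hunk to sit exactly at the line named in the `@@` header. It looks for the hunk's context lines near that position and applies the hunk at an offset if it finds them elsewhere. A simplified model of that search, as a sketch only: `find_hunk` and `max_offset` are made-up names, and the real tool also has a fuzz factor that is not modelled here.

```python
from typing import List, Optional


def find_hunk(file_lines: List[str], context: List[str],
              stated_line: int, max_offset: int = 1000) -> Optional[int]:
    """Return the 1-based line where `context` matches, or None.

    Searches outward from `stated_line`, nearest offsets first.
    """
    n = len(context)
    start = stated_line - 1  # convert to a 0-based index
    for off in range(max_offset + 1):
        candidates = [start] if off == 0 else [start - off, start + off]
        for cand in candidates:
            in_range = cand >= 0 and cand + n <= len(file_lines)
            if in_range and file_lines[cand:cand + n] == context:
                return cand + 1
    return None


source = ["a", "b", "new->x = old->x;", "d", "e", "f"]
print(find_hunk(source, ["new->x = old->x;"], 5))         # found at an offset
print(find_hunk(source, ["new-&gt;x = old-&gt;x;"], 3))  # never matches
```

In this model a hunk whose header is a few lines off is still located, but a hunk whose context text has been altered matches nowhere, whatever line the header names.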

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
This fortune was brought to you by the people at Hewlett-Packard.

2010-06-14 16:27:45

by Gene Heskett

Subject: Re: Noticeable slow-down in 2.6.35-rc3

On Monday 14 June 2010, walt wrote:
>On 06/13/2010 03:13 PM, Gene Heskett wrote:
>> Unfortunately, this patch will not apply to my src tree for 2.6.35-rc3,
>> even after I fixed the unwanted line wrap in the first active line.
>>
>> I get:
>>
>> [root@coyote linux-2.6.35-rc3]# patch -p1<../2.6.35-rc3-test.patch
>> patching file net/core/skbuff.c
>> Hunk #1 FAILED at 532.
>> 1 out of 1 hunk FAILED -- saving rejects to file net/core/skbuff.c.rej
>>
>> Did my grabbing it with swiftfox damage it even further?
>> ------------------------------------------------------------------
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 9f07e74..bcf2fa3 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new,
>> const struct sk_buff *old) new-&gt;ip_summed = old-&gt;ip_summed;

It also seems that the whole string "&gt;ip_" has been traded for the
leading c in the var names. So that patch was totally trashed by a broken
html converter by the time it actually got to my machine.

Where can I get a pristine copy?

I am also wondering if this has anything to do with my missing USB devices
when booted to 2.6.35-rc3. In my case the symptoms are as if it did not
recurse far enough up the various branches of my usb tree from hell to find
everything.

Thanks.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
FROM THE DESK OF
Rapunzel

Dear Prince:

Use ladder tonight -- you're splitting my ends.