I recently moved one of my proxies (squid plus a compressing application)
from 2.6.21 to 2.6.22 and noticed a huge performance drop. I think this is
important, because it may cause serious regressions on other workloads such
as busy web servers.
After some analysis of different options I can give more exact numbers:
2.6.21 is able to process 500-550 requests/second and 15-20 Mbit/s of
traffic, and works great without any slowdown or instability.
2.6.22 is able to process only 250-300 requests/second and 8-10 Mbit/s of
traffic, and ssh and the console are "freezing" (there is a delay even when
typing characters).
Both proxies run on identical hardware (Sun Fire X4100) and configuration
(small LFS-like system on USB flash); only the kernel differs.
I tried disabling/enabling various options and optimisations - nothing
changed until I reached the SLUB/SLAB option.
I loaded the proxy configuration onto a Gentoo PC with 2.6.22 (then upgraded
it to 2.6.23-rc8) and saw the same effect.
Additionally, when the load reaches its maximum I notice a whole-system
slowdown; for example, ssh and scp take much more time to run, even if I
nice -n -5 them.
But even with 2.6.23-rc8 + SLAB I noticed the same "freezing" of ssh (and
surely it slows down other kinds of network traffic too), although much less
than with SLUB. In top I see ksoftirqd taking almost 100% (sometimes
ksoftirqd/0, sometimes ksoftirqd/1).
I also tried various tricks with the scheduler (/proc/sys/kernel/sched*),
but that didn't help either.
When it freezes it looks like:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 15 -5 0 0 0 R 64 0.0 2:47.48 ksoftirqd/1
5819 root 20 0 134m 130m 596 R 57 3.3 4:36.78 globax
5911 squid 20 0 1138m 1.1g 2124 R 26 28.9 2:24.87 squid
10 root 15 -5 0 0 0 S 1 0.0 0:01.86 events/1
6130 root 20 0 3960 2416 1592 S 0 0.1 0:08.02 oprofiled
Oprofile results:
That's oprofile with 2.6.23-rc8 + SLUB:
73918 21.5521 check_bytes
38361 11.1848 acpi_pm_read
14077 4.1044 init_object
13632 3.9747 ip_send_reply
8486 2.4742 __slab_alloc
7199 2.0990 nf_iterate
6718 1.9588 page_address
6716 1.9582 tcp_v4_rcv
6425 1.8733 __slab_free
5604 1.6339 on_freelist
That's oprofile with 2.6.23-rc8 + SLAB:
CPU: AMD64 processors, speed 2592.64 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples % symbol name
138991 14.0627 acpi_pm_read
52401 5.3018 tcp_v4_rcv
48466 4.9037 nf_iterate
38043 3.8491 __slab_alloc
34155 3.4557 ip_send_reply
20963 2.1210 ip_rcv
19475 1.9704 csum_partial
19084 1.9309 kfree
17434 1.7639 ip_output
17278 1.7481 netif_receive_skb
15248 1.5428 nf_hook_slow
My .config is at http://www.nuclearcat.com/.config (SPARSEMEM is enabled
there; it doesn't make any noticeable difference).
Please CC me on replies, I am not on the list.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Could you try with SLUB but disabling CONFIG_SLUB_DEBUG ?
Hi
pi ~ # zcat /proc/config.gz |grep SLUB
# CONFIG_SLUB_DEBUG is not set
CONFIG_SLUB=y
21:27:20     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle  intr/s
21:27:21     all    5.53   21.61    8.54    0.00    0.50    5.03    0.00   58.79 7147.52
21:27:22     all    6.40   15.76    8.87    0.00    0.49   20.20    0.00   48.28 6327.00
21:27:23     all    6.53   20.60   12.06    0.00    0.00   52.76    0.00    8.04  456.44
21:27:24     all    7.00   17.50   10.50    0.00    0.00   53.00    0.00   12.00  439.00
21:27:25     all    7.50   18.00   12.50    0.50    0.50   52.50    0.00    8.50  464.00
21:27:26     all    6.64   16.11   11.37    0.00    0.00   58.77    0.00    7.11  401.90
21:27:27     all    0.00    0.40    0.40    0.00    0.00   99.20    0.00    0.00  325.20
21:27:28     all    0.00    0.50    0.00    0.00    0.00   99.50    0.00    0.00  453.47
21:27:29     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  495.10
21:27:30     all    0.00    0.50    0.00    0.00    0.00   99.50    0.00    0.00  625.74
21:27:31     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  600.97
21:27:32     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  604.95
64 bytes from 127.0.0.1: icmp_seq=15 ttl=64 time=0.031 ms
64 bytes from 127.0.0.1: icmp_seq=16 ttl=64 time=229 ms
64 bytes from 127.0.0.1: icmp_seq=17 ttl=64 time=253 ms
64 bytes from 127.0.0.1: icmp_seq=18 ttl=64 time=257 ms
64 bytes from 127.0.0.1: icmp_seq=19 ttl=64 time=279 ms
64 bytes from 127.0.0.1: icmp_seq=20 ttl=64 time=246 ms
64 bytes from 127.0.0.1: icmp_seq=21 ttl=64 time=246 ms
64 bytes from 127.0.0.1: icmp_seq=22 ttl=64 time=234 ms
64 bytes from 127.0.0.1: icmp_seq=23 ttl=64 time=235 ms
The problem remains. This is another server.
On Sun, 30 Sep 2007 19:48:44 +0200, Eric Dumazet wrote
> Could you try with SLUB but disabling CONFIG_SLUB_DEBUG ?
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi Denys, thanks for reporting (btw. please reply-to-all when replying
on lkml).
You say that SLAB is better than SLUB on an otherwise identical kernel,
but I didn't see if you quantified the actual numbers? It sounds like there
is still a regression with SLAB?
Hi Nick
There are no exact numbers for now; it is just instability.
I am bisecting the kernel now (from 2.6.21 to 2.6.22-rc1). I think it is
caused by some patch, and maybe not related to CFS. Anyway, let's see when I
finish the bisect. I was just wondering whether anyone else has similar
issues or some ideas in mind.
The only problem is that it is very difficult to track, and I have to do it
on a live server with thousands of users. The bug is easy to detect if
softirq in "mpstat 1" jumps up to 30+%: on 2.6.21 it always stays around
10%, while on the "buggy" version it jumps up to 100%. At night the load is
much lower, so I might miss the point by mistake: with the lower load it
only goes up to 30-50%, so on this hardware it is not very noticeable (which
is maybe why there has not been much feedback). I try to utilise the
hardware as much as possible, so I feel regressions much more.
Anyway, I should have news in 2-3 hours at most.
On Sun, 30 Sep 2007 14:25:37 +1000, Nick Piggin wrote
> Hi Denys, thanks for reporting (btw. please reply-to-all when
> replying on lkml).
>
> You say that SLAB is better than SLUB on an otherwise identical
> kernel, but I didn't see if you quantified the actual numbers? It
> sounds like there is still a regression with SLAB?
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi
I got
pi linux-git # git bisect bad
Bisecting: 0 revisions left to test after this
[f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
nsec resolution instead of usec
I will double-check and try to revert this patch on 2.6.22.
But it seems "that's it".
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi
I can confirm that reverting this patch on 2.6.23-rc8 fixes the problem.
Checked on 3 different servers.
softirq is no longer jumping to the top of "top", and mpstat also looks
stable, reasonable and nice.
pi linux-git # git bisect bad
Bisecting: 0 revisions left to test after this
[f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
nsec resolution instead of usec
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Denys wrote:
> Hi
>
> I got
>
> pi linux-git # git bisect bad
> Bisecting: 0 revisions left to test after this
> [f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
> nsec resolution instead of usec
>
> I will double-check and try to revert this patch on 2.6.22
>
> But it seems "that's it".
Well... that's interesting...
No problem here on bigger servers, so I CC David Miller and netdev on this one.
AFAIK do_gettimeofday() and ktime_get_real() should use the same underlying
hardware functions on PC and no performance problem should happen here.
The relevant part of the patch:
@@ -1521,7 +1515,6 @@ __u32 secure_ip_id(__be32 daddr)
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 				 __be16 sport, __be16 dport)
 {
-	struct timeval tv;
 	__u32 seq;
 	__u32 hash[4];
 	struct keydata *keyptr = get_keyptr();
@@ -1543,12 +1536,11 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 	 * As close as possible to RFC 793, which
 	 * suggests using a 250 kHz clock.
 	 * Further reading shows this assumes 2 Mb/s networks.
-	 * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 * For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
 	 * That's funny, Linux has one built in! Use it!
 	 * (Networks are faster now - should this be increased?)
 	 */
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
Thank you for doing this research.
A few more details about the hardware:
Sun Fire X4100 (AMD Opteron 252); the chipset looks like the AMD-8111/AMD-8131 chips.
There is no HPET detected, and by default acpi_pm is used, which seems more
CPU intensive (based on the oprofile results) than TSC. Choosing TSC via
/sys doesn't make much difference.
The workload is HTTP requests coming from customers (the noticeable slowdown
happens at >250 requests/second). The requests are routed to squid, marked
with a ToS at squid, and depending on the ToS routed via netfilter to the
required TCP port. At that TCP port a "satellite accelerator" intercepts the
connection (so it is on loopback, which means "double" TCP work) and
encapsulates the traffic into UDP. It also receives UDP (a stream of around
10-20 Mbit/s) and decapsulates it, and the same happens in reverse towards
the customer. In summary:
2x Opteron 2.4 GHz
Incoming TCP request rate 250-500 req/s
Incoming TCP bandwidth around 3-5 Mbit/s
Outgoing TCP bandwidth around 33-35 Mbit/s
Internally routed TCP also around this number (minus cached content)
Incoming UDP bandwidth around 20-25 Mbit/s
Outgoing UDP bandwidth around 500 Kbit/s
On a 2x dual-core Opteron 2.6 GHz (4 cores total) with a similar load I
cannot notice a big slowdown, so I think it is noticeable only when the
hardware is used near its limits. BUT I can see spikes of softirq in mpstat,
from the normal 7-8% up to 50-60%, although they don't cause any noticeable
slowdown of ssh or system operations. I would expect 2.6.21 to serve 600-700
req/s. Maybe there is some additional overhead in these calculations, or
improper IRQ locking? I am not a guru in such matters.
I guess anyone with a similar workload can check "mpstat 1" and see whether
there are spikes in soft% or not.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
From: Eric Dumazet <[email protected]>
Date: Mon, 01 Oct 2007 07:59:12 +0200
> No problem here on bigger servers, so I CC David Miller and netdev
> on this one. AFAIK do_gettimeofday() and ktime_get_real() should
> use the same underlying hardware functions on PC and no performance
> problem should happen here.
One thing that jumps out at me is that on 32-bit (and to a certain
extent on 64-bit) there is a lot of stack accesses and missed
optimizations because all of the work occurs, and gets expanded,
inside of ktime_get_real().
The timespec_to_ktime() inside of there constructs the ktime_t return
value on the stack, then returns that as an aggregate to the caller.
That cannot be without some cost.
ktime_get_real() is definitely a candidate for inlining especially in
these kinds of cases where we'll happily get computations in local
registers instead of all of this on-stack nonsense. And in several
cases (if the caller only needs the tv_sec value, for example)
computations can be elided entirely.
It would be constructive to experiment and see if this is in fact part
of the problem.
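As a rough, stand-alone illustration of the aggregate-return pattern David
describes (a hypothetical example, not the actual kernel code), a small
struct built and returned by value may travel through memory via a hidden
pointer on some ABIs, while a plain scalar return normally stays in
registers:

#include <stdio.h>

/* Hypothetical stand-in for ktime_t: a tiny aggregate returned by value. */
struct ns_val {
	long long ns;
};

/* Aggregate return: depending on the ABI, the caller may pass a hidden
 * pointer and the callee builds the result on the stack. */
static struct ns_val make_aggregate(long long sec, long long nsec)
{
	struct ns_val v = { .ns = sec * 1000000000LL + nsec };
	return v;
}

/* Scalar return: the same value normally comes back in a register. */
static long long make_scalar(long long sec, long long nsec)
{
	return sec * 1000000000LL + nsec;
}

int main(void)
{
	struct ns_val a = make_aggregate(5, 250);

	printf("aggregate: %lld ns, scalar: %lld ns\n",
	       a.ns, make_scalar(5, 250));
	return 0;
}

Whether this overhead is actually measurable in secure_tcp_sequence_number()
is exactly what David suggests experimenting with.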
Well, I can play a bit more on the "live" servers. I now have a hot-swap
server with a full Gentoo install, where I can rebuild any kernel you want,
with any patch applied.
But it looks less like plain overhead: the load becomes very "spiky" rather
than just permanently higher. It is also not normal that the whole system
becomes unresponsive (for example, ping to 127.0.0.1 reaches 300 ms during
the periods when softirq usage jumps to 100%).
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Could you try a pristine 2.6.22.9 with a patch to
secure_tcp_sequence_number() like this:
--- drivers/char/random.c.orig	2007-10-01 10:18:42.000000000 +0200
+++ drivers/char/random.c	2007-10-01 10:19:58.000000000 +0200
@@ -1554,7 +1554,7 @@
 	 * That's funny, Linux has one built in! Use it!
 	 * (Networks are faster now - should this be increased?)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 / 1000;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
Thank you
On 32-bit machines, replace the divide with a shift to avoid a linker error
(undefined reference to `__divdi3'):
	seq += ktime_get_real().tv64 >> 10;
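For context: the kernel is not linked against libgcc, so on 32-bit x86 a
plain 64-bit division by 1000 is compiled into a call to the missing
__divdi3 helper, which is the linker error mentioned above. A shift avoids
it entirely; when a real divide is wanted, the usual kernel idiom is
do_div(), which divides a u64 in place and returns the remainder. A purely
illustrative sketch of that alternative (my own, not a patch from this
thread):

/* Hypothetical helper, for illustration only: convert the nanosecond
 * ktime value to microseconds (the 2.6.21 granularity) without __divdi3. */
static __u32 isn_clock_usec(void)
{
	u64 ns = ktime_get_real().tv64;	/* nanoseconds */

	do_div(ns, NSEC_PER_USEC);	/* in-place divide by 1000 */
	return (__u32)ns;
}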
"Denys" <[email protected]> writes:
> The bug is easy to detect if softirq in "mpstat 1" jumps up to 30+%: on
> 2.6.21 it always stays around 10%, while on the "buggy" version it jumps
> up to 100%. With the lower load at night it only goes up to 30-50%, so on
> this hardware it is not very noticeable (which is maybe why there has not
> been much feedback). I try to utilise the hardware as much as possible, so
> I feel regressions much more.
You could use oprofile to find where the CPU time is spent
-Andi
I already did, and it didn't show a real failure point.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
On Mon, Oct 01, 2007 at 01:30:52PM +0300, Denys wrote:
>
> I already did, and it didn't show a real failure point.
There was no difference between the profiles of the working kernel
and the highly loaded kernel? Perhaps you just have a legitimately higher load?
-Andi
Hi
No major differences.
Some functions are a bit higher, some a bit lower, but everything looks fine
on both kernels.
That's what I tried first - oprofile - and the only key point I was able to
catch is that in top, ksoftirqd/0 or /1 shows 100% CPU usage, and in mpstat
soft% also reaches 100%.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Then you likely just have higher load. What makes you think this
is a kernel problem?
-Andi
The fact that 2.6.21 works fine and 2.6.22 is MUCH worse on multiple servers
with a similar workload.
Plus, all of userspace becomes extremely unstable, and ping to 127.0.0.1
definitely must not reach 300 ms.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
I am not able to compile the kernel with the patch:
drivers/built-in.o: In function `secure_tcp_sequence_number':
(.text+0x3ad02): undefined reference to `__divdi3'
make: *** [.tmp_vmlinux1] Error 1
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Resending for the mailing lists (the original was discarded as spam because
of encoding issues).
Everything looks fine, for sure. Confirmed on a second server.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--- linux-2.6.22/drivers/char/random.c	2007-10-01 10:18:42.000000000 +0200
+++ linux-2.6.22-ed/drivers/char/random.c	2007-10-01 21:47:58.000000000 +0200
@@ -1550,11 +1550,13 @@ __u32 secure_tcp_sequence_number(__be32
 	 * As close as possible to RFC 793, which
 	 * suggests using a 250 kHz clock.
 	 * Further reading shows this assumes 2 Mb/s networks.
-	 * For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
-	 * That's funny, Linux has one built in! Use it!
-	 * (Networks are faster now - should this be increased?)
+	 * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 * For 10 Gb/s Ethernet, a 1 GHz clock should be ok, but
+	 * we also need to limit the resolution so that the u32 seq
+	 * overlaps less than one time per MSL (2 minutes).
+	 * Choosing a clock of 64 ns period is OK. (period of 274 s)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 >> 6;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
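As a quick sanity check of the "period of 274 s" figure in the comment
above, here is a small user-space calculation (my own illustration, not code
from the thread) of how long a 32-bit sequence-number clock takes to wrap at
different tick periods; the wrap period has to stay above the TCP MSL of
roughly two minutes:

#include <stdio.h>

/* Print how long 2^32 ticks take for a given tick period in nanoseconds. */
static void wrap_period(const char *name, double tick_ns)
{
	printf("%-30s wraps every %10.1f s\n",
	       name, 4294967296.0 * tick_ns / 1e9);
}

int main(void)
{
	wrap_period("1 ns   (tv64, 2.6.22)", 1.0);		/* ~4.3 s, below MSL   */
	wrap_period("64 ns  (tv64 >> 6)", 64.0);		/* ~274.9 s, above MSL */
	wrap_period("1 us   (2.6.21 usec clock)", 1000.0);	/* ~71.6 minutes       */
	return 0;
}

The unshifted nanosecond clock wraps the 32-bit value in about 4.3 seconds,
well inside the MSL, which is what the patch above guards against.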
From: Eric Dumazet <[email protected]>
Date: Mon, 01 Oct 2007 22:10:03 +0200
> So maybe the following patch is necessary...
>
> I believe IPV6 & DCCP are immune to this problem.
>
> Thanks again Denys for spotting this.
>
> Eric
>
> [PATCH] TCP : secure_tcp_sequence_number() should not use a too fast clock
>
> TCP V4 sequence numbers are 32bits, and RFC 793 assumed a 250 KHz clock.
> In order to follow network speed increase, we can use a faster clock, but
> we should limit this clock so that the delay between two rollovers is
> greater than MSL (TCP Maximum Segment Lifetime : 2 minutes)
>
> Choosing a 64 nsec clock should be OK, since the rollovers occur every
> 274 seconds.
>
> Problem spotted by Denys Fedoryshchenko
>
> Signed-off-by: Eric Dumazet <[email protected]>
Thanks a lot Eric for bringing closure to this.
I'll apply this and add a reference in the commit message to the
changeset that introduced this problem, since it might help
others who look at this.