ksymoops 2.4.1 on alpha 2.4.14-pre6. Options used
-v /home/jgarzik/cvs/linux_2_4/vmlinux (specified)
-k /proc/ksyms (specified)
-l /proc/modules (specified)
-o /lib/modules/2.4.14-pre6/ (specified)
-m /boot/System.map-2.4.14-pre6 (specified)
Warning (compare_maps): mismatch on symbol packet_socks_nr , af_packet says fffffffc002c2584, /lib/modules/2.4.14-pre6/kernel/net/packet/af_packet.o says fffffffc002be004. Ignoring /lib/modules/2.4.14-pre6/kernel/net/packet/af_packet.o entry
kernel BUG at page_alloc.c:82!
rpmd(20216): Kernel Bug 1
pc = [__free_pages_ok+288/1040] ra = [__free_pages_ok+276/1040] ps = 0000 Tainted: P
pc = [<fffffc0000841b50>] ra = [<fffffc0000841b44>] ps = 0000 Tainted: P
Using defaults from ksymoops -t elf64-alpha -a alpha
v0 = 000000000000001f t0 = 0000000000000001 t1 = fffffc0016a0fec8
t2 = 0000000000000056 t3 = 0000000000000001 t4 = fffffc0000a3fcf8
t5 = fffffc0017d5a2c0 t6 = 0000000000000065 t7 = fffffc0008624000
a0 = 0000000000000000 a1 = 0000000000000001 a2 = 0000000000000001
a3 = 0000000000000001 a4 = 000000000000000e a5 = 0000000000000001
t8 = fffffc0000a407f0 t9 = 0000000000004000 t10= fffffc0000a407f8
t11= fffffc0000a40808 pv = fffffc000081e1a0 at = 0000000000000000
gp = fffffc0000a3a8b8 sp = fffffc0008627d58
Trace:fffffc0000836160 fffffc0000835e28 fffffc0000836160 fffffc0000836338 fffffc00008364b4 fffffc0000865768 fffffc0000862e3c fffffc0000859c30 fffffc0000859dc0 fffffc0000810b10
Code: 225f0052 a77da6e0 6b5b4083 27ba0020 23bd8d74 00000081 <a04a0000> 44481001
>>PC; fffffc0000841b50 <__free_pages_ok+120/410> <=====
Trace; fffffc0000836160 <truncate_complete_page+90/b0>
Trace; fffffc0000835e28 <remove_inode_page+48/60>
Trace; fffffc0000836160 <truncate_complete_page+90/b0>
Trace; fffffc0000836338 <truncate_list_pages+1b8/2d0>
Trace; fffffc00008364b4 <truncate_inode_pages+64/d0>
Trace; fffffc0000865768 <iput+108/270>
Trace; fffffc0000862e3c <d_delete+7c/c0>
Trace; fffffc0000859c30 <vfs_unlink+190/230>
Trace; fffffc0000859dc0 <sys_unlink+f0/1f0>
Trace; fffffc0000810b10 <entSys+a8/c0>
Code; fffffc0000841b38 <__free_pages_ok+108/410>
0000000000000000 <_PC>:
Code; fffffc0000841b38 <__free_pages_ok+108/410>
0: 52 00 5f 22 lda a2,82(zero)
Code; fffffc0000841b3c <__free_pages_ok+10c/410>
4: e0 a6 7d a7 ldq t12,-22816(gp)
Code; fffffc0000841b40 <__free_pages_ok+110/410>
8: 83 40 5b 6b jsr ra,(t12),218 <_PC+0x218> fffffc0000841d50 <__free_pages_ok+320/410>
Code; fffffc0000841b44 <__free_pages_ok+114/410>
c: 20 00 ba 27 ldah gp,32(ra)
Code; fffffc0000841b48 <__free_pages_ok+118/410>
10: 74 8d bd 23 lda gp,-29324(gp)
Code; fffffc0000841b4c <__free_pages_ok+11c/410>
14: 81 00 00 00 call_pal 0x81
Code; fffffc0000841b50 <__free_pages_ok+120/410> <=====
18: 00 00 4a a0 ldl t1,0(s1) <=====
Code; fffffc0000841b54 <__free_pages_ok+124/410>
1c: 01 10 48 44 and t1,0x40,t0
1 warning issued. Results may not be reliable.
From: Jeff Garzik <[email protected]>
Date: Wed, 31 Oct 2001 12:25:21 -0500
2.4.14-pre6 on UP alpha, newly reformatted and reinstalled :)
Machine was nowhere near OOM, and no other adverse messages appeared in
dmesg. An rpm build and an rpm install were running in parallel.
Hmmm... the oops suggests that truncate_complete_page() gets a page
not in the page cache. The page count should be at least 2 when
it gets passed there.
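For context, the 2.4 path under discussion is approximately the
following (a from-memory sketch of mm/filemap.c, not a verbatim quote);
page_alloc.c:82 lands among the sanity BUG() checks at the top of
__free_pages_ok():

static void truncate_complete_page(struct page *page)
{
	/* Leave it on the LRU if it gets converted into anonymous buffers */
	if (!page->buffers || do_flushpage(page, 0))
		lru_cache_del(page);

	ClearPageDirty(page);
	ClearPageUptodate(page);
	remove_inode_page(page);
	page_cache_release(page);	/* drops the page cache's reference */
}

The page cache holds one reference and truncate_list_pages() takes
another before calling in, hence the "at least 2": with anything less,
page_cache_release() here frees a page that is still in use, which is
exactly the sort of state __free_pages_ok() BUGs on.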
Franks a lot,
David S. Miller
[email protected]
In article <[email protected]>,
David S. Miller <[email protected]> wrote:
> From: Jeff Garzik <[email protected]>
> Date: Wed, 31 Oct 2001 12:25:21 -0500
>
> 2.4.14-pre6 on UP alpha, newly reformatted and reinstalled :)
>
> Machine was nowhere near OOM, and no other adverse messages appeared in
> dmesg. An rpm build and an rpm install were running in parallel.
>
>Hmmm... the oops suggests that truncate_complete_page() gets a page
>not in the page cache. The page count should be at least 2 when
>it gets passed there.
Indeed. Looks like somebody removed the page from the page cache
without removing it from the inode queue.
Which sounds rather impossible, as the two are always done together.
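The "always done together" is visible in the helper itself; in 2.4 it
is approximately this (collapsing a sub-helper, not a verbatim quote):

void remove_inode_page(struct page *page)
{
	if (!PageLocked(page))
		PAGE_BUG(page);

	spin_lock(&pagecache_lock);
	remove_page_from_inode_queue(page);	/* off the inode's page list */
	remove_page_from_hash_queue(page);	/* out of the page cache hash */
	page->mapping = NULL;
	spin_unlock(&pagecache_lock);
}

Both removals happen under a single pagecache_lock hold, which is why a
page off one queue but not the other should be impossible.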
Maybe it's just the page count that is buggered, and we free it too
early as a result. Is this the same machine that had interesting
trouble before?
Linus
Linus Torvalds wrote:
> Maybe it's just the page count that is buggered, and we free it too
> early as a result. Is this the same machine that had interesting
> trouble before?
yes, a UP alpha running 2.4.14-pre6, that was described in the false oom
killer report.
--
Jeff Garzik | Only so many songs can be sung
Building 1024 | with two lips, two lungs, and one tongue.
MandrakeSoft | - nomeansno
In article <[email protected]>,
Jeff Garzik <[email protected]> wrote:
>Linus Torvalds wrote:
>> Maybe it's just the page count that is buggered, and we free it too
>> early as a result. Is this the same machine that had interesting
>> trouble before?
>
>yes, a UP alpha running 2.4.14-pre6, that was described in the false oom
>killer report.
Ok, I think it's the same bug, and that we're just freeing the wrong
page somehow.
I wonder if the "VALID_PAGE(page)" macro is reliable on alpha? You seem
to be able to trigger this too easily for it not to be something
specific to the setup..
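For reference, on alpha without DISCONTIGMEM configured, VALID_PAGE()
is essentially a bounds check against the single contiguous mem_map
array; roughly (a sketch from memory of include/asm-alpha/page.h, not a
verbatim quote):

#define VALID_PAGE(page)	((page) - mem_map < max_mapnr)

For that to misfire, mem_map/max_mapnr themselves would have to be
wrong, or the struct page pointer already corrupt.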
Linus
symptoms: when the firewall is freshly booted it supports >200
connections/sec; when the firewall is hit with lots of simultaneous
requests it slows down, and after the flood of requests stops and the
machine sits idle it never recovers back to its original performance. I
can drive the performance down to <50 connections/sec permanently. The
CPU is eaten up more and more by the system, leaving less time for
userspace.
the single threaded requests do outperform the massively parallel test
(as can be expected, as the scheduler isn't optimized for long
runqueues :-) I am not asking to optimize things for the large
runqueue, just to be able to recover after the long runqueue has
finished.
killing and restarting the proxy does not fix the problem. The only way
I have discovered to restore performance is a reboot of the firewall.
test setup:
client---firewall---server
all boxes are 1.2GHz Athlon, 512MB RAM, IDE drive, D-Link quad ethernet,
2GB swap space (although it never comes close to using it)
ethernet switches are cisco 3548
firewall is running 4 proxies; each proxy accepts incoming connections,
forks to handle that connection, relays the data to/from the server and
then exits. (yes, I know this is not the most efficient way to do
things, but this _is_ a real world use)
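As a reference point, a fork-per-connection relay of the kind described
is only a few dozen lines of C. This is a hypothetical sketch, not
David's actual proxy; the listen port, upstream address, and the
minimal error handling are all placeholders:

/*
 * Minimal fork-per-connection TCP relay: accept, fork, shuttle bytes
 * to a fixed upstream, exit.  Hypothetical sketch only.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static void relay(int a, int b)
{
	char buf[4096];
	ssize_t n;
	fd_set rfds;

	for (;;) {
		FD_ZERO(&rfds);
		FD_SET(a, &rfds);
		FD_SET(b, &rfds);
		if (select((a > b ? a : b) + 1, &rfds, NULL, NULL, NULL) <= 0)
			return;
		if (FD_ISSET(a, &rfds)) {
			if ((n = read(a, buf, sizeof buf)) <= 0 ||
			    write(b, buf, n) != n)
				return;
		}
		if (FD_ISSET(b, &rfds)) {
			if ((n = read(b, buf, sizeof buf)) <= 0 ||
			    write(a, buf, n) != n)
				return;
		}
	}
}

int main(void)
{
	struct sockaddr_in addr, up;
	int ls = socket(AF_INET, SOCK_STREAM, 0);
	int one = 1;

	signal(SIGCHLD, SIG_IGN);		/* auto-reap the children */
	setsockopt(ls, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_port = htons(1434);		/* the proxy's real port */
	bind(ls, (struct sockaddr *) &addr, sizeof addr);
	listen(ls, 128);

	memset(&up, 0, sizeof up);
	up.sin_family = AF_INET;
	up.sin_port = htons(80);		/* the web server */
	inet_pton(AF_INET, "10.0.0.2", &up.sin_addr);	/* hypothetical */

	for (;;) {
		int c = accept(ls, NULL, NULL);
		if (c < 0)
			continue;
		if (fork() == 0) {		/* child: relay, then exit */
			int s = socket(AF_INET, SOCK_STREAM, 0);
			close(ls);
			if (connect(s, (struct sockaddr *) &up, sizeof up) == 0)
				relay(c, s);
			exit(0);
		}
		close(c);			/* parent: back to accept() */
	}
}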
I did a vmstat 1 on the firewall during the test and excerpts from it
are listed below.
server is running Apache (MaxClients set to 250 after prior runs noted
that it ran up against the 150 limit)
client is running ab (the Apache benchmark tool)
testing procedure:
1. boot firewall
2. get performance data
run ab -n 400 firewall:port/index.html.en twice (first to get past
initialization delays, second to get a performance value)
This does 400 sequential connections and reports the time to get the
responses
3. flood the firewall
run 5 copies of ab -n 5000 -c 200 firewall:port/index.html.en
each copy makes 5000 connections, attempting to keep up to 200
connections open at a time
4. wait for all the connections to clear on the firewall (at least,
possibly longer if I am in a meeting). The idea here is to return
everything to a stable mode.
5. run ab -n 400 firewall:port/index.html and record the connections/sec
loop back to 3.
after 4 iterations of load, reboot the firewall and do two more runs of
ab -n 400 to make sure that performance is back up and the slowdown is
not due to large logfiles anywhere.
at the start of the test vmstat looks like this
ab -n 200 (219 connections/sec) this is two runs sequentially
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 496744 604 8076 0 0 0 0 101 11 0 0 100
1 0 1 0 496548 604 8124 0 0 48 0 3814 1759 36 41 24
0 0 0 0 496588 604 8124 0 0 0 0 4007 1899 42 34 25
0 0 0 0 496576 604 8128 0 0 0 0 103 16 0 0 100
1 0 0 0 496452 604 8128 0 0 0 0 924 405 11 6 83
2 0 0 0 496360 604 8128 0 0 0 0 4252 2015 41 40 20
0 0 0 0 496492 604 8128 0 0 0 0 2628 1240 26 21 53
0 0 0 0 496492 604 8128 0 0 0 0 101 7 0 0 100
5xab -n 5000 -c 200 (~210 connections/sec)
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
19 0 1 0 493192 604 8124 0 0 0 0 5938 1139 55 45 0
13 0 1 0 492980 604 8124 0 0 0 0 4956 1059 58 42 0
11 0 1 0 492692 604 8124 0 0 0 0 4878 1057 45 55 0
17 0 1 0 492900 604 8124 0 0 0 0 4890 1072 57 43 0
29 0 1 0 492572 604 8124 0 0 0 0 4648 1049 46 54 0
13 0 1 0 492608 604 8124 0 0 0 0 4650 1041 50 50 0
11 0 1 0 492484 604 8124 0 0 0 0 4645 1043 38 62 0
18 0 1 0 492472 604 8128 0 0 0 0 4779 1029 56 44 0
17 0 1 0 492440 604 8128 0 0 0 0 4691 1057 46 54 0
18 0 1 0 492476 604 8128 0 0 0 0 4598 1074 54 46 0
18 0 1 0 492488 604 8128 0 0 0 0 4625 1051 53 47 0
22 0 1 0 492388 604 8128 0 0 0 0 4661 1057 50 50 0
10 0 1 0 492448 604 8128 0 0 0 0 4569 1033 56 44 0
22 0 1 0 492364 604 8128 0 0 0 0 4589 1036 48 52 0
18 0 1 0 492384 604 8128 0 0 0 0 4536 1031 48 52 0
15 0 1 0 492236 604 8128 0 0 0 0 4528 1034 43 57 0
26 0 1 0 492132 604 8128 0 0 0 0 4554 1037 50 50 0
24 0 1 0 492016 604 8128 0 0 0 0 4518 1037 48 52 0
at the end of 4 runs
5xab -n 5000 -c 200 (~40 connections/sec)
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
23 0 1 0 482356 624 8524 0 0 0 0 1053 181 7 93 0
29 0 1 0 481724 624 8528 0 0 0 0 960 206 7 93 0
23 0 1 0 482300 624 8528 0 0 0 0 1059 176 5 95 0
30 0 1 0 482104 624 8528 0 0 0 72 1095 197 9 91 0
19 0 1 0 482760 624 8528 0 0 0 0 1014 179 5 95 0
31 0 1 0 481892 624 8528 0 0 0 0 1010 198 6 94 0
16 0 1 0 482488 624 8524 0 0 0 0 1001 176 8 92 0
29 0 1 0 482236 624 8524 0 0 0 0 1037 179 9 91 0
12 0 1 0 483008 624 8528 0 0 0 0 1182 188 8 92 0
25 0 1 0 482620 624 8528 0 0 0 0 988 173 8 92 0
19 0 1 0 482696 624 8528 0 0 0 0 931 173 7 93 0
20 0 1 0 482776 624 8528 0 0 0 0 985 171 9 91 0
20 0 1 0 482116 624 8528 0 0 0 0 1122 119 5 95 0
21 0 1 0 482888 624 8528 0 0 0 0 830 144 3 97 0
16 0 1 0 481916 624 8528 0 0 0 0 1054 155 9 91 0
18 0 1 0 482892 624 8528 0 0 0 0 875 143 12 88 0
20 0 1 0 483100 624 8528 0 0 0 0 875 146 7 93 0
27 0 1 0 482428 624 8528 0 0 0 0 859 153 6 94 0
26 0 1 0 482688 624 8528 0 0 0 0 874 151 8 92 0
ab -n 400 (~48 connections/sec) this is a single run
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 483552 624 8544 0 0 0 0 103 13 0 0 100
2 0 0 0 483388 624 8544 0 0 0 0 592 225 10 38 52
1 0 0 0 483392 624 8544 0 0 0 0 1088 488 13 87 0
2 0 0 0 483392 624 8544 0 0 0 0 1108 468 10 89 1
1 0 0 0 483404 624 8548 0 0 0 0 1081 452 9 89 2
1 0 0 0 483508 624 8548 0 0 0 0 1096 486 7 90 3
2 0 1 0 483368 624 8548 0 0 0 0 1091 458 11 87 2
1 0 0 0 483364 624 8548 0 0 0 0 1085 480 13 85 2
1 0 0 0 483356 624 8552 0 0 0 0 1098 478 9 87 4
0 0 0 0 483520 624 8548 0 0 0 0 796 330 9 59 32
0 0 0 0 483520 624 8548 0 0 0 0 103 13 0 0 100
Please tell me what additional things I should try to identify what the
kernel is getting tied up doing.
I was able to duplicate this problem with 2.4.5 as well as 2.4.13. The
performance of 2.4.5 is ~4% lower than 2.4.13 in all cases.
David Lang
This problem is also happening on 2.4.14pre6.
however, one thing that I did find:
in this firewall setup I am using ipchains REDIRECT to map port 1433 on
four IP addresses to ports 1434-1437 to get to the proxies.
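For reference, rules of that shape would look something like the
following (the addresses here are hypothetical, since the actual rule
set wasn't posted; one such rule per IP address):

  ipchains -A input -p tcp -d 10.0.0.1 1433 -j REDIRECT 1434

and the rough iptables equivalent:

  iptables -t nat -A PREROUTING -p tcp -d 10.0.0.1 --dport 1433 \
           -j REDIRECT --to-ports 1434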
if I do the same test, but instead of having all 5 copies of ab hitting
port 1433 I make them connect to the real port of the proxy, the
slowdown does not happen (I have run 5 iterations of the load test
without any slowdown)
So this slowdown appears to be in the ipchains emulation code or the
iptables port mapping code.
I think my next step is going to be to try to use iptables instead of
ipchains to do the port mapping.
David Lang
On Wed, 31 Oct 2001, David Lang wrote:
> Subject: 2.4.13 (and 2.4.5) performance drops under load and does not
> recover
>
> [original report and vmstat output quoted in full; snipped]
testing this with iptables shows a similar performance drop for the
first few thousand connections, but then it stabilizes and does not get
worse. After the firewall is allowed to sit idle long enough for
/proc/net/ip_conntrack to drop back down to zero, the performance
recovers to the freshly booted state.
in doing this test I ran into the 32K limit of connection tracking and
bumped it to 327680 (just to make it large enough :-)
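(In 2.4 that limit is runtime-tunable; presumably this was done with
something like

  echo 327680 > /proc/sys/net/ipv4/ip_conntrack_max

which is the standard knob for the conntrack table size.)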
so the ipchains compatibility layer appears to have a leak in it that
prevents it from recovering the memory used to track the connections.
under iptables, is there a way to cause the tcp connections to time out
of the ip_conntrack table more rapidly? it appears that they sit in the
tcp table for the time_wait interval when closed and then sit in the
ip_conntrack table for an additional time_wait interval as well.
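For what it's worth, in the 2.4 conntrack code those TCP timeouts
appear to be compile-time constants rather than runtime knobs; from
memory, the table in net/ipv4/netfilter/ip_conntrack_proto_tcp.c has
roughly this shape (approximate, not a verbatim quote), so shortening
the TIME_WAIT linger would mean editing and rebuilding:

static unsigned long tcp_timeouts[] = {
	30*60*HZ,	/* TCP_CONNTRACK_NONE */
	5*24*60*60*HZ,	/* TCP_CONNTRACK_ESTABLISHED: 5 days */
	2*60*HZ,	/* TCP_CONNTRACK_SYN_SENT */
	60*HZ,		/* TCP_CONNTRACK_SYN_RECV */
	2*60*HZ,	/* TCP_CONNTRACK_FIN_WAIT */
	2*60*HZ,	/* TCP_CONNTRACK_TIME_WAIT: the 2-minute linger */
	10*HZ,		/* TCP_CONNTRACK_CLOSE */
	60*HZ,		/* TCP_CONNTRACK_CLOSE_WAIT */
	30*HZ,		/* TCP_CONNTRACK_LAST_ACK */
	2*60*HZ,	/* TCP_CONNTRACK_LISTEN */
};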
David Lang
On Wed, 31 Oct 2001, David Lang wrote:
> [previous messages quoted in full; snipped]
in further investigation of my options for this firewall I considered
replacing the user-space proxy with NAT (with the idea that it would be
faster than the fork-for-every-connection proxy).
to do this I reconfigured the setup described above to forward packets
through the firewall and set up routes on the client and server to talk
to each other. I then started my flood of http requests from the client
to the server (5x ab -n 5000 -c 200)
with 2.4.14pre6 iptables, connection tracking turned on, and no NAT
rules used, performance was ~1800 connections/sec for the first few
seconds (about the limit of the server) and then plummeted to between
40-120 connections/sec (very sporadic), and the firewall bogged down so
badly that a vmstat 1 was dropping 2 out of every 3 seconds. doing a
wc -l on /proc/net/ip_conntrack (which took 45 or so seconds to
complete) reported ~17000 (doing this in the test from my last e-mail
with the proxy reported ~38000), and vmstat was showing a consistent
100% system time.
with 2.4.14pre6 with no iptables compiled into the system, the same
test maintained a steady ~1800 connections/sec while using ~50% of the
CPU on the firewall.
so there is a serious bottleneck in the connection tracking code that
makes it so that under load it's better to run the ancient user-space
proxy than to use the iptables NAT.
David Lang
On Thu, 1 Nov 2001, David Lang wrote:
> [previous messages quoted in full; snipped]
On Wed, Oct 31, 2001 at 07:53:37PM +0000, Linus Torvalds wrote:
> I wonder if the "VALID_PAGE(page)" macro is reliable on alpha? You seem
> to be able to trigger this too easily for it not to be something
> specific to the setup..
Should be. I've never seen an alpha with discontiguous memory.
(Well, I've not looked at a numa box, but that's not at issue here.)
FWIW, we dump the HWRPB memory table early in the boot process, e.g.
memcluster 0, usage 1, start 0, end 192
memcluster 1, usage 0, start 192, end 16267
memcluster 2, usage 1, start 16267, end 16384
where "usage 1" is reserved by the console.
r~
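To put numbers on that example table (assuming the alpha's 8 KB pages):

  cluster 1: (16267 - 192) pages * 8 KB/page = 128600 KB, ~125.6 MB usable
  total:     16384 pages * 8 KB/page = 128 MB

i.e. one contiguous usable region with only the console-reserved
clusters at the ends, so a simple bounds check like the VALID_PAGE()
sketch above has nothing discontiguous to trip over.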