2004-06-12 15:32:49

by Ramy M. Hassan

Subject: kswapd problem


kswapd and kupdated are causing our production server to
freeze completely for a few seconds every now and then.
The server is running kernel 2.4.26 SMP on a dual Xeon 2.20GHz
with 4GB RAM and a 900GB FC RAID on a QLogic HBA, using the
qla2300.o driver and reiserfs.
The RAID filesystem contains millions of files in thousands of
directories.
The system is under fair load. Normally the load average is
about 3 and everything works properly, but suddenly the system
stops responding to everything except ping and stays frozen for
about 20 seconds. During that time I cannot even type anything.
Then the system becomes responsive again, and I see a load average
over 250 that decreases until it is back to 3. A few
minutes later the same thing happens again.
I noticed that during the freezes both kswapd and
kupdated are the most active processes, each consuming over 30%
of the CPU (kswapd usually more than kupdated).



2004-06-21 20:03:48

by Marcelo Tosatti

Subject: Re: kswapd problem

On Sat, Jun 12, 2004 at 03:32:47PM +0000, Ramy M. Hassan wrote:
>
> kswapd and kupdated are causing our production server to
> freeze completely for a few seconds every now and then.
> The server is running kernel 2.4.26 SMP on a dual Xeon 2.20GHz
> with 4GB RAM and a 900GB FC RAID on a QLogic HBA, using the
> qla2300.o driver and reiserfs.
> The RAID filesystem contains millions of files in thousands of
> directories.
> The system is under fair load. Normally the load average is
> about 3 and everything works properly, but suddenly the system
> stops responding to everything except ping and stays frozen for
> about 20 seconds. During that time I cannot even type anything.
> Then the system becomes responsive again, and I see a load average
> over 250 that decreases until it is back to 3. A few
> minutes later the same thing happens again.
> I noticed that during the freezes both kswapd and
> kupdated are the most active processes, each consuming over 30%
> of the CPU (kswapd usually more than kupdated).

Hi Ramy,

Can you get us some more data when this happens?

What are the sizes of the page lists (/proc/meminfo) when the freeze happens?
Can you capture that?

Also leave vmstat running in the background.

If you are willing to debug I'm sure we will be able to find the
reason for the problem.
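
One way the requested data could be captured is sketched below. It is
only an illustration, not something posted in this thread: the function
names, the log file name and the one-second interval are all
assumptions. It appends timestamped copies of /proc/meminfo to a log
file, and it includes a small parser for the "Key: value kB" lines so
the snapshots can be compared afterwards. Lines in any other layout are
skipped rather than guessed at.

```python
import time
from datetime import datetime


def parse_meminfo(text):
    """Return {key: kB} for lines shaped like 'Key:  12345 kB'.

    Lines with any other layout (headers, multi-column summaries)
    are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue
        # Accept 'Key: N' or 'Key: N kB' only.
        if len(parts) == 1 or (len(parts) == 2 and parts[1] == "kB"):
            fields[key.strip()] = int(parts[0])
    return fields


def log_snapshots(path="/proc/meminfo", out="meminfo.log",
                  interval=1.0, count=None):
    """Append timestamped copies of `path` to `out`.

    Runs until interrupted when count is None. The defaults are
    hypothetical; adjust them to the machine being debugged.
    """
    taken = 0
    with open(out, "a") as log:
        while count is None or taken < count:
            with open(path) as src:
                text = src.read()
            log.write("=== %s ===\n%s\n" % (datetime.now().isoformat(), text))
            log.flush()  # keep the data on disk if the box has to be reset
            taken += 1
            time.sleep(interval)
```

Calling log_snapshots() in one terminal while "vmstat 1" runs in
another gives two logs that can be lined up by time. The logger will
probably stall during a freeze as well, so a gap between consecutive
timestamps marks the freeze, and the snapshots on either side of the
gap are the ones to look at.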


2004-06-25 13:26:47

by Ulrich Brand

Subject: Re: kswapd problem

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have just the same problem with 2.4.26 SMP, built
with the P4/SMP 64 GB config, running on Fujitsu Siemens
dual Xeon 3.06 GHz servers with 12 GB RAM.
My "magic tail" is still a grep through the filesystem.
The system works well as long as memory utilisation is lower
than ~1 GB of RAM. If you start the database (Oracle 9i),
which takes about 2 GB of RAM (about 130 Oracle processes),
the machine freezes for about 5 to 10 minutes!
The load grows up to 200! top shows 99.5% CPU time
on kswapd. The machine has 2 GB of swap and uses none of it.
Is there a problem with highmem?
I have attached some snippets of "top" output.
The freeze occurs between snippets 2 and 3.
Please help.

Thanks in advance & kind regards,
ulrich

- --1--
top - 13:13:50 up 4:38, 5 users, load average: 6.48, 5.59, 2.90
Tasks: 290 total, 2 running, 288 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.8% user, 16.9% system, 0.0% nice, 49.3% idle
Mem: 12230636k total, 7377644k used, 4852992k free, 151156k buffers
Swap: 2097136k total, 0k used, 2097136k free, 6375056k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
14453 oracle 16  0 1074m 1.0g 1.0g S 52.3  9.0 1:57.47 oracle
    5 root    9  0     0    0    0 S 19.6  0.0 1:34.22 kswapd
14950 oracle  9  0 1071m 1.0g 1.0g S  6.6  9.0 1:55.53 oracle
15377 oracle  9  0 1028m 1.0g 1.0g S  5.3  8.6 3:21.81 oracle


- --2--
top - 13:14:01 up 4:38, 5 users, load average: 5.94, 5.51, 2.91
Tasks: 290 total, 8 running, 282 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.2% user, 21.6% system, 0.0% nice, 71.2% idle
Mem: 12230636k total, 7401996k used, 4828640k free, 152544k buffers
Swap: 2097136k total, 0k used, 2097136k free, 6397888k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
    5 root   19  0     0    0    0 R 20.9  0.0 1:36.42 kswapd
25064 root   18  0  1008 1008  668 R  7.5  0.0 0:27.06 top
15377 oracle  9  0 1029m 1.0g 1.0g S  4.1  8.6 3:22.24 oracle
17803 oracle  9  0  965m 965m 961m S  3.8  8.1 0:53.31 oracle
25442 root    9  0   708  708  492 D  3.4  0.0 0:00.74 grep
25441 root    9  0   708  708  492 D  2.6  0.0 0:00.60 grep
14946 oracle  9  0 1120m 1.1g 1.1g S  2.1  9.4 1:54.74 oracle
14439 oracle  9  0  996m 993m 990m S  1.6  8.3 0:52.79 oracle
14453 oracle  9  0 1075m 1.0g 1.0g R  0.9  9.0 1:57.57 oracle
25425 oracle 10  0  273m 273m 271m R  0.9  2.3 0:05.02 oracle
14

- --3--
top - 13:14:15 up 4:39, 5 users, load average: 21.55, 8.85, 4.03
Tasks: 290 total, 12 running, 278 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.2% user, 96.9% system, 0.0% nice, 1.0% idle
Mem: 12230636k total, 7286604k used, 4944032k free, 152580k buffers
Swap: 2097136k total, 0k used, 2097136k free, 6281600k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 6888 root   20  0  7872 7868 1160 D 84.8  0.1 0:18.87 oracm
14461 root   10  0  7872 7868 1160 S 52.1  0.1 0:17.65 oracm
 6891 root   14  0  7872 7868 1160 S 31.5  0.1 0:13.18 oracm
    5 root   15  0     0    0    0 R  9.9  0.0 1:37.86 kswapd
25441 root    9  0   708  708  492 D  4.3  0.0 0:01.23 grep
25064 root   14  0  1008 1008  668 R  4.2  0.0 0:27.68 top
25442 root    9  0   708  708  492 D  4.0  0.0 0:01.32 grep
 9514 root   14  0  1740 1740 1508 R  3.9  0.0 0:54.47 xosview.bin
15377 oracle  9  0 1029m 1.0g 1.0g S  1.2  8.6 3:22.42 oracle
14982 oracle 14  0 1141m 1.1g 1.1g R  0.8  9.5 2:32.85 oracle


- --4---
top - 13:14:15 up 4:39, 5 users, load average: 21.55, 8.85, 4.03
Tasks: 290 total, 12 running, 278 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.2% user, 96.9% system, 0.0% nice, 1.0% idle
Mem: 12230636k total, 7286604k used, 4944032k free, 152580k buffers
Swap: 2097136k total, 0k used, 2097136k free, 6281600k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 6888 root   20  0  7872 7868 1160 D 84.8  0.1 0:18.87 oracm
14461 root   10  0  7872 7868 1160 S 52.1  0.1 0:17.65 oracm
 6891 root   14  0  7872 7868 1160 S 31.5  0.1 0:13.18 oracm
    5 root   15  0     0    0    0 R  9.9  0.0 1:37.86 kswapd
25441 root    9  0   708  708  492 D  4.3  0.0 0:01.23 grep
25064 root   14  0  1008 1008  668 R  4.2  0.0 0:27.68 top
25442 root    9  0   708  708  492 D  4.0  0.0 0:01.32 grep
 9514 root   14  0  1740 1740 1508 R  3.9  0.0 0:54.47 xosview.bin
15377 oracle  9  0 1029m 1.0g 1.0g S  1.2  8.6 3:22.42 oracle
14982 oracle 14  0 1141m 1.1g 1.1g R  0.8  9.5 2:32.85 oracle


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFA3Cfs06ncFEdJe8URArohAKCGgMQt0awZnJDOhrl0xZ16REaprwCfS+Cb
Au9EFXhuKySqCPzmVIJVYX0=
=lHV5
-----END PGP SIGNATURE-----
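
The Mem: lines in the top snapshots above can be reduced to one figure
per snapshot: memory in use that is neither buffers nor page cache. The
subtraction below is a common rule of thumb and not something stated in
the thread. The function name is made up for illustration, and the
result says nothing about how that memory is split between lowmem and
highmem. On a machine running Oracle, shared memory may also be counted
under "cached", so treat the numbers as rough.

```python
def non_cache_used_kb(used_kb, buffers_kb, cached_kb):
    """Memory in use that is not buffers or page cache, in kB."""
    return used_kb - buffers_kb - cached_kb


# (used, buffers, cached) in kB, copied from the Mem:/Swap: lines
# of snapshots 1 to 3.
snapshots = {
    "1": (7377644, 151156, 6375056),
    "2": (7401996, 152544, 6397888),
    "3": (7286604, 152580, 6281600),
}

for label, (used, buffers, cached) in snapshots.items():
    print(label, non_cache_used_kb(used, buffers, cached), "kB")
```

By this estimate the figure is roughly 850 MB in all three snapshots,
so most of the ~7 GB reported as "used" is page cache, and about 4.8 GB
is still free while kswapd is busy.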