2003-08-12 18:39:12

by Ken Savage

Subject: High CPU load with kswapd and heavy disk I/O

Short version:
----------------
kernels 2.4.17 --> 2.4.21
Dual Athlon SMP system
4GB RAM, 2GB swap
3ware RAID, filled with millions of files across thousands of directories.
reiserfs 3.6

The following command is guaranteed to lock out the box by activating
kswapd to >95% CPU, blocking out pings, everything.

find /RAID/data/ -type f -mtime +180 | xargs rm
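
A more defensive form of the pipeline above, as a sketch only. It passes
NUL-delimited names so that filenames containing spaces or newlines are
handled, and it deletes in small batches with a pause between them. The
function name, batch size and pause are assumptions made for illustration.
Nothing in this thread establishes that pacing the deletes avoids the kswapd
stall.

```shell
# purge_old DIR DAYS [BATCH] [PAUSE]
# Remove regular files under DIR that were last modified more than DAYS days
# ago, BATCH files at a time, sleeping PAUSE seconds after each batch.
# -print0 and -0 are widely supported extensions (GNU and BSD find/xargs).
purge_old() {
    dir=$1
    days=$2
    batch=${3:-500}   # hypothetical default, tune for the workload
    pause=${4:-1}     # hypothetical default, in seconds

    find "$dir" -type f -mtime +"$days" -print0 |
        xargs -0 -n "$batch" sh -c 'rm -f -- "$@"; sleep "$0"' "$pause"
}
```

For the case described here the call would be: purge_old /RAID/data 180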

Details:
----------
Applying the rmap patch seems to prevent kswapd from hogging the CPU,
but causes it to freeze up for some other reason. (The server is remote,
so I can't view the console.) Likewise 2.6.0-test* causes freezeups.
Mind you, the server is under a fair bit of CPU and disk load -- hundreds
of processes/threads all actively running. I suspect something in rmap
has made its way into 2.6 and our usage pattern is triggering the same
fault in both places.

It appears as though the system is unable to efficiently clean up disk
buffer memory when called on to do so. In the Documentation/, there
is mention of a buffermem sysctl, but that's nowhere to be found.
It's obviously been removed/replaced... Is there any way to limit the
amount of buffer memory used by the system, that way if/when kswapd
needs to reclaim it, there's very little work for it to do?

Admittedly, that's just masking the problem, as opposed to solving it.
Any idea why kswapd is having such a tough go?? Known solutions
for this problem?

TIA,

Ken


2003-08-12 19:43:49

by Nuno Silva

Subject: Re: High CPU load with kswapd and heavy disk I/O

Hi!

Ken Savage wrote:
> Short version:
> ----------------
> kernels 2.4.17 --> 2.4.21
> Dual Athlon SMP system
> 4GB RAM, 2GB swap
> 3ware RAID, filled with millions of files across thousands of directories.
> reiserfs 3.6
>
> The following command is guaranteed to lock out the box by activating
> kswapd to >95% CPU, blocking out pings, everything.
>
> find /RAID/data/ -type f -mtime +180 | xargs rm
>

Can you send before, during and after:
cat /proc/meminfo
cat /proc/slabinfo

And maybe:
vmstat 1
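
When the hang comes without warning, one way to get "before" and possibly
"during" data is to log these files continuously and flush after every
sample. The following is a minimal sketch. The function name, the log
directory and the 5 second interval are assumptions, and sync itself may
block for a while on a box under heavy I/O.

```shell
# snapshot_vm OUTDIR FILE...
# Append a timestamped copy of each readable FILE to OUTDIR/<basename>.log,
# then call sync so the samples are on disk if the machine hangs.
snapshot_vm() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for src in "$@"; do
        [ -r "$src" ] || continue   # skip files we cannot read
        {
            printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
            cat "$src"
        } >> "$outdir/$(basename "$src").log"
    done
    sync
}

# Example loop, left commented out so that sourcing this file has no effect:
# while :; do
#     snapshot_vm /var/tmp/vmlog /proc/meminfo /proc/slabinfo
#     sleep 5
# done
```

The last sample in each log before the reboot is then the closest available
view of the state just before the lockup.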

Real kernel hackers (not me...) will find that information very useful ;)

Regards,
Nuno Silva


> Details:
> ----------
> Applying the rmap patch seems to prevent kswapd from hogging the CPU,
> but causes it to freeze up for some other reason. (The server is remote,
> so I can't view the console.) Likewise 2.6.0-test* causes freezeups.
> Mind you, the server is under a fair bit of CPU and disk load -- hundreds
> of processes/threads all actively running. I suspect something in rmap
> has made its way into 2.6 and our usage pattern is triggering the same
> fault in both places.
>
> It appears as though the system is unable to efficiently clean up disk
> buffer memory when called on to do so. In the Documentation/, there
> is mention of a buffermem sysctl, but that's nowhere to be found.
> It's obviously been removed/replaced... Is there any way to limit the
> amount of buffer memory used by the system, that way if/when kswapd
> needs to reclaim it, there's very little work for it to do?
>
> Admittedly, that's just masking the problem, as opposed to solving it.
> Any idea why kswapd is having such a tough go?? Known solutions
> for this problem?
>
> TIA,
>
> Ken
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2003-08-12 20:24:44

by Ken Savage

Subject: Re: High CPU load with kswapd and heavy disk I/O

On Tue August 12 2003 12:44, Nuno Silva wrote:

> Can you send before, during and after:
> cat /proc/meminfo
> cat /proc/slabinfo

Can't send "during" because the system is completely locked up
and unresponsive. There is no warning as to when it's about to
bomb out on kswapd, so I can't provide anything resembling a
"just before". OTOH, "after", I can since it just happened now! ;)

AFTER:
-----------

            total:       used:       free:  shared:   buffers:    cached:
Mem:    3710840832  2703040512  1007800320        0  427081728  522944512
Swap:   2147467264    57257984  2090209280
MemTotal: 3623868 kB
MemFree: 984180 kB
MemShared: 0 kB
Buffers: 417072 kB
Cached: 470384 kB
SwapCached: 40304 kB
Active: 509216 kB
Inactive: 1973064 kB
HighTotal: 2752448 kB
HighFree: 978908 kB
LowTotal: 871420 kB
LowFree: 5272 kB
SwapTotal: 2097136 kB
SwapFree: 2041220 kB

slabinfo - version: 1.1 (SMP)
kmem_cache 80 80 244 5 5 1 : 252 126
ip_conntrack 211 270 384 27 27 1 : 124 62
tcp_tw_bucket 19 90 128 3 3 1 : 252 126
tcp_bind_bucket 54 224 32 2 2 1 : 252 126
tcp_open_request 59 59 64 1 1 1 : 252 126
inet_peer_cache 4 59 64 1 1 1 : 252 126
ip_fib_hash 10 224 32 2 2 1 : 252 126
ip_dst_cache 12 120 192 6 6 1 : 252 126
arp_cache 6 60 128 2 2 1 : 252 126
blkdev_requests 384 390 128 13 13 1 : 252 126
nfs_write_data 0 0 384 0 0 1 : 124 62
nfs_read_data 0 0 384 0 0 1 : 124 62
nfs_page 0 0 128 0 0 1 : 252 126
journal_head 0 0 48 0 0 1 : 252 126
revoke_table 0 0 12 0 0 1 : 252 126
revoke_record 0 0 32 0 0 1 : 252 126
dnotify cache 0 0 20 0 0 1 : 252 126
file lock cache 1 42 92 1 1 1 : 252 126
fasync cache 0 0 16 0 0 1 : 252 126
uid_cache 2 112 32 1 1 1 : 252 126
skbuff_head_cache 328 640 192 32 32 1 : 252 126
sock 154 180 832 20 20 2 : 124 62
sigqueue 29 29 132 1 1 1 : 252 126
cdev_cache 12 177 64 3 3 1 : 252 126
bdev_cache 5 59 64 1 1 1 : 252 126
mnt_cache 14 118 64 2 2 1 : 252 126
inode_cache 33897 106547 512 15221 15221 1 : 124 62
dentry_cache 12195 12300 128 410 410 1 : 252 126
filp 2751 2790 128 93 93 1 : 252 126
names_cache 3 3 4096 3 3 1 : 60 30
buffer_head 223778 513540 128 17117 17118 1 : 252 126
mm_struct 180 180 192 9 9 1 : 252 126
vm_area_struct 2638 3030 128 101 101 1 : 252 126
fs_cache 76 236 64 4 4 1 : 252 126
files_cache 76 135 448 15 15 1 : 124 62
signal_act 90 117 1344 39 39 1 : 60 30
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 3 3 32768 3 3 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 12 12 16384 12 12 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 2 2 8192 2 2 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 182 182 4096 182 182 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 108 108 2048 54 54 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 172 172 1024 43 43 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 360 360 512 45 45 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 165 165 256 11 11 1 : 252 126
size-128(DMA) 0 0 128 0 0 1 : 252 126
size-128 570 570 128 19 19 1 : 252 126
size-64(DMA) 0 0 64 0 0 1 : 252 126
size-64 722 1534 64 26 26 1 : 252 126
size-32(DMA) 0 0 64 0 0 1 : 252 126
size-32 11868 12390 64 210 210 1 : 252 126
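
To see which caches hold the most memory in a dump like the one above, the
slab counts can be converted to kB. This is a sketch that assumes the
slabinfo 1.1 SMP layout shown here (name, active objects, total objects,
object size, active slabs, total slabs, pages per slab, then ": limit
batchcount") and a 4 kB page size. The function name is made up for
illustration.

```shell
# slab_top FILE [N]
# Print the N largest slab caches in FILE (default 5) by memory held,
# computed as total slabs * pages per slab * page size.
# Fields are counted from the end of the line because some cache names
# ("file lock cache", "dnotify cache") contain spaces.
slab_top() {
    awk -v page_kb=4 '
        NF >= 10 && $(NF-2) == ":" {
            name = $1
            for (i = 2; i <= NF - 9; i++) name = name " " $i
            printf "%d kB\t%s\n", $(NF-4) * $(NF-3) * page_kb, name
        }' "$1" | sort -rn | head -n "${2:-5}"
}
```

With the figures posted above, buffer_head comes out at 17118 * 4 = 68472 kB
and inode_cache at 15221 * 4 = 60884 kB.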


2003-08-12 23:47:55

by Nuno Silva

Subject: Re: High CPU load with kswapd and heavy disk I/O

Hello!

Ken Savage wrote:
> On Tue August 12 2003 12:44, Nuno Silva wrote:

[..snip..]

>
> AFTER:
> -----------
>
>             total:       used:       free:  shared:   buffers:    cached:
> Mem:    3710840832  2703040512  1007800320        0  427081728  522944512
> Swap:   2147467264    57257984  2090209280
> MemTotal: 3623868 kB
> MemFree: 984180 kB
> MemShared: 0 kB
> Buffers: 417072 kB


My guess is that this is the cause. LOWMEM pressure because of very
large directories... Relating to this, linux-2.6.0-test3-mm1 has Ingo's
4G/4G memory split. Can you try this kernel, enable 4G/4G feature, and
report back?
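
The LowTotal/LowFree and HighTotal/HighFree lines of the posted
/proc/meminfo can be turned into percentages to check this guess. A small
sketch follows. The function name is an assumption, and the Low*/High*
fields only appear on kernels built with highmem support.

```shell
# zone_free [FILE]
# Report free low and high memory from a meminfo-style FILE
# (default /proc/meminfo) as absolute values and as a percentage.
zone_free() {
    awk '
        $1 == "LowTotal:"  { lt = $2 }
        $1 == "LowFree:"   { lf = $2 }
        $1 == "HighTotal:" { ht = $2 }
        $1 == "HighFree:"  { hf = $2 }
        END {
            if (lt > 0) printf "low  %d/%d kB free (%.1f%%)\n", lf, lt, 100 * lf / lt
            if (ht > 0) printf "high %d/%d kB free (%.1f%%)\n", hf, ht, 100 * hf / ht
        }' "${1:-/proc/meminfo}"
}
```

For the "AFTER" dump in this thread that gives 5272 of 871420 kB free in low
memory (0.6%) against 978908 of 2752448 kB free in high memory (35.6%).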

Good luck,
Nuno Silva



2003-08-13 00:14:35

by Ken Savage

Subject: Re: High CPU load with kswapd and heavy disk I/O

On Tue August 12 2003 16:49, Nuno Silva wrote:

> My guess is that this is the cause. LOWMEM pressure because of very
> large directories... Relating to this, linux-2.6.0-test3-mm1 has Ingo's
> 4G/4G memory split. Can you try this kernel, enable 4G/4G feature, and
> report back?

Something about the 2.6 (and the rmap patched 2.4) kernels causes
lockouts on the server -- for reasons OTHER than kswapd. The server
running the delete-old-files process runs hundreds of other CPU and disk
I/O intensive processes/threads, and it doesn't look like 2.6 is yet able
to handle the load. Unfortunately, the server is a production environment
machine at a remote site, so lockouts/reboots/kernel panics are baaaad :(

I've seen other mentions of kswapd/kupdated problems in 2.4.xx, but
few mentions of solutions. Have people just learned to avoid the
situations that trigger the mad thrashes?

Ken

Subject: Re: High CPU load with kswapd and heavy disk I/O

I regularly run the 2.4.21 kernel with 2GB of ram, and I've had to apply
Andrew Morton's patch from a year ago that "hunts down buffer heads and
kills them":

http://marc.theaimsgroup.com/?l=lse-tech&m=102083525007877&w=2

With this patch, kswapd never goes crazy for me anymore. BTW, if you're
using NFS (you don't mention it), I've also had great luck improving NFS
performance by using Andrea's address space reconfiguration patch:

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre7aa1/00_3.5G-address-space-5

and selecting the CONFIG_2GB option. This configuration along with 2GB of
ram gives me excellent NFS performance. Increasing the system RAM to 4GB
with any memory address configuration hurts my performance. The combination
of CONFIG_2GB and 2GB of ram has given me the best performance. Note that
changing the address space configuration like I've mentioned violates the
ABI (http://stage.caldera.com/developer/devspecs/abi386-4.pdf page 47), so
your applications might not work as expected. I've also needed to use Neil
Brown's "remove the BKL from NFSD" patches, which can be found at:

http://cgi.cse.unsw.edu.au/~neilb/patches/linux-stable/?detail=2003-01-06:00

(patches 005 thru 011)

Erik



2003-08-13 15:36:59

by Nuno Silva

Subject: Re: High CPU load with kswapd and heavy disk I/O

Hi!

Ken Savage wrote:
> On Tue August 12 2003 16:49, Nuno Silva wrote:
>
>
>>My guess is that this is the cause. LOWMEM pressure because of very
>>large directories... Relating to this, linux-2.6.0-test3-mm1 has Ingo's
>>4G/4G memory split. Can you try this kernel, enable 4G/4G feature, and
>>report back?
>
>
> Something about the 2.6 (and the rmap patched 2.4) kernels causes
> lockouts on the server -- for reasons OTHER than kswapd. The server

If you want to help, you could try to gather more info on that to help
develop a better 2.6 ;)

FWIW, 2.6.0-test* with mm patches works well here... At least in a few
boxes.


> running the delete-old-files process runs hundreds of other CPU and disk
> I/O intensive processes/threads, and it doesn't look like 2.6 is yet able
> to handle the load. Unfortunately, the server is a production environment
> machine at a remote site, so lockouts/reboots/kernel panics are baaaad :(
>
> I've seen other mentions of kswapd/kupdated problems in 2.4.xx, but
> few mentions of solutions. Have people just learned to avoid the
> situations that trigger the mad thrashes?
>


If you're sure that it's really kswapd you can send SIGSTOP and SIGCONT
to kswapd's pid. Kswapd will honor those signals.

killall -STOP kswapd
<run your I/O intensive scripts>
killall -CONT kswapd
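
The same stop/run/continue sequence can be wrapped so that the CONT is sent
even when the job in the middle fails. This is only a sketch: the function
name is hypothetical, it takes a PID rather than a process name, and it does
not cover the wrapper itself being killed, in which case the target stays
stopped. Whether a given kernel thread honors these signals depends on the
kernel version.

```shell
# with_stopped PID COMMAND [ARG...]
# Send SIGSTOP to PID, run COMMAND, then send SIGCONT to PID whether or not
# COMMAND succeeded. Returns the exit status of COMMAND.
with_stopped() {
    target=$1
    shift
    kill -STOP "$target" || return 1
    "$@"
    status=$?
    kill -CONT "$target"
    return "$status"
}
```

Usage would look like: with_stopped <pid of kswapd> <your I/O intensive script>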

Sometimes I do this... For me it works well. If this makes your machine
crash or lose data, don't blame me! ;)

Regards,
Nuno Silva


> Ken