2005-09-01 09:13:13

by Holger Kiehl

Subject: Re: Where is the performance bottleneck?

On Wed, 31 Aug 2005, Holger Kiehl wrote:

> On Thu, 1 Sep 2005, Nick Piggin wrote:
>
>> Holger Kiehl wrote:
>>
>>> meminfo.dump:
>>>
>>> MemTotal: 8124172 kB
>>> MemFree: 23564 kB
>>> Buffers: 7825944 kB
>>> Cached: 19216 kB
>>> SwapCached: 0 kB
>>> Active: 25708 kB
>>> Inactive: 7835548 kB
>>> HighTotal: 0 kB
>>> HighFree: 0 kB
>>> LowTotal: 8124172 kB
>>> LowFree: 23564 kB
>>> SwapTotal: 15631160 kB
>>> SwapFree: 15631160 kB
>>> Dirty: 3145604 kB
>>
>> Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio
>> so maybe I've just led you on a goose chase.
>>
>> You could
>> echo 5 > /proc/sys/vm/dirty_background_ratio
>> echo 10 > /proc/sys/vm/dirty_ratio
>>
>> To further reduce dirty memory in the system, however this is
>> a long shot, so please continue your interaction with the
>> other people in the thread first.
>>
> Yes, this does make a difference, here are the results of running
>
> dd if=/dev/full of=/dev/sd?1 bs=4M count=4883
>
> on 8 disks at the same time:
>
> 34.273340
> 33.938829
> 33.598469
> 32.970575
> 32.841351
> 32.723988
> 31.559880
> 29.778112
>
> That's 32.710568 MB/s on average per disk with your change and without
> it it was 24.958557 MB/s on average per disk.
>
> I will do more tests tomorrow.
>
Just rechecked those numbers. Did a fresh boot and ran the test several
times. With defaults (dirty_background_ratio=10, dirty_ratio=40) I get
for the dd write tests an average of 24.559491 MB/s (8 disks in parallel)
per disk. With the suggested values (dirty_background_ratio=5, dirty_ratio=10)
32.390659 MB/s per disk.
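
For completeness, a run like that can be scripted roughly as follows (only a
sketch; the devices sdc1 through sdj1 are the same ones used for the raid0
test below):

#!/bin/sh
# start one dd per disk and wait for all eight to finish
# (4883 blocks of 4M are written to each disk)
for d in c d e f g h i j
do
    dd if=/dev/full of=/dev/sd${d}1 bs=4M count=4883 &
done
wait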

I then did a SW raid0 over all disks with the following command:

mdadm -C /dev/md3 -l0 -n8 /dev/sd[cdefghij]1

(dirty_background_ratio=10, dirty_ratio=40) 223.955995 MB/s
(dirty_background_ratio=5, dirty_ratio=10) 234.318936 MB/s

So the difference is not so big anymore.
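
For reference, the array state shows up in /proc/mdstat and the raid0 device
is then written to like a single disk, for example:

cat /proc/mdstat                                # check that md3 is up
dd if=/dev/full of=/dev/md3 bs=4M count=39064   # count only an example (8 * 4883)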

Something else I noticed while doing the dd over 8 disks is the following
(top output just before they finish):

top - 08:39:11 up 2:03, 2 users, load average: 23.01, 21.48, 15.64
Tasks: 102 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 17.7% sy, 0.0% ni, 0.0% id, 78.9% wa, 0.2% hi, 3.1% si
Mem: 8124184k total, 8093068k used, 31116k free, 7831348k buffers
Swap: 15631160k total, 13352k used, 15617808k free, 5524k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3423 root 18 0 55204 460 392 R 12.0 0.0 1:15.55 dd
3421 root 18 0 55204 464 392 D 11.3 0.0 1:17.36 dd
3418 root 18 0 55204 464 392 D 10.3 0.0 1:10.92 dd
3416 root 18 0 55200 464 392 D 10.0 0.0 1:09.20 dd
3420 root 18 0 55204 464 392 D 10.0 0.0 1:10.49 dd
3422 root 18 0 55200 460 392 D 9.3 0.0 1:13.58 dd
3417 root 18 0 55204 460 392 D 7.6 0.0 1:13.11 dd
158 root 15 0 0 0 0 D 1.3 0.0 1:12.61 kswapd3
159 root 15 0 0 0 0 D 1.3 0.0 1:08.75 kswapd2
160 root 15 0 0 0 0 D 1.0 0.0 1:07.11 kswapd1
3419 root 18 0 51096 552 476 D 1.0 0.0 1:17.15 dd
161 root 15 0 0 0 0 D 0.7 0.0 0:54.46 kswapd0
1 root 16 0 4876 372 332 S 0.0 0.0 0:01.15 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
6 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3

A load average of 23 for 8 dd's seems a bit high. Also, why is kswapd working
so hard? Is that correct?

Please just tell me if there is anything else I can test or dumps that
could be useful.

Thanks,
Holger


2005-09-02 14:32:33

by Al Boldi

Subject: Re: Where is the performance bottleneck?

Holger Kiehl wrote:
> top - 08:39:11 up 2:03, 2 users, load average: 23.01, 21.48, 15.64
> Tasks: 102 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0% us, 17.7% sy, 0.0% ni, 0.0% id, 78.9% wa, 0.2% hi, 3.1% si
> Mem: 8124184k total, 8093068k used, 31116k free, 7831348k buffers
> Swap: 15631160k total, 13352k used, 15617808k free, 5524k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3423 root 18 0 55204 460 392 R 12.0 0.0 1:15.55 dd
> 3421 root 18 0 55204 464 392 D 11.3 0.0 1:17.36 dd
> 3418 root 18 0 55204 464 392 D 10.3 0.0 1:10.92 dd
> 3416 root 18 0 55200 464 392 D 10.0 0.0 1:09.20 dd
> 3420 root 18 0 55204 464 392 D 10.0 0.0 1:10.49 dd
> 3422 root 18 0 55200 460 392 D 9.3 0.0 1:13.58 dd
> 3417 root 18 0 55204 460 392 D 7.6 0.0 1:13.11 dd
> 158 root 15 0 0 0 0 D 1.3 0.0 1:12.61 kswapd3
> 159 root 15 0 0 0 0 D 1.3 0.0 1:08.75 kswapd2
> 160 root 15 0 0 0 0 D 1.0 0.0 1:07.11 kswapd1
> 3419 root 18 0 51096 552 476 D 1.0 0.0 1:17.15 dd
> 161 root 15 0 0 0 0 D 0.7 0.0 0:54.46 kswapd0
>
> A load average of 23 for 8 dd's seems a bit high. Also, why is kswapd
> working so hard? Is that correct?

Actually, kswapd is another problem (see the "Kswapd Flaw" thread), which has
little impact on your problem. Basically, kswapd tries very hard, maybe even
too hard, to fulfil a request for memory: when the buffer/cache pages are
full, kswapd tries to find some more unused memory. When it finds none it
starts recycling the buffer/cache pages, which is OK, but it only does this
after searching for swappable memory, which wastes CPU cycles.

This can be tuned a little, but not much, by adjusting /sys(proc)/.../vm/...
or by renicing kswapd to the lowest priority, which may cause other problems.
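
For example, something along these lines (only illustrative; which knobs
actually help differs between kernel versions):

# look at what the VM exposes and lower the swap preference a bit
ls /proc/sys/vm/
echo 10 > /proc/sys/vm/swappiness      # example value, not a recommendation

# push the kswapd threads to the lowest priority (may cause other problems)
renice 19 -p $(pgrep kswapd)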

Things get really bad when procs start asking for more memory than is
available, causing kswapd to take the liberty of paging out running procs in
the hope that these procs won't come back later. When they do come back,
something like a wild goose chase begins. This is also known as OverCommit.

This is closely related to the dreaded OOM-killer, which occurs when the
system cannot satisfy a memory request for a returning proc, causing the VM
to start killing processes in an unpredictable manner.

Turning OverCommit off should solve this problem but it doesn't.
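
By "turning it off" I mean the strict accounting mode, i.e. something like:

echo 2 > /proc/sys/vm/overcommit_memory   # 2 = strict accounting, no overcommit
echo 50 > /proc/sys/vm/overcommit_ratio   # percent of RAM added to swap for the commit limit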

This is why it is recommended always to run the system with swap enabled,
even if you have tons of memory, which really only pushes the problem out of
the way until you hit the dead end and the wild goose chase begins again.

Sadly 2.6.13 did not fix this either.

Although this description only vaguely defines the problem from an end-user
pov, the actual semantics may be quite different.

--
Al