2006-12-06 18:07:15

by Magnus Naeslund

Subject: 2.6 tmpfs/swap performance oddity

I have an Ubuntu Edgy system (HP ProLiant DL380, Intel x86-64 w/ 4 cores, kernel 2.6.17-10-server) with a RAID controller (P600 + BBU, cciss) and 8 GB of memory. The RAID disks are set up as one RAID 1+0 logical device; the swap and the filesystem are partitions on this device.
I've created an ext3 file system with the standard parameters from the Ubuntu installer.

In a simple test on this ext3 fs I can write a 20 GB file (dd bs=1M if=/dev/zero of=file) at about 180-200 MB/s. Reading the same file back gives about 280 MB/s.
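
For anyone wanting to repeat the measurement without dd, the sequential write test can be approximated in a few lines of Python. This is only a rough sketch: the function name `write_throughput` and the final fsync are assumptions of this example, not part of the original test (the dd command above did not request a sync), so the numbers are not directly comparable.

```python
import os
import tempfile
import time


def write_throughput(path, total_mib, block_mib=1):
    """Write total_mib MiB of zeros in block_mib-sized blocks; return MiB/s.

    Hypothetical helper: roughly mirrors `dd bs=1M if=/dev/zero of=path`,
    plus an fsync so the timing includes flushing to the device.
    """
    block = bytes(block_mib * 1024 * 1024)  # a zero-filled buffer
    blocks = total_mib // block_mib
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # assumption: not done by the original dd run
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (blocks * block_mib) / elapsed


if __name__ == "__main__":
    # Small demo size; the test in the post used a 20 GB file.
    with tempfile.TemporaryDirectory() as tmpdir:
        rate = write_throughput(os.path.join(tmpdir, "file"), total_mib=64)
        print(f"{rate:.0f} MiB/s")
```

Pointing `path` at the ext3 mount and then at the tmpfs mount, with a file larger than RAM, should show the same kind of gap described here.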

We have a 60 GB tmpfs that we're going to use for volatile but large data sets.
Our thinking was that with tmpfs it would be faster to just let the kernel swap out the stuff that isn't frequently used, and have the current working set swapped in automagically.

But tmpfs seems to get really slow when it has to swap out stuff from tmp to an 80 GB swap partition, much slower than just writing a file to the ext3 fs. Maybe this is a known issue and easily tuned, but I haven't found any solutions when googling around.

If I dd a file that fits in memory to the tmpfs mount I get around 1.5 GB/s, ~8 times faster than the RAID card's write performance. This isn't bad, but it's nowhere near the "cached" speed that I get from hdparm -T (~6720 MB/s), though I don't know if the two are even related. I also get two of these "HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device" messages when issuing the hdparm command (because it's a RAID disk?).

If the file is larger than physical memory (20 GB, same test), total performance gets radically worse: the write test reaches only 124 MB/s. One would hope it would be faster than the ext3 test. My guess is that after the first 8 GB the file no longer fits in memory and throughput drops to a fraction of the in-memory rate.
And all hell breaks loose when I try to dd that tmpfs file to /dev/null (bs=1M): it reads at only ~20 MB/s.

What is happening here? Check out the vmstat 1:
 r  b     swpd  free buff   cache    si    so    bi    bo   in   cs us sy id wa
 1  1 14757524 45036  924 6439068 27168 25160 27168 25160 2449 1725  0  2 75 23
 0  1 14739836 47484  924 6457600 18976 25004 18976 25004 1769 1207  0  1 75 24

It seems to be doing I/O like mad just to page in the pages of the file.
One would hope to have faster or equal performance compared to the ext3 test.
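
To put the vmstat sample in the same units as the dd figures, the columns can be converted by hand or with a short script. The sketch below assumes vmstat's default units (si/so in KiB per second); the helper names are made up for illustration.

```python
# Header and the two sample rows, copied from the vmstat output in this post.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"
SAMPLES = [
    "1 1 14757524 45036 924 6439068 27168 25160 27168 25160 2449 1725 0 2 75 23",
    "0 1 14739836 47484 924 6457600 18976 25004 18976 25004 1769 1207 0 1 75 24",
]


def parse_vmstat(header, line):
    """Map each column name to its integer value for one vmstat row."""
    return dict(zip(header.split(), (int(x) for x in line.split())))


def swap_rates_mib(row):
    """Return (swap-in, swap-out) in MiB/s, assuming si/so are KiB/s."""
    return row["si"] / 1024, row["so"] / 1024


if __name__ == "__main__":
    for line in SAMPLES:
        row = parse_vmstat(HEADER, line)
        si, so = swap_rates_mib(row)
        print(f"swap-in {si:.1f} MiB/s, swap-out {so:.1f} MiB/s, "
              f"iowait {row['wa']}%")
```

Under that assumption the first sample works out to roughly 26.5 MiB/s swapped in and 24.6 MiB/s swapped out at the same time. The si and bi columns are numerically equal, as are so and bo, which suggests that essentially all block I/O during the read is swap traffic, in both directions at once.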

Somehow I think I'm missing something here; maybe we're not supposed to use tmpfs in this way at all?
What more information can I supply to narrow down the problem?
Are there any secret knobs that I can use to tune swap performance?

Regards,
Magnus




2006-12-06 20:51:08

by Phillip Susi

Subject: Re: 2.6 tmpfs/swap performance oddity

Magnus Naeslund(k) wrote:
> But tmpfs seems to get really slow when it has to swap out stuff from
> tmp to a 80 gb swap partition, much slower than just writing a file
> to the ext3 fs. Maybe this is a known thing, and easily tuned, but
> I've not seen any solutions when googling around.

This is a SWAG (Silly Wild Ass Guess), but maybe tmpfs isn't mapping
the files sequentially in RAM, or is otherwise doing something that
prevents proper swap clustering and readahead?

> Somehow I think I'm missing something here, maybe we're not supposed
> to use tmpfs in this way at all? What more information can I supply
> to narrow down the problem? Is there any secret knobs that I can use
> to tune swap performance?

You aren't supposed to be using tmpfs in this way at all ;)

tmpfs is meant to hold small files that will only exist for a short
time, or do not need to persist after a reboot. Keep your big data
files on a normal filesystem, and let the kernel worry about caching the
most frequently accessed parts.

2006-12-08 04:10:31

by Chuck Ebbert

Subject: Re: 2.6 tmpfs/swap performance oddity

In-Reply-To: <[email protected]>

On Wed, 06 Dec 2006 19:07:07 +0100, Magnus Naeslund wrote:

> Is there any secret knobs that I can use to
> tune swap performance?

You might try changing /proc/sys/vm/page-cluster to 5.
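
For context on that value: page-cluster is the base-2 logarithm of the number of pages the kernel tries to read from swap in one go, so the default of 3 means 8 pages and 5 means 32 pages. A small sketch of the arithmetic follows, assuming 4 KiB pages (the usual size on x86-64). The function names are hypothetical, and the script only reads the sysctl; it does not change it.

```python
PAGE_CLUSTER_PATH = "/proc/sys/vm/page-cluster"


def readahead_bytes(page_cluster, page_size=4096):
    """Bytes read per swap-in attempt: 2**page_cluster pages.

    page_size=4096 is an assumption (typical for x86-64).
    """
    return (1 << page_cluster) * page_size


def current_page_cluster(path=PAGE_CLUSTER_PATH):
    """Return the current sysctl value, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for value in (3, 5):
        print(f"page-cluster={value}: {readahead_bytes(value) // 1024} KiB")
    print("current setting:", current_page_cluster())
```

With these assumptions, going from 3 to 5 raises each swap read from 32 KiB to 128 KiB. Whether that helps depends on how contiguously the tmpfs pages were laid out in swap.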

--
Chuck
"Even supernovas have their duller moments."