I'm running kernbench (make -j 128 on a kernel source tree) back to back
multiple times on an SMP machine. In every 10 runs, there's always at least
one run that takes around 40% longer than the other runs. (Before
kernbench starts timing, it does a sync.) 'vmstat 1' indicates that the
longer runs always have a couple of 1-sec intervals during which there are
10 times more block-outs (bo field) than the average traffic in the rest
of the run, and during these intervals, many cc1 processes are in the D
state. My file system is ext3 and all the things like journal commit
interval, pdflush interval, etc. have the default values.
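
The vmstat observation above can be checked mechanically. What follows is a
minimal sketch, not part of the original report: it assumes the output of
'vmstat 1' was saved as text, and the function name find_bo_bursts, the
factor parameter, and the use of the median as the baseline (rather than the
mean over the rest of the run) are assumptions made for illustration. The
sample numbers are made up.

```python
import statistics

# Hypothetical 'vmstat 1' capture; the numbers are invented for illustration.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
130  0      0 500000  80000 900000    0    0    10   400 1200 3000 90 10  0  0
128  1      0 498000  80000 901000    0    0     8   450 1180 2900 91  9  0  0
 40 60      0 497000  80000 902000    0    0     4  5200  900 1500 30 10  0 60
129  0      0 496000  80000 903000    0    0    12   380 1210 3100 92  8  0  0
"""


def find_bo_bursts(vmstat_text, factor=10.0):
    """Return (sample_index, bo, b) for samples whose bo is at least
    `factor` times the median bo of the whole capture.

    `b` is vmstat's count of processes in uninterruptible sleep (D state).
    """
    bo_col = b_col = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column-name header can repeat; use it to locate the columns.
        if "bo" in fields and "b" in fields:
            bo_col, b_col = fields.index("bo"), fields.index("b")
            continue
        # Skip everything that is not an all-numeric data row.
        if bo_col is None or not fields or not all(f.isdigit() for f in fields):
            continue
        samples.append((int(fields[bo_col]), int(fields[b_col])))
    if not samples:
        return []
    baseline = statistics.median(bo for bo, _ in samples)
    threshold = factor * max(baseline, 1)  # avoid a zero threshold
    return [(i, bo, b) for i, (bo, b) in enumerate(samples) if bo >= threshold]
```

Calling find_bo_bursts(SAMPLE) flags the third data row, where bo jumps and
the b column is high at the same time. Note that the first row printed by
vmstat reports averages since boot, so it may be worth dropping it before
analysis.
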
I'm trying to understand why such variability occurs. I tested the same
thing with ext2 and did not see any variability. So I'm thinking about two
things: (1) for some reason, ext3/jbd occasionally issues a large volume
of bursty writes to the disk (but why does it occur just sometimes, not
always?), and (2) when there are bursty writes, the block device driver is
not able to handle them, causing I/O waits. But I don't really have a
clear understanding of the problem here...
Does anyone have any insight on this, or any suggestion on how to figure
it out?

Thanks,
tong

PS. I'm not subscribed to the list, so please cc me.

On Tue, Mar 14, 2006 at 02:32:17AM -0500, Tong Li wrote:
> I'm running kernbench (make -j 128 on a kernel source) back to back
> multiple times on an SMP. Among every 10 runs, there's always at least one
> run that has a run time around 40% longer than the other runs. (Before
> kernbench starts timing, it does a sync.) 'vmstat 1' indicates that the
> longer runs always have a couple of 1-sec intervals during which there are
> 10 times more block-outs (bo field) than the average traffic in the rest
> of the run, and during these intervals, many cc1 processes are in the D
> state. My file system is ext3 and all the things like journal commit
> interval, pdflush interval, etc. have the default values.
>
> I'm trying to understand why such variability occurs. I tested the same
> thing with ext2 and did not see any variability. So I'm thinking about two
> things: (1) for some reason, ext3/jbd occasionally issues a large volume
> of bursty writes to the disk (but why does it occur just sometimes, not
> always?), and (2) when there are bursty writes, the block device driver is
> not able to handle them, causing I/O waits. But I don't really have a
> clear understanding of the problem here...
If you are using an e2fsprogs older than version 1.38, you should try
expanding the journal size from the default of 32M to 128M; with the
filesystem unmounted do:

    tune2fs -O ^has_journal /dev/hdXX
    tune2fs -O has_journal -J size=128 /dev/hdXX

If the journal fills up and the filesystem has to do a forced journal
truncate, I/Os can stall and writes become bursty, with performance
becoming nasty as a result. Increasing the journal size can avoid this,
at the cost of potentially pinning more disk buffers in memory, which
increases the amount of unswappable kernel memory.
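
The two-step journal resize described above can be scripted as a dry run.
This is only a sketch under stated assumptions: the helper names
(journal_resize_commands, is_mounted, plan) are hypothetical, the device
path is whatever the caller passes in, the mounted check is a naive match
against /proc/mounts-style text, and the journal size is given with the
size= option as documented in the tune2fs man page. Nothing is executed;
the commands are only built.

```python
def journal_resize_commands(device, size_mb=128):
    """Build (but do not run) the two tune2fs invocations that drop the
    existing journal and recreate it with the requested size in MB."""
    return [
        ["tune2fs", "-O", "^has_journal", device],
        ["tune2fs", "-O", "has_journal", "-J", f"size={size_mb}", device],
    ]


def is_mounted(device, mounts_text):
    """Naive check: does `device` appear as the first field of any line in
    /proc/mounts-style text? Aliases such as /dev/root are not resolved."""
    return any(line.split()[:1] == [device] for line in mounts_text.splitlines())


def plan(device, mounts_text, size_mb=128):
    """Return the command list, refusing if the filesystem looks mounted."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} appears to be mounted; unmount it first")
    return journal_resize_commands(device, size_mb)
```

On a Linux system, mounts_text could be read from /proc/mounts, and each
returned command could be reviewed before being run by hand as root.
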
Regards,

                                        - Ted

On Tue, 2006-03-14 at 02:32 -0500, Tong Li wrote:
> Does anyone have any insight on this, or any suggestion on how to figure
> it out?
I tried to recreate the condition, but failed (10 runs, all about the
same amount of time). Is it possible that you have some other process
accessing the partition?
Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/

> If you are using an e2fsprogs older than version 1.38, you should try
> expanding the journal size from the default of 32M to 128M; with the
> filesystem unmounted do:
>
> tune2fs -O ^has_journal /dev/hdXX
> tune2fs -O has_journal -J size=128 /dev/hdXX
>
I did this and yes, it fixed the problem.

Thank you so much,
tong

> I tried to recreate the condition, but failed (10 runs, all about the
> same amount of time). Is it possible that you have some other process
> accessing the partition?
I don't have other processes running on the system, so I don't know...
Thanks,
tong