2009-03-17 20:19:33

by David Rees

Subject: Large write = large latency for small writes

There is a fairly active bug 12309 "Large I/O operations result in
slow performance and high iowait times" [1] which has grown rather
large and unwieldy due to the number of participants and the confusion
caused by what appear to be multiple bugs in either the scheduler or
the IO request merging layers.

I encountered this bug when seeing some horrible latency on an NFS
client while the server was performing heavy IO [2]. I initially
thought it was an NFS bug, but later found that NFS only makes it worse
because writes over NFS are synchronous by default.

I have a simple test case which demonstrates the huge increase in
write latency that occurs for small writes when a large disk
saturating write is also in progress [3]:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
sleep 10
time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

On a handful of systems I have access to, it took anywhere from 6 to 45
seconds for the small write to complete. Others in the bug have
reproduced this across a number of filesystems (ext3, reiserfs, ext4).
xfs in particular seems to handle this test case better than the
others, as do systems which can sustain high write speeds.
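
For anyone who wants to repeat the test with different sizes, the three
commands above can be wrapped in a small function. This is only a sketch:
the name run_latency_test and its arguments are made up for illustration,
it assumes GNU dd (for bs=1M and conv=fdatasync), and it measures in whole
seconds using date +%s, which is coarse but enough for delays in the range
reported here.

```shell
#!/bin/sh
# Sketch of the reproduction above with adjustable sizes.
# run_latency_test is a hypothetical helper name, not part of any tool.
# Usage: run_latency_test DIR BIG_MB [DELAY_SECONDS]
run_latency_test() {
    dir=$1
    big_mb=$2
    delay=${3:-10}

    # Start the large, disk-saturating write in the background.
    dd if=/dev/zero of="$dir/bigfile" bs=1M count="$big_mb" conv=fdatasync 2>/dev/null &
    big_pid=$!

    # Give the large write time to build up dirty data.
    sleep "$delay"

    # Time a single 4k write that must reach the disk before dd returns.
    start=$(date +%s)
    dd if=/dev/zero of="$dir/smallfile" bs=4k count=1 conv=fdatasync 2>/dev/null
    end=$(date +%s)

    echo "small write took $((end - start)) seconds"

    # Wait for the large write to finish, then remove both files.
    wait "$big_pid"
    rm -f "$dir/bigfile" "$dir/smallfile"
}
```

Calling run_latency_test /tmp 10000 would match the sizes used in the
original commands; smaller values are useful for a quick check that the
script itself works.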

The only way I've been able to reduce the latency to acceptable levels
is to drop vm.dirty_background_ratio and vm.dirty_ratio to 1 and 2
respectively - but even then on some systems the small write still
takes 5-7 seconds.
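
For reference, the two settings can be changed at runtime with sysctl.
The sketch below only prints the commands rather than running them, since
applying them needs root; dirty_ratio_cmds is a hypothetical helper name
used here for illustration. Both values are percentages of memory:
vm.dirty_background_ratio is the level at which background writeback
starts, and vm.dirty_ratio is the level at which writing processes are
made to write out dirty data themselves.

```shell
#!/bin/sh
# Print (do not run) the sysctl commands for the given writeback thresholds.
# dirty_ratio_cmds is a hypothetical helper name, not an existing tool.
# Usage: dirty_ratio_cmds BACKGROUND_PERCENT FOREGROUND_PERCENT
dirty_ratio_cmds() {
    background=$1
    foreground=$2
    echo "sysctl -w vm.dirty_background_ratio=$background"
    echo "sysctl -w vm.dirty_ratio=$foreground"
}
```

Running dirty_ratio_cmds 1 2 prints the commands for the values mentioned
above; the output can be reviewed and then piped to a root shell. Changes
made this way do not persist across a reboot.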

The systems I've been testing on are all Fedora 9 or 10 systems with
the latest kernels (basically 2.6.27.19), ext3 filesystems and various
amounts of CPU, memory and disk subsystems (but nothing too new or
fast).

Any ideas? How much latency is acceptable with the test case?

-Dave

[1] http://bugzilla.kernel.org/show_bug.cgi?id=12309
[2] http://marc.info/?l=linux-nfs&m=123697692631683&w=2
[3] http://bugzilla.kernel.org/show_bug.cgi?id=12309#c249


2009-03-17 22:10:35

by Peter Zijlstra

Subject: Re: Large write = large latency for small writes

On Tue, 2009-03-17 at 13:19 -0700, David Rees wrote:

> I have a simple test case which demonstrates the huge increase in
> write latency that occurs for small writes when a large disk
> saturating write is also in progress [3]:
>
> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
> sleep 10
> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
>
> On a handful of systems I have access to, it took anywhere from 6-45
> seconds for the small write to complete. Others in the bug have
> reproduced this across a number of filesystems (ext3, reiserfs, ext4).
> xfs in particular seems to handle this test case better than the
> others. As do systems which can sustain high write speeds.

How does it fare without the fdatasync?

That is, is it the sync that's taking ages, or the dirty?

2009-03-17 22:30:19

by David Rees

Subject: Re: Large write = large latency for small writes

On Tue, Mar 17, 2009 at 3:09 PM, Peter Zijlstra wrote:
> On Tue, 2009-03-17 at 13:19 -0700, David Rees wrote:
>> I have a simple test case which demonstrates the huge increase in
>> write latency that occurs for small writes when a large disk
>> saturating write is also in progress [3]:
>>
>> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
>> sleep 10
>> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
>>
>> On a handful of systems I have access to, it took anywhere from 6-45
>> seconds for the small write to complete. Others in the bug have
>> reproduced this across a number of filesystems (ext3, reiserfs, ext4).
>> xfs in particular seems to handle this test case better than the
>> others. As do systems which can sustain high write speeds.
>
> How does it fare without the fdatasync?
>
> That is, is it the sync that's taking ages, or the dirty?

It's the fdatasync that takes ages. Without fdatasync the small write
finishes in an instant. In the test I just ran, it took 65 seconds
for the small write to complete with the fdatasync and 0.0001 seconds
without it. I think this is also why ext4 and xfs seem to handle the
problem a bit better than ext3/reiserfs.
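
The comparison can be sketched as below, meant to be run while the large
write is still in progress. The function name compare_small_write is made
up for illustration, and the sketch assumes a dd that supports
conv=fdatasync and prints its transfer statistics as the last line on
stderr, as GNU dd does.

```shell
#!/bin/sh
# Write one 4k block twice: once buffered, once followed by fdatasync.
# compare_small_write is a hypothetical helper name.
# Usage: compare_small_write FILE
compare_small_write() {
    file=$1

    # Buffered write only: dd returns once the data is in the page cache.
    echo "without fdatasync:"
    dd if=/dev/zero of="$file" bs=4k count=1 2>&1 | tail -n 1

    # Same write, but dd calls fdatasync on the output file before exiting.
    echo "with fdatasync:"
    dd if=/dev/zero of="$file" bs=4k count=1 conv=fdatasync 2>&1 | tail -n 1

    rm -f "$file"
}
```

The last line of each dd report carries the elapsed time, so the two runs
can be compared directly.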

This is why I initially thought it was an NFS issue: I didn't see
too many bad delays when working directly on the server during
sustained server write load, only when working over NFS, which mounts
sync by default.

-Dave