2006-12-18 04:07:11

by Manish Regmi

[permalink] [raw]
Subject: Linux disk performance.

Hi all,
I am working on an application that requires heavy disk
writes, and I noticed some inconsistencies in the write timing.
We are using raw device access to bypass filesystem overhead.

First I tried open("/dev/hda", O_RDWR), i.e. without the O_DIRECT flag.
I saw that after some writes, one write took much longer than the others.

The results below are for writing 128 KB of data on a 400 MHz MIPS system.
(sequence) (channel) (time in microseconds)
0 1 1675
0 2 1625
0 3 1836
...
0 16 3398
0 63 1678
1 0 1702
1 1 1845
.....
3 46 17875 // large value
...
4 13 17142 // large value
...
4 44 18711 // large value

Is this behaviour ok?
I believe this is due to a deep request queue.

But when I used O_DIRECT I got slightly higher write times; it
also had such time bumps, but at a lower rate.
-----------------------------------------
0 0 3184
0 1 3165
0 2 3126
...
0 52 10613 // large value
0 60 19004 // large value

The results were similar with O_DIRECT|O_SYNC.


Can we achieve smooth write times in Linux?

I am using 2.6.10. The results are much the same (not numerically
identical, but I see the same timing differences) on both a P4 3 GHz
with 512 MB RAM and the MIPS board. The disk is running in UDMA mode 5.

--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.


2006-12-18 11:41:27

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux disk performance.


>
> Can we achieve smooth write times in Linux?

if you want truly smooth writes you'll have to work for it,
since "bumpy" writes tend to be better for performance, so naturally the
kernel will favor those.

to get smooth writes you'll need to do a threaded setup where you do an
msync/fdatasync/sync_file_range on a frequent-but-regular interval from
a thread. Be aware that this is quite likely to give you lower maximum
performance than the batching behavior though.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-18 12:39:41

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/18/06, Arjan van de Ven <[email protected]> wrote:
> if you want truely really smooth writes you'll have to work for it,
> since "bumpy" writes tend to be better for performance so naturally the
> kernel will favor those.
>
> to get smooth writes you'll need to do a threaded setup where you do an
> msync/fdatasync/sync_file_range on a frequent-but-regular interval from
> a thread. Be aware that this is quite likely to give you lower maximum
> performance than the batching behavior though.
>

Thanks...

But isn't O_DIRECT supposed to bypass buffering in the kernel?
Doesn't it write directly to disk?
I tried putting fdatasync() at regular intervals but there was no
visible effect.

--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-18 13:07:28

by Nick Piggin

[permalink] [raw]
Subject: Re: Linux disk performance.

Manish Regmi wrote:
> On 12/18/06, Arjan van de Ven <[email protected]> wrote:
>
>> if you want truely really smooth writes you'll have to work for it,
>> since "bumpy" writes tend to be better for performance so naturally the
>> kernel will favor those.
>>
>> to get smooth writes you'll need to do a threaded setup where you do an
>> msync/fdatasync/sync_file_range on a frequent-but-regular interval from
>> a thread. Be aware that this is quite likely to give you lower maximum
>> performance than the batching behavior though.
>>
>
> Thanks...
>
> But isn't O_DIRECT supposed to bypass buffering in Kernel?
> Doesn't it directly write to disk?
> I tried to put fdatasync() at regular intervals but there was no
> visible effect.
>

I don't know exactly how to interpret the numbers you gave, but
they look like they might be a (HZ quantised) delay coming from
block layer plugging.

O_DIRECT bypasses caching, but not (all) buffering.

Not sure whether the block layer can handle an unplug_delay set
to 0, but that might be something to try (see block/ll_rw_blk.c).

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-12-18 13:18:36

by Erik Mouw

[permalink] [raw]
Subject: Re: Linux disk performance.

On Mon, Dec 18, 2006 at 06:24:39PM +0545, Manish Regmi wrote:
> On 12/18/06, Arjan van de Ven <[email protected]> wrote:
> >if you want truely really smooth writes you'll have to work for it,
> >since "bumpy" writes tend to be better for performance so naturally the
> >kernel will favor those.
> >
> >to get smooth writes you'll need to do a threaded setup where you do an
> >msync/fdatasync/sync_file_range on a frequent-but-regular interval from
> >a thread. Be aware that this is quite likely to give you lower maximum
> >performance than the batching behavior though.
> >
>
> Thanks...
>
> But isn't O_DIRECT supposed to bypass buffering in Kernel?

It is.

> Doesn't it directly write to disk?

Yes, but it still uses an IO scheduler.

> I tried to put fdatasync() at regular intervals but there was no
> visible effect.

In your first message you mentioned you were using an ancient 2.6.10
kernel. That kernel uses the anticipatory IO scheduler. Update to the
latest stable kernel (2.6.19.1 at time of writing) and it will default
to the CFQ scheduler which has a smoother writeout, plus you can give
your process a different IO scheduling class and level (see
Documentation/block/ioprio.txt).


Erik

2006-12-19 06:22:17

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/18/06, Erik Mouw <[email protected]> wrote:
<...snip...>
> >
> > But isn't O_DIRECT supposed to bypass buffering in Kernel?
>
> It is.
>
> > Doesn't it directly write to disk?
>
> Yes, but it still uses an IO scheduler.
>

Ok, but I also tried noop to turn off disk scheduler effects.
There were still timing differences: usually I get 3100 microseconds,
but up to 20000 microseconds at certain intervals. I am just using
gettimeofday() between two writes to measure the timing.



> In your first message you mentioned you were using an ancient 2.6.10
> kernel. That kernel uses the anticipatory IO scheduler. Update to the
> latest stable kernel (2.6.19.1 at time of writing) and it will default
> to the CFQ scheduler which has a smoother writeout, plus you can give
> your process a different IO scheduling class and level (see
> Documentation/block/ioprio.txt).

Thanks... i will try with CFQ.



Nick Piggin:
> but
> they look like they might be a (HZ quantised) delay coming from
> block layer plugging.

Sorry, I didn't understand what you mean.

To minimise scheduling effects i tried giving it maximum priority.


--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-19 06:38:53

by Nick Piggin

[permalink] [raw]
Subject: Re: Linux disk performance.

Index: linux-2.6/block/ll_rw_blk.c
===================================================================
--- linux-2.6.orig/block/ll_rw_blk.c 2006-12-19 17:35:00.000000000 +1100
+++ linux-2.6/block/ll_rw_blk.c 2006-12-19 17:35:53.000000000 +1100
@@ -226,6 +226,8 @@ void blk_queue_make_request(request_queu
q->unplug_delay = (3 * HZ) / 1000; /* 3 milliseconds */
if (q->unplug_delay == 0)
q->unplug_delay = 1;
+ q->unplug_delay = 0;
+ q->unplug_thresh = 0;

INIT_WORK(&q->unplug_work, blk_unplug_work, q);


Attachments:
block-no-plug.patch (516.00 B)

2006-12-19 12:18:15

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux disk performance.

On Tue, 2006-12-19 at 17:38 +1100, Nick Piggin wrote:
> Manish Regmi wrote:
>
> > Nick Piggin:
> >
> >> but
> >> they look like they might be a (HZ quantised) delay coming from
> >> block layer plugging.
> >
> >
> > Sorry i didn't understand what you mean.
>
> When you submit a request to an empty block device queue, it can
> get "plugged" for a number of timer ticks before any IO is actually
> started. This is done for efficiency reasons and is independent of
> the IO scheduler used.

however the O_DIRECT codepath always unplugs the queue immediately..



2006-12-20 11:17:40

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/19/06, Nick Piggin <[email protected]> wrote:
> When you submit a request to an empty block device queue, it can
> get "plugged" for a number of timer ticks before any IO is actually
> started. This is done for efficiency reasons and is independent of
> the IO scheduler used.
>

Thanks for the information..

> Use the noop IO scheduler, as well as the attached patch, and let's
> see what your numbers look like.
>

Unfortunately I got the same results even after applying your patch. I
also tried setting
q->unplug_delay = 1;
but it did not work; the result was similar.

--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-20 22:23:04

by Bill Davidsen

[permalink] [raw]
Subject: Re: Linux disk performance.

Manish Regmi wrote:
> On 12/18/06, Arjan van de Ven <[email protected]> wrote:
>> if you want truely really smooth writes you'll have to work for it,
>> since "bumpy" writes tend to be better for performance so naturally the
>> kernel will favor those.
>>
>> to get smooth writes you'll need to do a threaded setup where you do an
>> msync/fdatasync/sync_file_range on a frequent-but-regular interval from
>> a thread. Be aware that this is quite likely to give you lower maximum
>> performance than the batching behavior though.
>>
>
> Thanks...

Just to say it another way.
>
> But isn't O_DIRECT supposed to bypass buffering in Kernel?
That's correct. But it doesn't put your write at the head of any queue;
it just doesn't buffer it for you.

> Doesn't it directly write to disk?
Also correct, when it's your turn to write to disk...

> I tried to put fdatasync() at regular intervals but there was no
> visible effect.
>
Quite honestly, the main place I have found O_DIRECT useful is in
keeping programs doing large i/o quantities from blowing the buffers and
making the other applications run like crap. If your application is
running alone, unless you are very short of CPU or memory, avoiding the
copy to an o/s buffer will be down in the measurement noise.

I had a news (usenet) server which normally did 120 art/sec (~480 tps),
which dropped to about 50 tps when doing large file copies even at low
priority. By using O_DIRECT the impact essentially vanished, at the cost
of the copy running about 10-15% slower. Changing various programs to
use O_DIRECT only helped when really large blocks of data were involved,
and only when i/o could be done in a way that satisfied the alignment
and size requirements of O_DIRECT.

If you upgrade to a newer kernel you can try other i/o scheduler
options, default cfq or even deadline might be helpful.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2006-12-21 06:03:44

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/21/06, Bill Davidsen <[email protected]> wrote:
> >
> > But isn't O_DIRECT supposed to bypass buffering in Kernel?
> That's correct. But it doesn't put your write at the head of any queue,
> it just doesn't buffer it for you.
>
> > Doesn't it directly write to disk?
> Also correct, when it's your turn to write to disk...

But the only process accessing that disk is my application.

> > I tried to put fdatasync() at regular intervals but there was no
> > visible effect.
> >
> Quite honestly, the main place I have found O_DIRECT useful is in
> keeping programs doing large i/o quantities from blowing the buffers and
> making the other applications run like crap. If you application is
> running alone, unless you are very short of CPU or memory avoiding the
> copy to an o/s buffer will be down in the measurement noise.

Yes... my application does a large amount of I/O. It writes
video data received from Ethernet (an IP camera) to the disk in 128 KB
chunks.

> I had a news (usenet) server which normally did 120 art/sec (~480 tps),
> which dropped to about 50 tps when doing large file copies even at low
> priority. By using O_DIRECT the impact essentially vanished, at the cost
> of the copy running about 10-15% slower. Changing various programs to
> use O_DIRECT only helped when really large blocks of data were involved,
> and only when i/o clould be done in a way to satisfy the alignment and
> size requirements of O_DIRECT.
>
> If you upgrade to a newer kernel you can try other i/o scheduler
> options, default cfq or even deadline might be helpful.

I tried all disk schedulers but all had timing bumps. :(

> --
> bill davidsen <[email protected]>
> CTO TMR Associates, Inc
> Doing interesting things with small computers since 1979
>


--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-21 07:15:57

by Daniel Cheng

[permalink] [raw]
Subject: Re: Linux disk performance.

Manish Regmi wrote:
[...]
>>
>> If you upgrade to a newer kernel you can try other i/o scheduler
>> options, default cfq or even deadline might be helpful.
>
> I tried all disk schedulers but all had timing bumps. :(
>

Did you try disabling the on-disk write cache?

man hdparm(8)

-W Disable/enable the IDE drive's write-caching
feature


--

2006-12-21 13:22:53

by Erik Mouw

[permalink] [raw]
Subject: Re: Linux disk performance.

On Thu, Dec 21, 2006 at 11:48:42AM +0545, Manish Regmi wrote:
> Yes... my application does large amount of I/O. It actually writes
> video data received from ethernet(IP camera) to the disk using 128 K
> chunks.

Bursty video traffic is really an application that could take advantage
of kernel buffering, unless you want to reinvent the wheel and do
the buffering yourself (it is possible though, I've done it on IRIX).

BTW, why are you so keen on smooth-at-the-micro-level writeout? With
real-time video applications it's only important not to drop frames.
How fast those frames go to the disk isn't really an issue, as
long as you don't overflow the intermediate buffer.
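The intermediate buffer described above could be a fixed ring of frame slots between the network receiver and the disk writer: a slow write just deepens the queue, and frames are only lost when the ring fills. The slot count, frame size, and names here are illustrative assumptions (a real version would add locking between the two threads).

```c
#include <stdlib.h>
#include <string.h>

#define SLOTS      8
#define FRAME_SIZE (128 * 1024)   /* one 128 KB chunk per slot */

struct frame_ring {
    char data[SLOTS][FRAME_SIZE];
    int head, tail, count;
    long dropped;                 /* frames lost to overflow */
};

/* Producer side (network receiver): 0 on success, -1 on overflow. */
int ring_push(struct frame_ring *r, const char *frame)
{
    if (r->count == SLOTS) {
        r->dropped++;
        return -1;
    }
    memcpy(r->data[r->head], frame, FRAME_SIZE);
    r->head = (r->head + 1) % SLOTS;
    r->count++;
    return 0;
}

/* Consumer side (disk-writer thread): 0 on success, -1 if empty. */
int ring_pop(struct frame_ring *r, char *frame)
{
    if (r->count == 0)
        return -1;
    memcpy(frame, r->data[r->tail], FRAME_SIZE);
    r->tail = (r->tail + 1) % SLOTS;
    r->count--;
    return 0;
}
```

Sizing the ring to cover the worst observed write bump (20 ms at the incoming frame rate, per the numbers earlier in the thread) is what keeps frames from being dropped.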


Erik

--
They're all fools. Don't worry. Darwin may be slow, but he'll
eventually get them. -- Matthew Lammers in alt.sysadmin.recovery

2006-12-22 00:14:56

by Bhanu Kalyan Chetlapalli

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/20/06, Manish Regmi <[email protected]> wrote:
> On 12/19/06, Nick Piggin <[email protected]> wrote:
> > When you submit a request to an empty block device queue, it can
> > get "plugged" for a number of timer ticks before any IO is actually
> > started. This is done for efficiency reasons and is independent of
> > the IO scheduler used.
> >
>
> Thanks for the information..
>
> > Use the noop IO scheduler, as well as the attached patch, and let's
> > see what your numbers look like.
> >
>
> Unfortunately i got the same results even after applying your patch. I
> also tried putting
> q->unplug_delay = 1;
> But it did not work. The result was similar.

I am assuming that your program is not seeking in between writes.

Try disabling the disk cache; nowadays some disks have as much
as 8 MB of write cache, so the disk might be buffering as much as it can
and writing only when it can no longer buffer. Since you have
an app which continuously writes copious amounts of data, in order,
disabling the write cache might make some sense.

> --
> ---------------------------------------------------------------
> regards
> Manish Regmi
>
Bhanu
> ---------------------------------------------------------------
> UNIX without a C Compiler is like eating Spaghetti with your mouth
> sewn shut. It just doesn't make sense.
>
> --
> Kernelnewbies: Help each other learn about the Linux kernel.
> Archive: http://mail.nl.linux.org/kernelnewbies/
> FAQ: http://kernelnewbies.org/faq/
>
>


--
There is only one success - to be able to spend your life in your own way.

2006-12-22 05:30:17

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/22/06, Bhanu Kalyan Chetlapalli <[email protected]> wrote:
>
> I am assuming that your program is not seeking inbetween writes.
>
> Try disabling the Disk Cache, now-a-days some disks can have as much
> as 8MB write cache. so the disk might be buffering as much as it can,
> and trying to write only when it can no longer buffer. Since you have
> an app which continously write copious amounts of data, in order,
> disabling write cache might make some sense.
>

Thanks for the suggestion, but the performance was terrible with the
write cache disabled.

--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-22 05:39:05

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/21/06, Erik Mouw <[email protected]> wrote:
> Bursty video traffic is really an application that could take advantage
> from the kernel buffering. Unless you want to reinvent the wheel and do
> the buffering yourself (it is possible though, I've done it on IRIX).

But in my test O_DIRECT gave slightly better performance, and CPU
usage also decreased.

>
> BTW, why are you so keen on smooth-at-the-microlevel writeout? With
> real time video applications it's only important not to drop frames.
> How fast those frames will go to the disk isn't really an issue, as
> long as you don't overflow the intermediate buffer.

Actually I don't require smooth-at-the-micro-level writeout, but the
timing bumps are overflowing the intermediate buffers. I was just
wondering if I could reduce the 20 ms bumps to the 3 ms of the other
writes.

>
> Erik
>
> --
> They're all fools. Don't worry. Darwin may be slow, but he'll
> eventually get them. -- Matthew Lammers in alt.sysadmin.recovery
>


--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-22 05:39:17

by Bhanu Kalyan Chetlapalli

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/22/06, Manish Regmi <[email protected]> wrote:
> On 12/22/06, Bhanu Kalyan Chetlapalli <[email protected]> wrote:
> >
> > I am assuming that your program is not seeking inbetween writes.
> >
> > Try disabling the Disk Cache, now-a-days some disks can have as much
> > as 8MB write cache. so the disk might be buffering as much as it can,
> > and trying to write only when it can no longer buffer. Since you have
> > an app which continously write copious amounts of data, in order,
> > disabling write cache might make some sense.
> >
>
> Thanks for the suggestion but the performance was terrible when write
> cache was disabled.

Performance degradation is expected. But the point is: did the
anomaly you pointed out go away? If it did, then it is the disk
cache that is causing the issue, and you will have to live with it.
Otherwise you will have to look elsewhere.

> --
> ---------------------------------------------------------------
> regards
> Manish Regmi
>
> ---------------------------------------------------------------
> UNIX without a C Compiler is like eating Spaghetti with your mouth
> sewn shut. It just doesn't make sense.
>


--
There is only one success - to be able to spend your life in your own way.

2006-12-22 05:56:49

by Manish Regmi

[permalink] [raw]
Subject: Re: Linux disk performance.

On 12/22/06, Bhanu Kalyan Chetlapalli <[email protected]> wrote:
> >
> > Thanks for the suggestion but the performance was terrible when write
> > cache was disabled.
>
> Performance degradation is expected. But the point is - did the
> anomaly, that you have pointed out, go away? Because if it did, then
> it is the disk cache which is causing the issue, and you will have to
> live with it. Else you will have to look elsewhere.

Oops, sorry for the incomplete answer.
Actually I did not test thoroughly, but my initial tests showed some
bumps and serious performance degradation. Anyway, there were still
some bumps... :(

(sequence)(channel)(write time in microseconds)
0 0 6366
0 1 9949
0 2 10125
0 3 10165
0 4 11043
0 5 10129
0 6 10089
0 7 10165
0 8 71572
0 9 9882
0 10 8105
0 11 10085


--
---------------------------------------------------------------
regards
Manish Regmi

---------------------------------------------------------------
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

2006-12-27 15:50:45

by Phillip Susi

[permalink] [raw]
Subject: Re: Linux disk performance.

Bill Davidsen wrote:
> Quite honestly, the main place I have found O_DIRECT useful is in
> keeping programs doing large i/o quantities from blowing the buffers and
> making the other applications run like crap. If you application is
> running alone, unless you are very short of CPU or memory avoiding the
> copy to an o/s buffer will be down in the measurement noise.
>
> I had a news (usenet) server which normally did 120 art/sec (~480 tps),
> which dropped to about 50 tps when doing large file copies even at low
> priority. By using O_DIRECT the impact essentially vanished, at the cost
> of the copy running about 10-15% slower. Changing various programs to
> use O_DIRECT only helped when really large blocks of data were involved,
> and only when i/o clould be done in a way to satisfy the alignment and
> size requirements of O_DIRECT.
>
> If you upgrade to a newer kernel you can try other i/o scheduler
> options, default cfq or even deadline might be helpful.

I would point out that if you are looking for optimal throughput and
reduced cpu overhead while avoiding blowing out the kernel fs cache, you
need to couple aio with O_DIRECT. By itself O_DIRECT will lower
throughput because there will be brief pauses between each IO while the
application prepares the next buffer. You can overcome this by posting
a few pending buffers concurrently with aio, allowing the kernel to
always have a buffer ready for the next IO as soon as the previous one
completes.


2007-01-01 01:51:31

by Bill Davidsen

[permalink] [raw]
Subject: Re: Linux disk performance.

Phillip Susi wrote:
> Bill Davidsen wrote:
>> Quite honestly, the main place I have found O_DIRECT useful is in
>> keeping programs doing large i/o quantities from blowing the buffers
>> and making the other applications run like crap. If you application is
>> running alone, unless you are very short of CPU or memory avoiding the
>> copy to an o/s buffer will be down in the measurement noise.
>>
>> I had a news (usenet) server which normally did 120 art/sec (~480
>> tps), which dropped to about 50 tps when doing large file copies even
>> at low priority. By using O_DIRECT the impact essentially vanished, at
>> the cost of the copy running about 10-15% slower. Changing various
>> programs to use O_DIRECT only helped when really large blocks of data
>> were involved, and only when i/o clould be done in a way to satisfy
>> the alignment and size requirements of O_DIRECT.
>>
>> If you upgrade to a newer kernel you can try other i/o scheduler
>> options, default cfq or even deadline might be helpful.
>
> I would point out that if you are looking for optimal throughput and
> reduced cpu overhead, and avoid blowing out the kernel fs cache, you
> need to couple aio with O_DIRECT. By itself O_DIRECT will lower
> throughput because there will be brief pauses between each IO while the
> application prepares the next buffer. You can overcome this by posting
> a few pending buffers concurrently with aio, allowing the kernel to
> always have a buffer ready for the next io as soon as the previous one
> completes.

A good point, but in this case there was no particular urgency, other
than not to stop the application while doing background data moves. The
best way to do it would have been to put it where it belonged in the
first place :-(

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979