2008-02-29 14:16:58

by Gordon Mckeown

Subject: Very high IOWait during all disk activity

(Please cc me on replies)

I recently noticed on a number of my Linux boxes that during disk
activity, CPU usage was consistently hitting 100%. A little digging
showed that the CPU was spending up to around 65% of its time in an
IOWait state. Checked this with kernels 2.6.22 and 2.6.25-rc3, and
also across SATA and PATA drives on three different machines, all with
the same results. I also checked back with an old Ubuntu 6.06 Live CD
and that also exhibits the problem.

Having done some digging on the net, I can't get a definitive answer
as to whether this is considered "normal". Some people suggest that
IOWait is informational and doesn't indicate a problem, but based on
my admittedly limited understanding of such things, the CPU shouldn't
need to spend much time on disk I/O these days due to the use of DMA.

Is it expected behaviour for the CPU to spend such a large amount of
time in the IOWait state during disk I/O?

(For anyone who wants to see a more detailed analysis, I have an open
bug on Ubuntu's Launchpad: https://bugs.launchpad.net/bugs/192353).

Cheers,

Gordon.


2008-02-29 15:18:57

by Tomasz Chmielewski

Subject: Re: Very high IOWait during all disk activity

> I recently noticed on a number of my Linux boxes that during disk
> activity, CPU usage was consistently hitting 100%. A little digging
> showed that the CPU was spending up to around 65% of its time in an
> IOWait state. Checked this with kernels 2.6.22 and 2.6.25-rc3, and
> also across SATA and PATA drives on three different machines, all with
> the same results. I also checked back with an old Ubuntu 6.06 Live CD
> and that also exhibits the problem.
>
> Having done some digging on the net, I can't get a definitive answer
> as to whether this is considered "normal". Some people suggest that
> IOWait is informational and doesn't indicate a problem, but based on
> my admittedly limited understanding of such things, the CPU shouldn't
> need to spend much time on disk I/O these days due to the use of DMA.
>
> Is it expected behaviour for the CPU to spend such a large amount of
> time in the IOWait state during disk I/O?

Unless you can write to the disk faster than you can fetch data from
/dev/zero - yes, it is normal.

BTW, it doesn't mean that your CPU's cycles are wasted. You will see big
"wa" numbers when there are no other tasks to schedule at the same time.

Try running:

cat /dev/zero | bzip2 -c >/dev/null

when your IOwait is big (because you write a big file), and then watch
the numbers.
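
To watch the same figure without top or vmstat, here is a minimal sketch that derives the iowait share from two samples of the aggregate "cpu" line in /proc/stat. The function name iowait_pct and the 5-second interval in the usage comment are only illustrative, and the sketch assumes the usual field order (user nice system idle iowait irq softirq steal):

```shell
#!/bin/sh
# iowait_pct LINE1 LINE2
# Given two "cpu ..." summary lines taken from /proc/stat at different
# times, print the percentage of the elapsed CPU time that was accounted
# as iowait. With "cpu" as awk field 1, iowait is field 6.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal only (fields 2-9); later guest fields are
            # already included in user/nice and would be double counted.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Live usage on a Linux box:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

Running this once during a large copy, and again with the bzip2 pipeline going at the same time, should show the iowait share shrinking as the idle cycles find other work.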


--
Tomasz Chmielewski
http://wpkg.org

2008-02-29 15:47:24

by Gordon Mckeown

Subject: Re: Very high IOWait during all disk activity

On Fri, Feb 29, 2008 at 3:18 PM, Tomasz Chmielewski wrote:
> Unless you can write to the disk faster than you can fetch data from
> /dev/zero - yes, it is normal.

OK, thank you; it has been a struggle to get confirmation of this;
perhaps because the way IOWait is measured has changed at some point?

> Try running:
>
> cat /dev/zero | bzip2 -c >/dev/null
>
> when your IOwait is big (because you write a big file), and then watch
> the numbers.

Ah, I can see that the CPU-intensive command soaks up the cycles that
would otherwise have been reported as IOWaits.

Unfortunately this doesn't help explain why Windows XP on the same box
can complete an "identical" copy operation in half the time. Perhaps
it's just due to the different filesystems, or the way write caching
works?

Thanks,

Gordon.

2008-02-29 16:01:11

by Tomasz Chmielewski

Subject: Re: Very high IOWait during all disk activity

Gordon Mckeown wrote:

(...)

> Unfortunately this doesn't help explain why Windows XP on the same box
> can complete an "identical" copy operation in half the time. Perhaps
> it's just due to the different filesystems, or the way write caching
> works?

Did you at least:

* make an NTFS filesystem on that drive for the Windows test
* make an ext3 filesystem on that same drive for the Linux test

Or, is Linux using partition A, and Windows using partition B?

If it's the latter, such tests are not very meaningful.

Also, you gave too few details to say anything meaningful, really (did
you drop the cache on both systems before starting the copy? did you sync
after copying on both systems to see how long it takes? etc.).
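
On the Linux side, those two steps could be sketched roughly as follows. The helper name timed_copy is hypothetical, dropping the page cache through /proc/sys/vm/drop_caches needs root (the sketch skips it otherwise), and the timing is only whole seconds:

```shell
#!/bin/sh
# timed_copy SRC DST
# Copy SRC to DST and print the elapsed whole seconds, including the
# final sync, so data still sitting dirty in the page cache is counted.
timed_copy() {
    sync    # flush earlier writes so they do not pollute the measurement

    # Drop clean page cache, dentries and inodes so the read side starts
    # cold. This needs root; silently skipped when it is not writable.
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
    fi

    start=$(date +%s)
    cp "$1" "$2" && sync || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Example:
#   timed_copy testdir1/testfile1 testdir2/
```

Including the sync inside the timed region is the point of the exercise: without it, cp can return while a good part of the file is still waiting to be written out.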


--
Tomasz Chmielewski
http://wpkg.org

2008-03-01 13:31:56

by Gordon Mckeown

Subject: Re: Very high IOWait during all disk activity

On Fri, Feb 29, 2008 at 4:00 PM, Tomasz Chmielewski wrote:

> Also, you gave too few details to say anything meaningful, really (did
> you drop the cache on both systems before starting the copy? did you sync
> after copying on both systems to see how long it takes? etc.).

Tomasz, many thanks for your comments. My original testing of Windows
against Linux was done in a very non-scientific way (as my main
concern was with IOWaits).

I have now performed some slightly more scientific comparisons between
the Windows and Linux copy operations. I think the main flaw in my
original Windows test was that I used Explorer to do the copy, and I
believe this has a tendency to hide the true length of the copy
operation.

Turning off the write cache on Windows resulted in a massive decrease
in performance, so I simply listened to the disk heads after the copy
had completed to confirm that the disk was idle (not perfect, I know).
On Linux I ran a sync after the copy, but it took less than 1 second
to complete, so I have ignored it in my results.

I ran the tests again using perfmon to monitor CPU usage, timeit.exe
from the Win2k3 resource kit to time the copy, and the xcopy command
to run the copies themselves. With these tests, Linux actually copied
a 3387MB file very slightly quicker than Windows.

It did, however, still highlight quite a difference in CPU usage.
Perfmon only recorded total CPU usage (i.e. user + system), so I
compared this against the user+system figures from vmstat.

Average CPU usage for Windows during the copies was 12%. Average CPU
usage for Linux during the copies was almost 40%!

Command used on Linux:

time cp testdir1/testfile1 testdir2/

Here's a quick sample of the vmstat output during one of the copy operations:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  3  34780  12688   2404 756888    0    0 19856 58148  411  988  4 40  0 56
 0  2  34780  11448   2432 759284    0    0 16724   152  400 1049  1 26  0 73
 0  1  34780  12632   2472 757456    0    0 36580 16460  525 1256  3 54  0 43
 2  1  34780  13188   2500 756704    0    0 21400 20720  444 1071  3 34  0 63
 0  1  34780  12104   2508 757368    0    0 23700 24864  430 1043  1 36  0 63
 0  2  34780  11324   2512 757892    0    0 20500 59596  436 1073  2 43  0 55
 0  2  34780  11700   2532 758432    0    0 20500     0  407 1100  1 29  0 70
 0  1  34780  11592   2568 758444    0    0 32804 15148  512 1303  2 48  0 50

As you can see, system CPU usage is high throughout, ranging from 26% to
54% in this sample.
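
For anyone wanting to reduce a vmstat log to the same kind of average, here is a small sketch. The function name vmstat_avg is made up for illustration, and it assumes the 16-column layout shown above, with us, sy, id and wa in columns 13 to 16:

```shell
#!/bin/sh
# vmstat_avg: read vmstat output on stdin and print the mean of us+sy
# and the mean of wa. Header lines are skipped because their first
# field is not a number.
vmstat_avg() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 16 {
            busy += $13 + $14   # user + system
            wa   += $16         # iowait
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_us_sy=%.1f avg_wa=%.1f\n", n, busy / n, wa / n
        }'
}

# Example:
#   vmstat 1 60 | vmstat_avg
```

Fed the eight sample lines above, it reports an average of 40.9% for us+sy and 59.1% for wa, in line with the "almost 40%" figure.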

I'll run some more tests with different filesystems to see if this is
related to the use of EXT3 specifically.

Cheers,

Gordon.

2008-03-01 19:42:08

by Gordon Mckeown

Subject: Re: Very high IOWait during all disk activity

On Sat, Mar 1, 2008 at 1:31 PM, I wrote:
> I'll run some more tests with different filesystems to see if this is
> related to the use of EXT3 specifically.

Much of the system CPU usage does appear to be due to the use of EXT3;
the load with EXT2 is around half that with EXT3.

See graph of results: http://www.ubergeek.org.uk/cpu_usage.png

For Windows, the graph shows total CPU usage as reported by the OS.
For Linux, the combined User+System figure was used.

Regards,

Gordon.