Hello,
I have a computer with an AMD Duron; the motherboard chipset is a VIA
KT133. The hard drive is a Seagate Barracuda 7200.7; no other EIDE
devices are attached.
I run an RH9-based distro and added a 2.6.4 kernel to it. The following
problem was tested with two kernel variants: 2.6.4+wolk2/0 with
preemption enabled, and 2.6.4 plain from kernel.org with preemption
disabled. No difference.
I noticed performance problems with 2.6.4, and tracked them to strange
HDD behavior.
It turned out that during disk-intensive operations, the "system" CPU
usage skyrockets. With a mere "cp" of a large file to the same directory
(tested with ext3fs and FAT32 file systems), it is 100% practically all
of the time!
On the stock distro kernel (2.4.20) the CPU load in the same situation
varies from 0 to 15-20% with small peaks to about 40%.
"cp" actually takes less time on 2.6.4 (with a file of the same size).
However, the CPU load is scary, and I suspect that it adversely affects
overall performance.
Besides, with default settings, the results of "hdparm -tT /dev/hda"
are substandard on 2.6.4. The "Timing buffered disk reads" averages
about 38 MB/sec on 2.4.20 and about 27 MB/sec on 2.6.4.
This changes after I set a large read-ahead: "hdparm -a8192 /dev/hda".
"Timing buffered disk reads" becomes about 45 MB/s. But the CPU load
does not change. The "cp" performance (the time to copy a large file) also
does not change noticeably.
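In case it helps anyone reproduce the numbers, the sequence was roughly
this (hdparm -a values are in 512-byte sectors; 8192 was simply a large
value I tried, not a tuned one):
# hdparm -tT /dev/hda      (baseline cached and buffered read timings)
# hdparm -a8192 /dev/hda   (raise the read-ahead from the default of 256)
# hdparm -tT /dev/hda      (repeat the timings with the larger read-ahead)
# hdparm -a /dev/hda       (check that the setting took effect)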
IANAKH (I am Not A Kernel Hacker), but I like kernel 2.6.x and would be
willing to run tests, and try various builds, to help pinpoint any
problem.
Here is some output that may be relevant:
# cat /proc/ide/via
----------VIA BusMastering IDE Configuration----------------
Driver Version: 3.38
South Bridge: VIA vt82c686a
Revision: ISA 0x22 IDE 0x10
Highest DMA rate: UDMA66
BM-DMA base: 0xd000
PCI clock: 33.3MHz
Master Read Cycle IRDY: 0ws
Master Write Cycle IRDY: 0ws
BM IDE Status Register Read Retry: yes
Max DRDY Pulse Width: No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush: yes yes
End Sector FIFO flush: no no
Prefetch Buffer: yes yes
Post Write Buffer: yes no
Enabled: yes yes
Simplex only: no no
Cable Type: 80w 40w
-------------------drive0----drive1----drive2----drive3-----
Transfer Mode: UDMA PIO PIO PIO
Address Setup: 30ns 120ns 120ns 120ns
Cmd Active: 90ns 90ns 480ns 480ns
Cmd Recovery: 30ns 30ns 480ns 480ns
Data Active: 90ns 330ns 330ns 330ns
Data Recovery: 30ns 270ns 270ns 270ns
Cycle Time: 30ns 600ns 600ns 600ns
Transfer Rate: 66.6MB/s 3.3MB/s 3.3MB/s 3.3MB/s
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 8192 (on)
geometry = 16383/255/63, sectors = 156301488, start = 0
(readahead was 256 before I changed it using hdparm -a)
Sincerely yours, and with big thanks to all the kernel authors,
Mikhail Ramendik
Moscow, Russia
P.S. I am not subscribed. I will watch the thread over the Web, but if I
need to provide additional info, or run some tests, or try any other
build of the kernel - please CC me for faster reaction.
Mikhail Ramendik wrote:
> Hello,
>
> I have a computer with an AMD Duron; the motherboard chipset is a VIA
> KT133. The hard drive is a Seagate Barracuda 7200.7; no other EIDE
> devices are attached.
>
> I run an RH9-based distro and added a 2.6.4 kernel to it. The following
> problem was tested with two kernel variants: 2.6.4+wolk2/0 with
> preemption enabled, and 2.6.4 plain from kernel.org with preemption
> disabled. No difference.
>
> I noticed performance problems with 2.6.4, and tracked them to strange
> HDD behavior.
>
> It turned out that during disk-intensive operations, the "system" CPU
> usage skyrockets. With a mere "cp" of a large file to the same directory
> (tested with ext3fs and FAT32 file systems), it is 100% practically all
> of the time!
Which tool do you use for measuring? xosview?
I'm seeing the same problem here, but it depends on the tool used for
for measuring. If I use top from procps 3.2, I can't see this high system
load. "time" can't see it, too.
This is what top says during cp of 512MB-file:
Cpu(s): 2.0% us, 8.3% sy, 0.0% ni, 0.0% id, 89.0% wa, 0.7% hi, 0.0% si
New is "wa", what probably means "wait". This value is very high as long
as the HD is writing or reading datas:
cp dummy /dev/null
produces this top-line:
Cpu(s): 3.0% us, 5.3% sy, 0.0% ni, 0.0% id, 91.0% wa, 0.7% hi, 0.0% si
and time says:
real 0m53.195s
user 0m0.013s
sys 0m2.124s
But you're right, 2.6.4 is slower than 2.4.25. See the thread "Very poor
performance with 2.6.4" here in the list.
Regards,
Andreas Hartmann
Andreas Hartmann wrote:
> Mikhail Ramendik wrote:
>
>> Hello,
>>
>> I have a computer with an AMD Duron; the motherboard chipset is a VIA
>> KT133. The hard drive is a Seagate Barracuda 7200.7; no other EIDE
>> devices are attached.
>>
>> I run an RH9-based distro and added a 2.6.4 kernel to it. The following
>> problem was tested with two kernel variants: 2.6.4+wolk2/0 with
>> preemption enabled, and 2.6.4 plain from kernel.org with preemption
>> disabled. No difference.
>>
>> I noticed performance problems with 2.6.4, and tracked them to strange
>> HDD behavior.
>>
>> It turned out that during disk-intensive operations, the "system" CPU
>> usage skyrockets. With a mere "cp" of a large file to the same directory
>> (tested with ext3fs and FAT32 file systems), it is 100% practically all
>> of the time!
>
>
> Which tool do you use for measuring? xosview?
>
> I'm seeing the same problem here, but it depends on the tool used for
> measuring. If I use top from procps 3.2, I can't see this high system
> load; "time" can't see it either.
>
> This is what top says during a cp of a 512 MB file:
> Cpu(s): 2.0% us, 8.3% sy, 0.0% ni, 0.0% id, 89.0% wa, 0.7% hi,
> 0.0% si
>
> New is "wa", which probably means "wait". This value is very high as long
> as the HD is writing or reading data:
>
> cp dummy /dev/null
> produces this top-line:
> Cpu(s): 3.0% us, 5.3% sy, 0.0% ni, 0.0% id, 91.0% wa, 0.7% hi,
> 0.0% si
Yes "wa" is not intuitive, some other operating systems use "wio" for
"wait i/o" time. As noted in the other thread, you can try the deadline
elevator or increased readahead for your load.
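A rough sketch, assuming the disk is /dev/hda (adjust for your setup). On
these kernels the elevator is normally selected at boot, so deadline goes
on the kernel command line, e.g. in grub (the image name and root= below
are just placeholders):
    kernel /boot/vmlinuz-2.6.4 ro root=/dev/hda2 elevator=deadline
Readahead you can change on the fly and check afterwards:
# hdparm -a1024 /dev/hda   (read-ahead, in 512-byte sectors)
# hdparm -a /dev/hda       (show the current value)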
>
> and time says:
> real 0m53.195s
> user 0m0.013s
> sys 0m2.124s
>
>
> But you're right, 2.6.4 is slower than 2.4.25. See the thread "Very poor
> performance with 2.6.4" here in the list.
Much discussed, not overly fixed :-(
--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Hello,
Andreas Hartmann wrote:
> > It turned out that during disk-intensive operations, the "system" CPU
> > usage skyrockets. With a mere "cp" of a large file to the same directory
> > (tested with ext3fs and FAT32 file systems), it is 100% practically all
> > of the time!
>
> Which tool do you use for measuring? xosview?
IceWM's monitor. (It just runs all the time, that's how I spotted the
problem).
> I'm seeing the same problem here, but it depends on the tool used for
> measuring. If I use top from procps 3.2, I can't see this high system
> load; "time" can't see it either.
>
> This is what top says during a cp of a 512 MB file:
> Cpu(s): 2.0% us, 8.3% sy, 0.0% ni, 0.0% id, 89.0% wa, 0.7% hi, 0.0% si
I don't have procps 3.2 yet; I will compile it soon. However, I think
it's the same issue.
I tried the deadline elevator, as suggested by Bill Davidsen further down
this thread. It did not help. In fact, performance fell (the same file
took longer to copy); the CPU use is still 100% (with an occasional
"dent" or two, but these are very short in duration).
I also tried increasing the read-ahead. It did not help either.
Finally, I tried increasing the read-ahead WITH the deadline elevator.
Performance rose (compared to what I measured with the standard
read-ahead and the deadline elevator), but the CPU load still did not
change.
I don't have much CPU to waste (Duron 650 MHz), so I think some
performance problems I see are linked to this.
> But you're right, 2.6.4 is slower than 2.4.25. See the thread "Very poor
> performance with 2.6.4" here in the list.
I've looked at it. I will try the latest rc-mm kernel and report the
results.
Yours, Mikhail Ramendik
Hello,
Andreas Hartmann wrote:
> > It turned out that during disk-intensive operations, the "system" CPU
> > usage skyrockets. With a mere "cp" of a large file to the same directory
> > (tested with ext3fs and FAT32 file systems), it is 100% practically all
> > of the time!
> But you're right, 2.6.4 is slower than 2.4.25. See the thread "Very poor
> performance with 2.6.4" here in the list.
As recommended there, I have tried 2.6.5-rc3-mm4.
No change. Still 100% CPU usage; the performance seems the same.
Yours, Mikhail Ramendik
P.S. Sorry for sending all my comments as replies to your letter. I just
don't want to break the thread.
Bill Davidsen wrote:
> Andreas Hartmann wrote:
>> This is what top says during a cp of a 512 MB file:
>> Cpu(s): 2.0% us, 8.3% sy, 0.0% ni, 0.0% id, 89.0% wa, 0.7% hi,
>> 0.0% si
>>
>> New is "wa", which probably means "wait". This value is very high as long
>> as the HD is writing or reading data:
>>
>> cp dummy /dev/null
>> produces this top-line:
>> Cpu(s): 3.0% us, 5.3% sy, 0.0% ni, 0.0% id, 91.0% wa, 0.7% hi,
>> 0.0% si
>
> Yes "wa" is not intuitive, some other operating systems use "wio" for
> "wait i/o" time. As noted in the other thread, you can try the deadline
> elevator or increased readahead for your load.
If the processor and the kernel could do other things during wa, like
compiling for example, it would be no problem. But it seems that this is
not possible. Or did I overlook something?
Regards,
Andreas Hartmann
Mikhail Ramendik wrote:
> Hello,
>
> Andreas Hartmann wrote:
>
>> > It turned out that during disk-intensive operations, the "system" CPU
>> > usage skyrockets. With a mere "cp" of a large file to the same directory
>> > (tested with ext3fs and FAT32 file systems), it is 100% practically all
>> > of the time!
>> But you're right, 2.6.4 is slower than 2.4.25. See the thread "Very poor
>> performance with 2.6.4" here in the list.
>
> As recommended there, I have tried 2.6.5-rc3-mm4.
>
> No change. Still 100% CPU usage; the performance seems the same.
Yes. But it's curious:
Take a tar file, e.g. tar up the compiled 2.6 kernel directory. Then untar
it again - the machine behaves totally normally. And the 2.6 kernel is about
23% faster than the 2.4 kernel.
> Yours, Mikhail Ramendik
>
> P.S. Sorry for sending all my comments as replies to your letter. I just
> don't want to break the thread.
No problem - it's easier to read with the comments directly in the text.
Regards,
Andreas Hartmann
Hello,
Andreas Hartmann wrote:
> > As recommended there, I have tried 2.6.5-rc3-mm4.
> >
> > No change. Still 100% CPU usage; the performance seems the same.
>
> Yes. But it's curious:
> Take a tar file, e.g. tar up the compiled 2.6 kernel directory. Then untar
> it again - the machine behaves totally normally.
Not really. I tried a "simple" tar (no gzip/bzip2) - it was the same as
with cp, a near-100% CPU "system" load, most of it iowait.
If I use bzip2 with tar, then yes, the load is nearly 100% "user",
actually it's bzip2. But this is because the disk i/o is done at a *far*
slower rate; the bottleneck is the CPU. If we don't read (or write) the
disk heavily, naturally the system/iowait load is low.
I tried doing a "cp" in another xterm window, while the tar/bzip2 was
running. And sure enough, up the CPU system/iowait usage goes - the
"cp"'s disk i/o takes much of the CPU time away from the bz2 task! Looks
exactly like a cause of performance problems.
(All of this was done on 2.6.5-rc3-mm4).
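In case someone wants to reproduce it, the test was roughly this (the
directory and file names below are only placeholders; any kernel tree and
any large file will do):
# time tar cjf /tmp/tree.tar.bz2 linux-2.6.5-rc3-mm4/   (baseline: bzip2 keeps the CPU busy)
# cp /tmp/bigfile /tmp/bigfile.copy &                   (heavy disk i/o in the background)
# time tar cjf /tmp/tree2.tar.bz2 linux-2.6.5-rc3-mm4/  (the same tar, now competing with cp)
The interesting part is comparing the two "real" times and watching the
CPU monitor during the second run.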
Yours, Mikhail Ramendik
> And the 2.6 kernel is about
> 23% faster than the 2.4 kernel.
>
>
> > Yours, Mikhail Ramendik
> >
> > P.S. Sorry for sending all my comments as replies to your letter. I just
> > don't want to break the thread.
>
> No problem - it's easier to read with the comments directly in the text.
>
>
> Regards,
> Andreas Hartmann
>
>
Andreas Hartmann wrote:
> Bill Davidsen wrote:
>
>> Andreas Hartmann wrote:
>>
>>> This is what top says during a cp of a 512 MB file:
>>> Cpu(s): 2.0% us, 8.3% sy, 0.0% ni, 0.0% id, 89.0% wa, 0.7% hi,
>>> 0.0% si
>>>
>>> New is "wa", which probably means "wait". This value is very high as
>>> long as the HD is writing or reading data:
>>>
>>> cp dummy /dev/null
>>> produces this top-line:
>>> Cpu(s): 3.0% us, 5.3% sy, 0.0% ni, 0.0% id, 91.0% wa, 0.7% hi,
>>> 0.0% si
>>
>>
>> Yes "wa" is not intuitive, some other operating systems use "wio" for
>> "wait i/o" time. As noted in the other thread, you can try the
>> deadline elevator or increased readahead for your load.
>
>
> If the processor and the kernel could do other things during wa, like
> compiling for example, it would be no problem. But it seems that this is
> not possible. Or did I overlook something?
Yes, wio is similar to idle; the processor is available for work even if
disk access is running slowly.
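Easy enough to check for yourself: time a purely CPU-bound job on an idle
box, then time it again while a big copy is running (bigfile below is any
large file, and the loop count is arbitrary):
# time sh -c 'i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); done'
# cp bigfile /dev/null &
# time sh -c 'i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); done'
If the second run takes about as long as the first, the wa time really was
available to the loop; if it slows down a lot, something beyond plain i/o
wait is eating the CPU.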
--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Mikhail Ramendik wrote:
> Hello,
>
> Andreas Hartmann wrote:
>
>>>As recommended there, I have tried 2.6.5-rc3-mm4.
>>>
>>>No change. Still 100% CPU usage; the performance seems the same.
>>
>>Yes. But it's curious:
>>Take a tar file, e.g. tar up the compiled 2.6 kernel directory. Then untar
>>it again - the machine behaves totally normally.
>
>
> Not really. I tried a "simple" tar (no gzip/bzip2) - it was the same as
> with cp, a near-100% CPU "system" load, most of it iowait.
??? Was it in system or wait-io? One or the other; if you can't tell the
difference, update your tools and see what's really happening.
>
> If I use bzip2 with tar, then yes, the load is nearly 100% "user",
> actually it's bzip2. But this is because the disk i/o is done at a *far*
> slower rate; the bottleneck is the CPU. If we don't read (or write) the
> disk heavily, naturally the system/iowait load is low.
>
> I tried doing a "cp" in another xterm window, while the tar/bzip2 was
> running. And sure enough, up the CPU system/iowait usage goes - the
> "cp"'s disk i/o takes much of the CPU time away from the bz2 task! Looks
> exactly like a cause of performance problems.
>
> (All of this was done on 2.6.5-rc3-mm4).
--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979