2001-12-26 05:18:55

by Paul Boley

Subject: severe slowdown with 2.4 series w/heavy disk access

I have been having this problem with the whole 2.4 kernel series. Under
heavy disk access, the entire system slows down and almost all of my
memory, save 5 megs, gets used up, never to return. I am currently
running 2.4.17 on a machine with 416 megs of RAM and a Duron 750, not
overclocked. CPU temp never exceeds 107 deg F, so I don't think
it's a heat issue. I have an MSI K7T Pro2 motherboard, using the KT133
chipset. Anyway, the following is how I can duplicate the problem in
about 5 minutes. I only used mozilla for this because I was working
with it when I decided to isolate the problem. Also note this happens
with more than just tar, and sometimes it happens for no apparent reason
at all.

*** free and ps -ax, before the slowdown:

             total       used       free     shared    buffers     cached
Mem:        417472      30104     387368          0       1264      21748
-/+ buffers/cache:       7092     410380
Swap:       136544          0     136544

PID TTY STAT TIME COMMAND
1 ? S 0:04 init
2 ? SW 0:00 [keventd]
3 ? SWN 0:00 [ksoftirqd_CPU0]
4 ? SW 0:00 [kswapd]
5 ? SW 0:00 [bdflush]
6 ? SW 0:00 [kupdated]
63 ? S 0:00 /usr/sbin/syslogd
66 ? S 0:00 /usr/sbin/klogd -c 3
78 ? S 0:00 /usr/sbin/atd -b 15 -l 1
90 tty1 S 0:00 -bash
91 tty2 S 0:00 -bash
92 tty3 S 0:00 /sbin/agetty 38400 tty3 linux
93 tty4 S 0:00 /sbin/agetty 38400 tty4 linux
94 tty5 S 0:00 /sbin/agetty 38400 tty5 linux
95 tty6 S 0:00 /sbin/agetty 38400 tty6 linux
192 tty2 S 0:00 top
200 tty1 R 0:00 ps -ax

*** I then rm -rf'd mozilla, and tar -zxvf mozilla-source-0.9.7.tar.gz,
*** and immediately after, ran free and ps -ax again. The system started
*** losing memory and slowing down about 10 seconds into the
*** decompression.

             total       used       free     shared    buffers     cached
Mem:        417472     412192       5280          0      20632     315680
-/+ buffers/cache:      75880     341592
Swap:       136544          0     136544

PID TTY STAT TIME COMMAND
1 ? S 0:04 init
2 ? SW 0:00 [keventd]
3 ? SWN 0:00 [ksoftirqd_CPU0]
4 ? SW 0:00 [kswapd]
5 ? SW 0:00 [bdflush]
6 ? SW 0:02 [kupdated]
63 ? S 0:00 /usr/sbin/syslogd
66 ? S 0:00 /usr/sbin/klogd -c 3
78 ? S 0:00 /usr/sbin/atd -b 15 -l 1
90 tty1 S 0:00 -bash
91 tty2 S 0:00 -bash
92 tty3 S 0:00 /sbin/agetty 38400 tty3 linux
93 tty4 S 0:00 /sbin/agetty 38400 tty4 linux
94 tty5 S 0:00 /sbin/agetty 38400 tty5 linux
95 tty6 S 0:00 /sbin/agetty 38400 tty6 linux
192 tty2 S 0:01 top
207 tty1 R 0:00 ps -ax



Subject: Re: severe slowdown with 2.4 series w/heavy disk access

It is quite possible your MB routes all disk access straight through the
processor, while using system ram to cache itself. Tar does not (to my
knowledge) limit itself, and is quite nasty on such things. My MB has had
troubles as such before. I know of no workaround, except get a new MB.

><><>< Idrigal of Imladris
----- Original Message -----
From: "Paul Boley" <[email protected]>
To: <[email protected]>
Sent: Tuesday, 25 December, 2001 11:17 PM
Subject: severe slowdown with 2.4 series w/heavy disk access



2001-12-26 14:48:15

by Alan

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

>              total       used       free     shared    buffers     cached
> Mem:        417472     412192       5280          0      20632     315680

Those values don't show any problems. In fact your machine seems to think it
has a ton of free memory to waste and has let it fill up with stuff that has
been accessed, just on the chance that it may be reused.

It hasn't even felt enough memory pressure to start swapping. When you
say it "becomes slow", what precisely becomes slow?

Also, what disks do you have and how are they set up?
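Alan's reading can be checked from the posted numbers alone. free(1) builds its "-/+ buffers/cache" row by moving the buffers and cached columns from "used" to "free". A minimal Python sketch of that arithmetic (the function name is illustrative, not part of any tool):

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return the (used, free) pair that free(1) prints on its
    "-/+ buffers/cache" row. All values are in kB, as in the Mem: row."""
    app_used = used - buffers - cached  # held by processes and non-cache kernel data
    app_free = free + buffers + cached  # free memory plus droppable buffers/cache
    return app_used, app_free


# Mem: rows posted above, before and after the tar run
print(buffers_cache_row(30104, 387368, 1264, 21748))    # (7092, 410380)
print(buffers_cache_row(412192, 5280, 20632, 315680))   # (75880, 341592)
```

After the untar, about 74 MB (75880 kB) is in use by everything other than buffers and cache; the remaining ~330 MB is cache the kernel can drop on demand.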

2001-12-26 16:35:10

by Paul Boley

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

Alan Cox wrote:
>
> >              total       used       free     shared    buffers     cached
> > Mem:        417472     412192       5280          0      20632     315680
>
> Those values don't show any problems. In fact your machine seems to think it
> has a ton of free memory to waste and has let it fill up with stuff that has
> been accessed, just on the chance that it may be reused.
>
> It hasn't even felt enough memory pressure to start swapping. When you
> say it "becomes slow", what precisely becomes slow ?

When this happens in X, the mouse drags and skips, and any running processes
(like tar/gzip) slow down; ls in an empty dir takes about 10 seconds.
It usually lasts for about 10sec-2min, often for no apparent
reason. The big decompression was just a way I can easily duplicate
it. Oddly enough though, according to top, it caches all that memory at
once, and my free memory goes down to 5 megs, with the system hanging or slow
to respond for 10sec-2min. Even typing in the console has a delay before
the characters appear. According to top, tar and gzip are both using
under 1% CPU while this happens, and about 50% of the CPU is in use by
the system (not by any process that I can see; kupdated goes up to
about 0.3% during this).
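To confirm the "about 50% system" reading independently of top, the aggregate cpu line of /proc/stat can be sampled twice. A rough Python sketch, assuming the 2.4 layout where that line carries four jiffy counters (user, nice, system, idle); later kernels append more fields (iowait and others), which this sketch ignores, so on those kernels the system share is computed against a smaller total:

```python
def parse_cpu_line(line):
    """Return (user, nice, system, idle) jiffies from the 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return tuple(int(x) for x in fields[1:5])


def system_percent(before, after):
    """Share of elapsed jiffies spent in kernel (system) mode between two samples."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    total = sum(deltas)
    return 100.0 * deltas[2] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """The first line of /proc/stat is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

Usage: take `before = read_cpu_line()`, wait a second or two while the stall is happening, take `after = read_cpu_line()`, then call `system_percent(before, after)`.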

>
> Also what disks do you have and how are they set up ?
> -

/dev/hdb3 on / type ext2 (rw)
/dev/hdb4 on /home type ext2 (rw)
/dev/hda1 on /dos/c type vfat (rw)
/dev/hdb1 on /dos/d type vfat (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /proc type proc (rw)

and my swap is /dev/hdb2

2001-12-26 16:39:20

by Paul Boley

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

vda wrote:

> >
> >              total       used       free     shared    buffers     cached
> > Mem:        417472     412192       5280          0      20632     315680
> > -/+ buffers/cache:      75880     341592
> > Swap:       136544          0     136544
>
> It seems you think your memory is used for no purpose,
> but the kernel is just keeping page cache in your RAM (kernel bugs are indeed
> possible, but your test does not show any buggy behavior IMHO).

My whole system slows down (commands take a long time to execute,
decompression slows, ls in an empty dir takes 10 sec)

>
> To verify this, you may repeat this experiment on a separate partition:
> 1) mount a partition
> 2) do the test as you described
> 3) umount the partition

I did this, and it uncached some, but I only had 60 megs (out of 416)
free after unmounting the partition. The cache went down to about 5
megs, and in-use was at 350 megs.

>
> I believe you should see tons of free memory then, especially if your tarfile
> is also on that partition.
> Please report back if you do the test.
> --
> vda
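vda's mount/test/umount check can be made numeric by reading /proc/meminfo before the test and again after the umount. A minimal Python sketch; it assumes the `Key:  value kB` lines (MemTotal, MemFree, Buffers, Cached) that 2.4 kernels print below their summary table, skips everything else, and the helper names are illustrative:

```python
def parse_meminfo(text):
    """Map /proc/meminfo keys such as 'MemFree' to integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # keep only "Key:   12345 kB" lines; the 2.4 summary rows have more fields
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def non_cache_used_kb(info):
    """Memory in use that is neither free, buffers, nor page cache (kB)."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())
```

If that figure stays large after the umount, as in the reply above (about 350 megs in use with the cache down to about 5 megs), it points away from page cache as the thing holding the memory.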

2001-12-26 16:47:20

by Calin A. Culianu

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

On Tue, 25 Dec 2001, Paul Boley wrote:

> When this happens in X, the mouse drags and skips, any processes running
> (like tar/gzip. ls in an empty dir takes about 10 seconds) slow down,
> and it happens usually for about 10sec-2min, often for no apparent
> reason. The big decompression was just a way I can easily duplicate
> it. Oddly enough though, according to top, it caches all that memory at
> once, and my free goes down to 5 megs, with the system hanging/slow to
> respond, for 10sec-2min. Even typing in the console has delay before
> the characters appear, and according to top, tar and gz are both using
> under 1% cpu while this happens, and about 50% of the cpu is in use by
> the system (not by any processes that I can see. kupdated goes up to
> about 0.3% during this)
>
> >
> > Also what disks do you have and how are they set up ?
> > -
>
> /dev/hdb3 on / type ext2 (rw)
> /dev/hdb4 on /home type ext2 (rw)
> /dev/hda1 on /dos/c type vfat (rw)
> /dev/hdb1 on /dos/d type vfat (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> none on /proc type proc (rw)

Are your hard drives configured to use DMA? I've noticed that when using PIO
in Linux, you definitely get some user responsiveness problems whenever
you are doing heavy disk IO. Try turning DMA mode on (see man hdparm).
However, if you are getting 10sec-2min lags in
responsiveness, I somehow doubt that is the true culprit in this case...
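Checking the DMA suggestion starts with `hdparm -d <device>`, which reports the current `using_dma` setting (`hdparm -d1 <device>` turns it on). A small Python sketch that runs the query and parses the flag; it assumes the usual ` using_dma    =  1 (on)` output line, and the helper names are hypothetical:

```python
import re
import subprocess


def parse_using_dma(output):
    """Return True/False for the using_dma flag in hdparm output, or None if absent."""
    match = re.search(r"using_dma\s*=\s*(\d+)", output)
    if match is None:
        return None
    return match.group(1) == "1"


def dma_enabled(device):
    """Run 'hdparm -d <device>' (normally needs root) and parse the result."""
    result = subprocess.run(["hdparm", "-d", device],
                            capture_output=True, text=True, check=True)
    return parse_using_dma(result.stdout)
```

On the reporter's machine the interesting device is /dev/hdb, which holds /, /home and swap.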


Also, are you using NFS and/or NIS? These things can also really slow
down apparently low-cost operations like running ls. You
might also have bad sectors on your hard drives. I know some hard drives
(or is it just the drivers?) internally retry a read or write
request several times before reporting an error. Sometimes a hard
drive's media is slightly bad and thus exhibits huge slowdowns whenever
it operates on the offending sectors: the drive keeps retrying and then
succeeds, with the only visible symptom being that you waited a really
long time for the request to complete. During this time, of course, the
whole drive is stalled and cannot be accessed at all by the driver...

These are just some ideas. Of course, since it's not so likely to be the
kernel, I thought we could rule out other more likely suspects. :)

-Calin


2001-12-26 17:26:56

by Alan

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

> When this happens in X, the mouse drags and skips, any processes running
> (like tar/gzip. ls in an empty dir takes about 10 seconds) slow down,
> and it happens usually for about 10sec-2min, often for no apparent
> reason. The big decompression was just a way I can easily duplicate
> it. Oddly enough though, according to top, it caches all that memory at

Ok

> once, and my free goes down to 5 megs, with the system hanging/slow to

The free behaviour is correct (free memory is wasted memory). The delays are
obviously not.

> under 1% cpu while this happens, and about 50% of the cpu is in use by
> the system (not by any processes that I can see. kupdated goes up to
> about 0.3% during this)

> > Also what disks do you have and how are they set up ?
> > -

[I meant are they in DMA / UDMA modes ?]

Alan

2001-12-26 19:25:54

by Guillaume Morin

Subject: Re: severe slowdown with 2.4 series w/heavy disk access

FYI,

I experience the same problem with recent kernels. I run 2.4.16. When I
copy a file from one hd to another, ls'ing an empty directory takes about
6-7 secs.
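The "ls in an empty directory takes seconds" symptom is easy to log while the copy or untar runs. A rough Python sketch that repeatedly times an external `ls` of a fresh empty directory; the `where` argument (which filesystem to create the directory on) and the sampling interval are assumptions to adjust:

```python
import subprocess
import tempfile
import time


def ls_latencies(samples, interval=0.5, where=None):
    """Time `ls` on an empty directory `samples` times, returning seconds per run.

    `where` picks the parent directory (e.g. a path on the busy disk); None
    uses the system temp directory, which may live on a different device."""
    latencies = []
    with tempfile.TemporaryDirectory(dir=where) as empty:
        for _ in range(samples):
            start = time.monotonic()
            # run the real ls binary so program startup is timed too,
            # as when typing ls at a shell
            subprocess.run(["ls", empty], stdout=subprocess.DEVNULL, check=True)
            latencies.append(time.monotonic() - start)
            time.sleep(interval)
    return latencies
```

For example, `max(ls_latencies(60))` while the cross-disk copy runs gives the worst stall seen over roughly 30 seconds.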

Disks are

/dev/hda:

non-removable ATA device, with non-removable media
Model Number: QUANTUM FIREBALLP LM15
Serial Number: 883011568812
Firmware Revision: A35.0700
Standards:
Used: ATA/ATAPI-5 T13 1321D revision 1
Supported: 1 2 3 4 5
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
bytes/track: 32256 (obsolete)
bytes/sector: 21298 (obsolete)
current sector capacity: 16514064
LBA user addressable sectors = 29336832
Capabilities:
LBA, IORDY(can be disabled)
Buffer size: 1900.0kB ECC bytes: 4 Queue depth: 1
Standby timer values: spec'd by Vendor, no device specific minimum
r/w multiple sector transfer: Max = 16 Current = 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 *udma4
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* look-ahead
* write cache
* Power Management feature set
Security Mode feature set
SMART feature set
SET MAX security extension
* DOWNLOAD MICROCODE cmd
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
12min for SECURITY ERASE UNIT.
HW reset results:
CBLID- above Vih
Device num = 0 determined by the jumper
Checksum: correct

/dev/hdd:

non-removable ATA device, with non-removable media
Model Number: WDC WD400BB-00CLB0
Serial Number: WD-WMAAN1229771
Firmware Revision: 05.04E05
Standards:
Supported: 1 2 3 4 5
Likely used: 5
Configuration:
Logical max current
cylinders 16383 17475
heads 16 15
sectors/track 63 63
bytes/track: 57600 (obsolete)
bytes/sector: 600 (obsolete)
current sector capacity: 16513875
LBA user addressable sectors = 78165360
Capabilities:
LBA, IORDY(can be disabled)
Buffer size: 2048.0kB ECC bytes: 40 Queue depth: 1
Standby timer values: spec'd by standard, with device specific minimum
r/w multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* look-ahead
* write cache
* Power Management feature set
Security Mode feature set
SMART feature set
SET MAX security extension
* DOWNLOAD MICROCODE cmd
Security:
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 1 determined by the jumper
Checksum: correct

at boot :

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 29336832 sectors (15020 MB) w/1900KiB Cache, CHS=1826/255/63, UDMA(66)
hdd: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=77545/16/63, UDMA(33)
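The hdparm -I listings mark the active transfer mode with `*` on the DMA: line, and it agrees with the boot messages: hda runs udma4 (UDMA(66)) and hdd runs udma2 (UDMA(33)) even though hdd advertises up to udma5. A minimal Python sketch that pulls out the starred mode and looks up the nominal Ultra DMA rate (the function names are illustrative):

```python
# Nominal Ultra DMA transfer rates in MB/s, by mode number
UDMA_MB_PER_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.3}


def active_mode(dma_line):
    """Return the mode marked with '*' on an hdparm -I 'DMA:' line, or None."""
    for token in dma_line.split():
        if token.startswith("*"):
            return token[1:]
    return None


def nominal_rate(mode):
    """Nominal MB/s for a 'udmaN' mode; None for mdma or unknown modes."""
    if mode and mode.startswith("udma") and mode[4:].isdigit():
        return UDMA_MB_PER_S.get(int(mode[4:]))
    return None


hdd_line = "DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5"
print(active_mode(hdd_line), nominal_rate(active_mode(hdd_line)))  # udma2 33.3
```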

Let me know if you want me to test patches or provide more information.

Regards,

--
Guillaume Morin <[email protected]>

La vie est facétieuse (Life is mischievous)