2002-12-19 12:32:55

by martin f krafft

Subject: 'D' processes on a healthy system?

[please CC me on replies]

Hi folks,

I just set up a fresh, powerful, and error-free server on Debian
woody, with some packages from testing. It's running a vanilla 2.4.19
kernel with the grsecurity 1.9.7d and freeswan 1.99 patches. This is
an AMD Athlon Duron 1.2 GHz with 512 MB of SDRAM and a 7,200 RPM HDD
by Maxtor. Reiserfs is used as the filesystem.

I also have a machine that's about a year old, still running a vanilla
2.4.12 kernel without any patches. It's an AMD K6-2 500 MHz with 192
MB of RAM and a 5,400 RPM Western Digital drive. ext2 is the
filesystem here.

When either machine becomes loaded, certain processes become unusable
for a couple of seconds. top shows me this:

19268 madduck 18 0 2208 2208 1076 D 1.3 0.4 0:00 sanitizer
6 root 9 0 0 0 0 DW 0.3 0.0 1:10 kupdated
8457 root 10 0 1156 1156 820 R 0.3 0.2 0:02 top
10843 postfix 9 0 1364 1364 1016 D 0.3 0.2 0:00 cleanup
3156 madduck 0 0 636 636 540 D 0.1 0.1 0:00 procmail
28356 root 9 0 292 288 240 R 0.0 0.0 0:01 supervise
28706 root 9 0 2060 2036 1724 D 0.0 0.4 0:00 sshd
21395 root 15 0 1944 1944 1876 D 0.0 0.3 0:00 zsh

Notice the number of processes in uninterruptible sleep mode ('D' in
the state column).

This was after i did something like:

while true; do echo test | sendmail madduck; mailq; done

When this state is reached, programs like mutt take 7 minutes to open
a mailbox of 100 messages. With the server specs, this should not
happen.
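
A minimal sketch for listing the processes stuck in 'D' state without reading them off top by eye. It assumes a procps-style ps that accepts `-eo pid=,stat=,comm=`; the function names are made up for illustration.

```shell
# Keep only lines whose second field (process state) starts with 'D',
# i.e. uninterruptible sleep. Input format: "PID STAT COMMAND".
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Live variant: the trailing '=' on each column name suppresses the header.
# Assumes a procps-style ps; not taken from the original post.
list_d_state() {
    ps -eo pid=,stat=,comm= | filter_d_state
}
```

Running `list_d_state` in a loop with `sleep 1` while the load is applied shows which processes keep returning to 'D'.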

I have memtest86'd the RAM and ran badblocks over all partitions
without finding anything.

My laptop, which is running Debian testing/unstable is not showing
this behaviour, and its load goes far higher at times. I also run
various other servers, partially on P5-120 systems, vanilla 2.4.xx
kernels and Debian testing, and there are no such problems there.

What is this an indication of? Hardware problems? Software problems?
Have you heard of this before? How can I fix it?

Thanks,

[Please CC me on replies]

--
.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, admin, and user
`. `'`
`- Debian - when you have better things to do than fixing a system

NOTE: The public PGP keyservers are broken!
Get my key here: http://people.debian.org/~madduck/gpg/330c4a75.asc



2002-12-19 16:55:00

by Alan

Subject: Re: 'D' processes on a healthy system?

On Thu, 2002-12-19 at 12:40, martin f krafft wrote:
> [please CC me on replies]
> 19268 madduck 18 0 2208 2208 1076 D 1.3 0.4 0:00 sanitizer
> 6 root 9 0 0 0 0 DW 0.3 0.0 1:10 kupdated
> 8457 root 10 0 1156 1156 820 R 0.3 0.2 0:02 top
> 10843 postfix 9 0 1364 1364 1016 D 0.3 0.2 0:00 cleanup
> 3156 madduck 0 0 636 636 540 D 0.1 0.1 0:00 procmail
> 28356 root 9 0 292 288 240 R 0.0 0.0 0:01 supervise
> 28706 root 9 0 2060 2036 1724 D 0.0 0.4 0:00 sshd
> 21395 root 15 0 1944 1944 1876 D 0.0 0.3 0:00 zsh
>
> Notice the number of processes in uninterruptible sleep mode ('D' in
> the state column).

Your disk is too slow for the work being asked of it, that's all.
Eventually it'll get there.

> My laptop, which is running Debian testing/unstable is not showing
> this behaviour, and its load goes far higher at times. I also run
> various other servers, partially on P5-120 systems, vanilla 2.4.xx
> kernels and Debian testing, and there are no such problems there.

sendmail tuning?

2002-12-19 18:16:17

by martin f krafft

Subject: Re: 'D' processes on a healthy system?

[please continue to CC me]

Thank you for your reply:

also sprach Alan Cox <[email protected]> [2002.12.19.1843 +0100]:
> Your disk is too slow for the work being asked of it, that's all.
> Eventually it'll get there.

Alan, I am in no position to doubt what you say, but I can't imagine
that. Sure, maybe the 5,400 RPM one, but not the 7,200 RPM one.

The reason why I am saying this is twofold and empirical:

- When the above occurs, the system in question might not be doing
anything. My example with /usr/sbin/sendmail in a while loop is
hardcore stresstesting. I have had the problem with no users on
the system, no requests being served by the servers (ifconfig
down), just two ssh connections, one displaying top, the other
opening a Maildir folder of 1,000 messages with mutt. I really
don't (want to) believe that a system with these specs can't
handle that.

- I have another system with exactly the same specs (AMD Duron
1.2 GHz, 512 MB, 7,200 RPM HDD) that is happy doing all of the
following at the same time:
* compiling a kernel
* streaming local MP3s to three other computers
* being used intensely through X (it's my main computer).
In fact, to verify this, I told the system to also check and
update tripwire while I was additionally running the slocate
updater. Other than the interactive use, these activities are
very tough on the disk, and yet I see no 'D' processes.

In any case, loading up mutt on a Maildir folder of 1,000 messages
should not take seven minutes. If it does, there must be heavy usage
of the disk from another source. If top doesn't show anything, what
other tool could I use to see what processes are accessing the
harddrive? Is there something like a disk monitor for Linux, which
registers every request to the HDD like there is for Windoze
(http://www.sysinternals.com/ntw2k/freeware/diskmon.shtml)?

> > My laptop, which is running Debian testing/unstable is not showing
> > this behaviour, and its load goes far higher at times. I also run
> > various other servers, partially on P5-120 systems, vanilla 2.4.xx
> > kernels and Debian testing, and there are no such problems there.
>
> sendmail tuning?

postfix... but no. All my machines have identical postfix
configurations, and, as mentioned above, the problem is not only
triggered when postfix is active...

Thank you for your time!

[please continue to CC me]

--
.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, admin, and user
`. `'`
`- Debian - when you have better things to do than fixing a system

NOTE: The pgp.net keyservers and their mirrors are broken!
Get my key here: http://people.debian.org/~madduck/gpg/330c4a75.asc



2002-12-19 18:38:17

by Alan

Subject: Re: 'D' processes on a healthy system?

On Thu, 2002-12-19 at 18:23, martin f krafft wrote:
> [please continue to CC me]
>
> Thank you for your reply:
>
> also sprach Alan Cox <[email protected]> [2002.12.19.1843 +0100]:
> > Your disk is too slow for the work being asked of it, that's all.
> > Eventually it'll get there.
>
> Alan, I am in no position to doubt what you say, but I can't imagine
> that. Sure, maybe the 5,400 RPM one, but not the 7,200 RPM one.

It's more to do with the controller and configuration. E.g. if your disk
isn't in DMA mode it'll certainly show up.
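
A small sketch of how the DMA setting could be checked from a script. It assumes the `using_dma = 1 (on)` line format that hdparm prints for IDE drives; the function name `dma_status` is hypothetical.

```shell
# Read hdparm output on stdin and report the DMA setting.
# Prints "on", "off", or "unknown" if no using_dma line is present.
dma_status() {
    awk '/using_dma/ { found = 1; print ($3 == 1 ? "on" : "off") }
         END { if (!found) print "unknown" }'
}
```

Typical use, as root, would be `hdparm -d /dev/hda | dma_status`. If it reports "off", `hdparm -d1 /dev/hda` asks the driver to enable DMA for that drive.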

2002-12-19 18:44:07

by martin f krafft

Subject: Re: 'D' processes on a healthy system?

also sprach Alan Cox <[email protected]> [2002.12.19.2027 +0100]:
> It's more to do with the controller and configuration. E.g. if your disk
> isn't in DMA mode it'll certainly show up.

Interesting. The 500 MHz system wasn't in DMA mode (and I thought I had
enabled it). I'll continue monitoring it now that I turned it on.

Thank you for your help so far, Alan!

--
.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, admin, and user
`. `'`
`- Debian - when you have better things to do than fixing a system

NOTE: The pgp.net keyservers and their mirrors are broken!
Get my key here: http://people.debian.org/~madduck/gpg/330c4a75.asc



2003-01-07 07:54:28

by martin f krafft

Subject: Re: 'D' processes on a healthy system?

also sprach martin f krafft <[email protected]> [2003.01.07.0901 +0100]:
> this is on an AMD K6-2 500 MHz machine with 160 MB RAM, 256 MB of swap

sorry, 512 MB of swap.

the kernel is 2.4.19 with the grsecurity patches. the problem did also
occur with 2.4.12 without the grsecurity patches.

--
Please do not CC me! Mutt (http://www.mutt.org) can handle this automatically.

.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, admin, and user
`. `'`
`- Debian - when you have better things to do than fixing a system

NOTE: The pgp.net keyservers and their mirrors are broken!
Get my key here: http://people.debian.org/~madduck/gpg/330c4a75.asc



2003-01-07 07:53:18

by martin f krafft

Subject: Re: 'D' processes on a healthy system?

also sprach Alan Cox <[email protected]> [2002.12.19.2027 +0100]:
> It's more to do with the controller and configuration. E.g. if your disk
> isn't in DMA mode it'll certainly show up.

i now took the system offline to test a little more. the harddrive is
operating fine without errors. i defined runlevel 4 to be single user
mode + sshd, now the system is running 14 processes including the
kernel processes.

hdparm shows this for /dev/hda:

/dev/hda:
multcount = 16 (on)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 2491/255/63, sectors = 40021632, start = 0
busstate = 1 (on)

correct me if i am wrong, but it is properly tweaked. moreover, lspci
shows that there is a VT82C598 [Apollo MVP3] VIA Chipset in there, and
my kernel config is optimized for that:

CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y

Nevertheless, with 14 processes running and none of them accessing
the disk, i started an rsync process over ssh for the home partition,
and performance is ridiculous. rsync will transfer about 40k before
the rsync process enters 'D' state as shown by top. this takes about
10 seconds, then rsync gets to transfer another 40k.

this is on an AMD K6-2 500 MHz machine with 160 MB RAM, 256 MB of swap
and a Maxtor 10 GB drive spinning at 5,400 RPM, I believe.

What's the problem?

--
Please do not CC me! Mutt (http://www.mutt.org) can handle this automatically.

.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, admin, and user
`. `'`
`- Debian - when you have better things to do than fixing a system

NOTE: The pgp.net keyservers and their mirrors are broken!
Get my key here: http://people.debian.org/~madduck/gpg/330c4a75.asc



2003-01-07 12:03:38

by Alan

Subject: Re: 'D' processes on a healthy system?

On Tue, 2003-01-07 at 08:01, martin f krafft wrote:
>
> correct me if i am wrong, but it is properly tweaked. moreover, lspci
> shows that there is a VT82C598 [Apollo MVP3] VIA Chipset in there, and
> my kernel config is optimized for that:

Looks good yes.

hdparm -t will give you the raw disk speed
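
As a rough sketch, the timing can be repeated a few times, since a single run may vary. The `HDPARM` override and the function name `time_disk` are assumptions added here so the loop can be dry-run without root.

```shell
# Run the buffered-read timing test several times against one device.
# $1 = device (e.g. /dev/hda), $2 = number of runs (default 3).
# HDPARM is a hypothetical override for the hdparm binary, for dry runs.
time_disk() {
    dev=$1
    runs=${2:-3}
    i=0
    while [ "$i" -lt "$runs" ]; do
        "${HDPARM:-hdparm}" -t "$dev"
        i=$((i + 1))
    done
}
```

For example, `time_disk /dev/hda` as root on an otherwise idle system. A drive running without DMA usually reports only a few MB/sec.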

> and performance is ridiculous. rsync will transfer about 40k before
> the rsync process enters 'D' state as shown by top. this takes about
> 10 seconds, then rsync gets to transfer another 40k.
>
> this is on an AMD K6-2 500 MHz machine with 160 MB RAM, 256 MB of swap
> and a Maxtor 10 GB drive spinning at 5,400 RPM, I believe.
>
> What's the problem?

No idea. strace the rsync to see what it's spending its time stuck doing
(e.g. read from disk, or write to net, etc.). It'll probably show read from
disk if this is a disk problem.
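
One possible way to do that, sketched under the assumption that rsync is already running and its PID is known. `-T` prints the time spent inside each system call and `-p` attaches to a running process. The `STRACE` override and the function name `trace_io` are hypothetical.

```shell
# Attach to a running process and show only read/write calls,
# with the time spent in each call appended by -T.
# STRACE is a hypothetical override for the strace binary, for dry runs.
trace_io() {
    pid=$1
    "${STRACE:-strace}" -T -e trace=read,write -p "$pid"
}
```

For example, `trace_io 1234` with the rsync PID in place of 1234. Reads on a file descriptor that take seconds each would point at the disk, slow writes to the socket at the network side.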