Hey guys,
We got to the bottom of the sendmail problem. The settings:
-O QueueLA=20
and
-O RefuseLA=18
need to be cranked up in sendmail.cf to something high, since the
background VM activity on a very busy Linux box seems to push the load
average past these thresholds, which causes large emails to get stuck
in the /var/spool/mqueue directory for long periods of time. vger is
getting hammered with FTP all the time and is rarely idle. This also
explains what Richard was seeing with VM thrashing in a box with low
memory.
The problem of dropping connections on 2.4 was related to the O RefuseLA
settings. The defaults in the RedHat, Suse, and OpenLinux RPMs are
clearly set too low for modern Linux kernels. You may want them cranked
up to 100 or something if you want sendmail to always work.
Jeff
"Jeff V. Merkey" wrote:
>
> Claus,
>
> This is a bug. Emails should not get stuck in the mail queue because
> your load averaging routine doesn't work right. If this is so, then why
> do some emails (small ones) get through and big ones do not,
> regardless of delivery order? If it were a loading problem one would
> think emails would still get processed in the order they arrived, not
> some arbitrary "order from hell" which is what was happening. This is
> severely broken IMHO and you need to fix it.
>
> Jeff
>
> Claus Assmann wrote:
> >
> > All of these entries have an 'X':
> >
> > > Mail Queue (11 requests)
> > > --Q-ID-- --Size-- -Priority- ---Q-Time--- -----------Sender/Recipient-----------
> > > FAA15716X 31418 200564 Nov 9 05:01 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > <[email protected]>
> > > FAA20318X 32693 201751 Nov 10 05:29 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > <[email protected]>
> > > SAA01998X 34484 203865 Nov 6 18:20 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > <[email protected]>
> > > QAA01341X 65091 204150 Nov 6 16:50 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > SAA13390X 41368 210478 Nov 8 18:03 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > <[email protected]>
> > > LAA03425X 158115 218595 Nov 6 11:27 <[email protected]>
> > > <[email protected]>
> > > <[email protected]>
> > > QAA01343X 65091 234150 Nov 6 16:50 <[email protected]>
> > > 7BIT
> > > <[email protected]>
> > > <[email protected]>
> > > KAA21225X 205041 235799 Nov 10 10:26 <[email protected]>
> > > 8BITMIME
> > > <[email protected]>
> > > FAA20229X 1457 272283+Nov 10 05:01 <[email protected]>
> > > (Warning: could not send message for past 1 hour)
> > > <[email protected]>
> > > QAA06681X 242511 272929 Nov 7 16:18 <[email protected]>
> > > 8BITMIME
> > > <[email protected]>
> > > PAA12261X 576306 606701 Nov 8 15:06 <[email protected]>
> > > <[email protected]>
> >
> > That is, the load on your machine is too high.
> > 3:27pm up 29 min, 2 users, load average: 10.00, 9.97, 8.50
> >
> > It seems as if this is broken, top shows 2 running processes
> > and 67 sleeping.
> >
> > If you run the queue with -O QueueLA=20 the entries are processed.
> > So you have to change your configuration to deal with the "high"
> > load, which I did right now by editing your .cf file.
How many CPUs are in these high load-average boxes? Unless you have a
very impressive machine (8+ CPU SMP), the defaults should be plenty high.
Also, I thought the QueueLA default was 8 and the RefuseLA default was
12. Or have they been bumped up since I last examined them (8.8/8.9
timeframe)?
David Lang
On Fri, 10 Nov 2000, Jeff V. Merkey wrote:
> Date: Fri, 10 Nov 2000 14:52:01 -0700
> From: Jeff V. Merkey <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: sendmail fails to deliver mail with attachments in
> /var/spool/mqueue
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
>
David Lang wrote:
>
> how many CPUs in these high loadave boxes? unless you have a very
> impressive machine (8+SMP) the defaults should be plenty high.
>
> also I thought the QueueLA default was 8 and the RefuseLA was 12 or have
> they been bumped up since I last examined them (8.8/8.9 timeframes)
I think this may be related to VM activity. I looked at /proc/meminfo
and the sendmail sickness seems directly related to heavy VM activity in
the box. This may be one for Rik/Linus. I'm just trying to get
Ute-NWFS out the door and want stuff to work.
:-)
Jeff
On Fri, Nov 10, 2000, David Lang wrote:
> how many CPUs in these high loadave boxes? unless you have a very
> impressive machine (8+SMP) the defaults should be plenty high.
>
> also I thought the QueueLA default was 8 and the RefuseLA was 12 or have
> they been bumped up since I last examined them (8.8/8.9 timeframes)
Those are the defaults. Jeff quoted the values from the .cf file
I edited on his machine to get the e-mails through.
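
As an aside, the way the two thresholds interact with the queue listing
quoted earlier can be sketched in a few lines of Python. This is only a
rough model, not sendmail's actual code: it assumes that a message is
held back once the load average reaches QueueLA and its priority value
exceeds QueueFactor / (LA - QueueLA + 1), that QueueFactor is 600000,
and that connections are refused at or above RefuseLA. The function
names are made up for illustration.

```python
# Rough model of sendmail's load-average checks (assumed, simplified).
QUEUE_FACTOR = 600000  # assumption: value of the QueueFactor option


def should_queue(priority, load_avg, queue_la, queue_factor=QUEUE_FACTOR):
    """Return True if a message would only be queued, not delivered."""
    if load_avg < queue_la:
        return False  # load is below QueueLA: deliver normally
    # At or above QueueLA, the cutoff shrinks as the load rises, so
    # messages with large priority values are held back first.
    return priority > queue_factor // (load_avg - queue_la + 1)


def refuses_connections(load_avg, refuse_la):
    """Return True if new SMTP connections would be refused."""
    return load_avg >= refuse_la


# Load average 10 with the default QueueLA of 8: cutoff is 200000.
print(should_queue(200564, 10, 8))   # True, smallest priority in the listing
print(should_queue(200564, 10, 20))  # False, after raising QueueLA to 20
```

Under these assumptions every entry in the listing (priorities 200564
and up) sits above the cutoff at a load average of 10, while messages
with smaller priority values would still go out, which would look like
an arbitrary delivery order from the outside.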
> > We got to the bottom of the sendmail problem. The line:
> >
> > -O QueueLA=20
> >
> > and
> >
> > -O RefuseLA=18
Why does Linux report a LA of 10 if there are only two processes
running?
Followup to: <[email protected]>
By author: Claus Assmann <[email protected]>
In newsgroup: linux.dev.kernel
>
> Why does Linux report a LA of 10 if there are only two processes
> running?
>
Load Average = runnable processes (R) + processes in disk wait (D).
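
A minimal Python sketch of that sum, for illustration only. The state
letters follow ps/top conventions; the breakdown of the 69 processes
into R, D and S below is hypothetical, chosen to match the load of 10
mentioned in the thread. The kernel reports a damped average of this
count over time, not the instantaneous value computed here.

```python
def load_contributors(states):
    """Count processes that add to the load: runnable (R) or disk wait (D)."""
    return sum(1 for state in states if state in ("R", "D"))


# Hypothetical snapshot: 2 running, 8 in disk wait, 59 sleeping.
states = ["R"] * 2 + ["D"] * 8 + ["S"] * 59
print(load_contributors(states))  # 10
```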
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: Claus Assmann <[email protected]>
> In newsgroup: linux.dev.kernel
> >
> > Why does Linux report a LA of 10 if there are only two processes
> > running?
> >
>
> Load Average = runnable processes (R) + processes in disk wait (D).
Keep in mind that on some operating systems, processes sometimes
become STUCK in "short disk wait". Even if you disregard those
processes (they won't do any useful work until you reboot the
system), each of them keeps the load average one point higher than
what should be expected.
This is almost always a bug somewhere.
So, if you're not actually loading the machine with 12 processes doing
disk IO, and still seeing a load of 12, chances are that there are
processes stuck in the (D) state. That's a bug. Report the bug.
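
One way to look for such processes on Linux is to read the state field
from /proc/<pid>/stat. The following Python sketch assumes the usual
"pid (command) state ..." layout of that file; the function names are
made up for illustration, and a process that shows up in D once is not
necessarily stuck, so the check would need repeating over time.

```python
import os


def parse_stat(line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split at the first "(" and last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    command = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, command, state


def processes_in_state(wanted="D", proc_root="/proc"):
    """List (pid, command) for every process currently in the given state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, command, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not parseable
        if state == wanted:
            found.append((pid, command))
    return found
```

Calling processes_in_state("D") a few times, some seconds apart, and
comparing the results gives a rough idea of which processes never leave
the D state.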
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of *
****** prejudices acquired by age eighteen. -- Albert Einstein ********
Claus Assmann:
> Why does Linux report a LA of 10 if there are only two processes
> running?
[This is getting off-topic]
I learned that load average means
"processes on the run queue" +
"processes waiting for disk (or short-term) I/O".
That was before Linux existed.
I have seen a workstation show a load average of 100.
That happened when the NFS server (or the network) died. These
workstations were diskless, so all processes ended up
waiting for "disk" I/O.
These were Sun diskless workstation models.
So it is not new that the load average includes something other than
processes waiting for the CPU.
/ Kari Hurtta
(That was on Computer Science department of University of Helsinki.)