Some wild blatherings about sendmail...
- Uses lots of memory to send a big file.
Incorrect. I just verified it with a 10 meg file which became a 14 meg attachment.
Sendmail consumed an additional 5 megs combined while handling the input and output vs.
an idle daemon. Idle is 1.8M, recv was 4.0M, send was 2.3M, no measure on the remote
side. I sent it via pine to a remote address.
- Requires high load average allowance
Incorrect. Same machine barely spiked a tenth of a point for this load and dropped
back to .05. Only time I adjusted the configured load average allowance was back in my
naive days and we got hit with 80,000 in the queue at one time from multiple spammers.
Part of this test's load came from numerous things running and the mail sending required
spinup of the drive which blocked.
- Can't send large files
Incorrect. I've used sendmail for the last seven years and sometimes sent emails
with attachments totaling nearly 100 megs. I very frequently handle mails through this
queue that are larger than 1M.
- Qmail's time has come vs. sendmail
I strongly disagree. I'm not the richest person or company, so I frequently run most
things on one box. Sendmail handles perfectly fine. If your setup is hitting the sky
with load average or failing to send mails, you have a site setup problem.
- New sendmails have problems talking to old sendmails.
Not since I discovered the problem was ECN and not sendmail.
I noticed in the original email that the system was stuck in D state on the
/sbin/modprobe for 11 nwfs attempts. Did nobody notice this? (load average goes up due
to IO bound procs; D state) I have a lot of options enabled w/ all of my sendmail
setups, and most of them include patches to use SQL for the tables which requires even
more daemons to be active.
In short, sendmail does just fine on linux. If it's not doing just fine, there's static
in the headset. There isn't any TCP/IP issue or we would have heard a whole lot more
screeching.
End.
-d
--
"The difference between 'involvement' and 'commitment' is like an
eggs-and-ham breakfast: the chicken was 'involved' - the pig was
'committed'."
David Ford wrote:
David,
We got to the bottom of it. sendmail is using a BSD method to react to
load average which is different than what linux is providing. You have
to crank up
O QueueLA = 18
O RefuseLA = 12
on a busy Linux server since the defaults will result in large emails
never getting sent.
Jeff
To be honest Jeff, most of my sendmail systems have default load values
and large (read created by microsoft mua) emails make it through
constantly with no distinguishable delays. I just launched 45 "cat
core|mail [email protected]" and core is a 10 meg binary file. It
results in a 14 meg total message size.
The load spiked to .75 and dropped back to .45 while launching. I started
them two minutes ago and they are all in client DATA phase with the remote
MTA at the moment. I only have 30K/s upstream.
At present the load is .10 and the net is hopping. This isn't a power box
and the rest of the system is running as well.
My guess is that the system reporting the problem has an elevated load
average from those 11 modprobes stuck in D state.
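
Linux counts tasks in uninterruptible sleep (D state) toward the load average alongside runnable ones, so a handful of stuck tasks can hold the number up with no CPU use at all. A rough sketch of how one might count them by scanning /proc follows. The helper names stat_state() and count_tasks_in_state() are made up for illustration, and the scan only sees one entry per process, not per thread.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter ('R', 'S', 'D', ...) from one line of
 * /proc/<pid>/stat, or 0 if the line is malformed. The command name is
 * wrapped in parentheses and may itself contain spaces or ')', so the
 * state is taken from just after the LAST ')' on the line. */
char stat_state(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ')
        p++;
    return isalpha((unsigned char)*p) ? *p : 0;
}

/* Count processes currently in the given state by scanning /proc.
 * Returns -1 if /proc cannot be opened. */
int count_tasks_in_state(char state)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[288], line[512];
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;           /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;           /* process exited during the scan */
        if (fgets(line, sizeof line, f) != NULL && stat_state(line) == state)
            count++;
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Comparing count_tasks_in_state('D') with the first field of /proc/loadavg gives a quick idea of how much of the reported load is blocked I/O rather than CPU demand.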
I manage servers that transport hundreds of thousands of emails daily and
their load is minimal. They handle large messages fine. The only
defaults I've really had to change are the max children and some of the
timing simply because I want stalled connections (read routing loss) to
requeue quickly.
-d
"Jeff V. Merkey" wrote:
> David Ford wrote:
>
> David,
>
> We got to the bottom of it. sendmail is using a BSD method to react to
> load average which is different than what linux is providing. You have
> to crank up
>
> O QueueLA = 18
> O RefuseLA = 12
>
> on a busy Linux server since the defaults will result in large emails
> never getting sent.
>
> Jeff
Followup to: <[email protected]>
By author: David Ford <[email protected]>
In newsgroup: linux.dev.kernel
>
> - Requires high load average allowance
> Incorrect. Same machine barely spiked a tenth of a point for this load and dropped
> back to .05. Only time I adjusted the configured load average allowance was back in my
> naive days and we got hit with 80,000 in the queue at one time from multiple spammers.
> Part of this test's load came from numerous things running and the mail sending required
> spinup of the drive which blocked.
>
Well, I think it does, but not because it itself is generating much of
a load. I had it block traffic on my desktop machine while doing a
kernel compile; I run with high parallelism and the load occasionally
spikes into the high 20s. However, the machine is perfectly
responsive, and so I was a little taken aback by this.
The way Linux computes the load average really does call for higher
limits than what BSD does. This isn't inherently a "good" or "bad"
thing -- it's just a fact of life. That being said, it probably would
be useful if the Sendmail people would provide higher default limits
in cf/ostype/linux.m4 than for other systems.
The one thing about load average that is making it a bit hard to deal
with is that workloads on modern machines tend to vary a little too
quickly for the standard load average time constants to deal well with
them. It's probably fine for throttling down a machine that is
getting killed with requests, but not really enough to keep, say,
parallel make without a limit ("make -j" as opposed to "make -j5")
from forking the machine to the point where the make itself fails
before knowing what just hit it.
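
To put a number on the time-constant point, here is a small sketch of the damped average. The constants (FSHIFT, FIXED_1, EXP_1, a 5-second sample interval) are written as I recall them from the kernel's include/linux/sched.h, so treat them as an assumption to check against your own tree. load_after() is a made-up helper, not a kernel function.

```c
/* Fixed-point load average arithmetic, 11 bits of fraction. */
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* about FIXED_1 * exp(-5s/60s) */

/* One sampling step: decay the old value and blend in the current
 * number of active tasks (already scaled by FIXED_1). */
unsigned long calc_load(unsigned long load, unsigned long decay,
                        unsigned long active)
{
    load *= decay;
    load += active * (FIXED_1 - decay);
    return load >> FSHIFT;
}

/* Hypothetical helper: 1-minute load average (fixed point) after
 * 'seconds' of 'ntasks' active tasks, sampled every 5 seconds. */
unsigned long load_after(unsigned long load, unsigned int ntasks,
                         unsigned int seconds)
{
    unsigned long active = ntasks * FIXED_1;

    for (unsigned int t = 0; t < seconds; t += 5)
        load = calc_load(load, EXP_1, active);
    return load;
}
```

Under these assumptions, load_after(0, 20, 10) / (double)FIXED_1 comes out at roughly 3: ten seconds after 20 tasks became runnable, the 1-minute figure has barely started to move, which is why it makes a poor brake on a runaway "make -j".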
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
> They're not modprobes, they're misnamed processes sleeping from NWFS.
If they're sleeping, why are they in D state? That ups the load average.
> I got the fix from someone so now they display their proper names.
> top displays the names correctly, ps does not. Several people have
> verified this problem, and all you are saying is that your servers
> are never heavily loaded for long periods of time, say 200 hours
> at a stretch of constant ftp traffic?
If I had a normally expected constant load average that came very close to the
sendmail configured limit, I would increase the limit. That's just common
sense for an admin. Sendmail in itself doesn't affect the load average any
more than any daemon does.
If your normal operating load is significant, and your configured limits are
close to that, you have to expect sendmail to throttle back. It doesn't pick
large emails as its victims; everything gets throttled.
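
For illustration only, the kind of check being described can be sketched like this. It is not sendmail's code: the names throttle_decision() and read_load_1min() are invented, the two limits merely stand in for QueueLA and RefuseLA, and the sketch ignores any per-message factors a real MTA may weigh.

```c
#include <stdio.h>

enum mail_action { MAIL_DELIVER, MAIL_QUEUE_ONLY, MAIL_REFUSE };

/* Compare the current load against two configured limits.
 * At or above refuse_la: turn new connections away.
 * At or above queue_la:  accept mail but only queue it.
 * Otherwise:             deliver as usual. */
enum mail_action throttle_decision(double load, double queue_la,
                                   double refuse_la)
{
    if (load >= refuse_la)
        return MAIL_REFUSE;
    if (load >= queue_la)
        return MAIL_QUEUE_ONLY;
    return MAIL_DELIVER;
}

/* Read the 1-minute load average from /proc/loadavg (Linux only).
 * Returns -1.0 if it cannot be read. */
double read_load_1min(void)
{
    FILE *f = fopen("/proc/loadavg", "r");
    double value = -1.0;

    if (f != NULL) {
        if (fscanf(f, "%lf", &value) != 1)
            value = -1.0;
        fclose(f);
    }
    return value;
}
```

A box that normally idles at a load of 7 with a queue limit of 8 will flip to MAIL_QUEUE_ONLY on the first disk stall, which is the "hovering near the limit" case: the fix is to raise the limits to suit the machine.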
I would suspect that if you are near the limit then your disk is blocking
causing a load spike which is detected by sendmail so sendmail throttles back.
In my experience, Linux (and others) can be very sluggish at a load of 2 and at
another time be quite responsive w/ a load of 200. An admin should configure
limits based on that machine's load history, not any given default number.
I've run sendmail for a lot of years at a lot of places. I've never seen this
'large emails aren't sent' issue that people have verified. The only reason I
find valid is if the machine hovers near the limit and disk io causes the
spike. That isn't sendmail's fault, it's a configuration fault.
-d
"Jeff V. Merkey" wrote:
>
> They're not modprobes, they're misnamed processes sleeping from NWFS.
> I got the fix from someone so now they display their proper names.
> top displays the names correctly, ps does not. Several people have
> verified this problem, and all you are saying is that your servers
> are never heavily loaded for long periods of time, say 200 hours
> at a stretch of constant ftp traffic?
Kernel threads? Do this:
strcpy(current->comm, "threadname"); /* 16 char array!! */
current->mm->arg_start = current->mm->arg_end = 0; /* black magic */
and `ps' should be happy.
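
The "16 char array" warning matters because a plain strcpy() of a longer name runs off the end of comm. A userspace sketch of a bounded copy, assuming a 16-byte buffer as in the comment above (set_comm() and COMM_LEN are made-up names, and kernel code would use its own string helpers rather than stdio):

```c
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16   /* assumed size of the comm[] array */

/* Copy a thread name into a fixed-size comm buffer. snprintf() writes
 * at most COMM_LEN - 1 characters plus the terminating NUL, so an
 * over-long name is truncated instead of overflowing the array. */
void set_comm(char comm[COMM_LEN], const char *name)
{
    snprintf(comm, COMM_LEN, "%s", name);
}
```

With this, a name such as "nwfs-worker-thread-0" is stored as its first 15 characters and the buffer stays NUL-terminated.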
Andrew Morton wrote:
>
> "Jeff V. Merkey" wrote:
> >
> > They're not modprobes, they're misnamed processes sleeping from NWFS.
> > I got the fix from someone so now they display their proper names.
> > top displays the names correctly, ps does not. Several people have
> > verified this problem, and all you are saying is that your servers
> > are never heavily loaded for long periods of time, say 200 hours
> > at a stretch of constant ftp traffic?
>
> Kernel threads? Do this:
>
> strcpy(current->comm, "threadname"); /* 16 char array!! */
> current->mm->arg_start = current->mm->arg_end = 0; /* black magic */
>
> and `ps' should be happy.
Even better, use sched.c:daemonize().