2004-11-27 13:15:01

by Matti Aarnio

[permalink] [raw]
Subject: VGER had a spell of email dysfunction

For a while VGER did answer to all MAIL FROM and RCPT TO lines of
the SMTP protocol "450 I am sick.." (actual text was something else)
and thus refused to take any email from outside and asked for latter
retry. (All proper MTAs have a queue, and they use it.)


It is amazing how many noticed the slow-down of the message flood,
and asked from VGER's postmaster (or webmaster) about the thing.
Unfortunately at this time the problem prevented the question from
being delivered to us... Linus himself knew other addresses, and
got our immediate attention.


Technically: VGER's SMTP input server's one helper program had ran
out of sockets, and it didn't do correct resource rollback in all
failure situations -> it didn't heal itself.


This had started to some extent on Wednesday morning (UTC) and it went
all bad on Thursday noonish (UTC) and finally we became aware of the
thing on Friday around 18:00 UTC, analyzed the problem, and restarted
relevant subsystem. (Restart does cure some problems, this was in that
group of things.)


Accumulated queues from all over were delivered into VGER in about
10 hours, and in about the same time they were also processed
(delivery bounces and list messages, both.)



/Matti Aarnio -- one of VGER's postmasters