2007-05-07 14:56:00

by Matti Aarnio

Subject: gmail is a bit too popular..

In a domain popularity analysis of the linux-kernel list
subscribers I got the following results:

2101 gmail.com
49 googlemail.com
46 gmx.de
41 redhat.com
33 yahoo.com
23 suse.de
22 gmx.net
21 comcast.net


Gmail is so popular that, with their somewhat rudimentary
inbound MTA software, this mass of recipients takes a horribly
long time to feed in... A mere 0.5-0.7 seconds per recipient, but..

So far we have tried to feed all recipients in one go per
message - that is, sending 2100 RCPT TO lines in one swoop -
and the system has taken some 15-25 minutes per message to
feed it to gmail. We are running the delivery in 20 parallel
streams, so it isn't quite as bad as it sounds..
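
As a rough check of those figures, a minimal sketch of the arithmetic, assuming the per-recipient delay is constant and that nothing else contributes to the feed time (the function name is only illustrative):

```python
def feed_time_minutes(recipients: int, seconds_per_recipient: float) -> float:
    """Time to hand one message over when each RCPT TO costs a fixed delay."""
    return recipients * seconds_per_recipient / 60.0


RECIPIENTS = 2100  # gmail.com subscribers, rounded as in the message

# The two ends of the observed 0.5-0.7 s per-recipient delay.
low = feed_time_minutes(RECIPIENTS, 0.5)
high = feed_time_minutes(RECIPIENTS, 0.7)
print(f"{low:.1f} to {high:.1f} minutes per message")
```

That range sits inside the observed 15-25 minutes per message.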

There is one thing that gmail could enable to speed up message
delivery (a lot!) from VGER and other list delivery sources.
That single magic thing is "PIPELINING" support at gmail's
inbound MX servers. With a suitably well-behaved SMTP server
it is really trivial to implement; all the really difficult
magic is in the sending side's SMTP client code.

Once upon a time I implemented that for a trans-Atlantic
SMTP fanout feed -- message delivery time was slashed from
hundreds of RTT delays to a mere few..
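
To illustrate the "hundreds of RTT delays to a mere few" point, here is a rough sketch that only counts network round trips for one message. It assumes the pipelined client sends MAIL FROM, all RCPT TO lines and DATA as a single batch, and it ignores server processing time and TCP window limits. The 160 ms round-trip time is an assumed value for illustration, and the function name is hypothetical:

```python
def smtp_round_trips(recipients: int, pipelining: bool) -> int:
    """Round trips for one message after the EHLO exchange has completed."""
    if pipelining:
        # MAIL FROM, every RCPT TO and DATA go out as one batch (one round
        # trip), then the message body and final dot take a second one.
        return 2
    # Otherwise each command waits for its reply before the next is sent:
    # MAIL FROM + one per RCPT TO + DATA + body/final dot.
    return 1 + recipients + 1 + 1


RTT_SECONDS = 0.16   # assumed round-trip time, for illustration only
RECIPIENTS = 2100

for mode in (False, True):
    trips = smtp_round_trips(RECIPIENTS, mode)
    print(f"pipelining={mode}: {trips} round trips, "
          f"about {trips * RTT_SECONDS:.1f} s of pure network waiting")
```

The saving shown is only the network-latency part; any per-recipient processing time on the receiving side remains.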

/Matti Aarnio


2007-05-07 15:04:12

by John Anthony Kazos Jr.

Subject: Re: gmail is a bit too popular..

> In the linux-kernel -list subscribers domain popularity
> analysis I got following results:
>
> 2101 gmail.com
> 49 googlemail.com
> 46 gmx.de
> 41 redhat.com
> 33 yahoo.com
> 23 suse.de
> 22 gmx.net
> 21 comcast.net
>
>
> The gmail is so popular, that with their somewhat rudimentary
> inbound MTA software this kind of recipient masses take horrible
> time to feed in... Mere 0.5-0.7 seconds per recipient, but..
>
> So far we have tried to feed all recipients in one go per
> message - that is sending 2100 RCPT TO -lines in one swoop,
> and the system has taken some 15-25 minutes per message to
> feed it to gmail. We are running the delivery 20 streams in
> parallel, so it isn't quite as bad as it sounds..
>
> I do have one thing that gmail could enable to speed up the message
> delivery (a lot!) from VGER and other list delivery sources.
> That single magic needed thing is called "PIPELINING" support
> at gmail's inbound MX servers. With suitably well behaving
> smtpserver it is really trivial to implement, all real difficult
> magic is at the sending side smtp client codes.
>
> Once upon a time I implemented that thing for a trans-atlantic
> SMTP fanout feed -- message delivery time became slashed from
> hundreds of RTT delays to mere few..

How about some elitism here? Dedicate a certain number of streams to
everything-except-gmail, so MTAs from the 21st century can get their mail
faster, and set the rest on gmail-only. Slows down gmail and speeds up the
rest?

2007-05-07 15:35:17

by Matti Aarnio

Subject: Re: gmail is a bit too popular..

On Mon, May 07, 2007 at 11:03:54AM -0400, John Anthony Kazos Jr. wrote:
> > In the linux-kernel -list subscribers domain popularity
> > analysis I got following results:
> >
> > 2101 gmail.com
> > 49 googlemail.com
> > 46 gmx.de
....
> How about some elitism here? Dedicate a certain number of streams to
> everything-except-gmail, so MTAs from the 21st century can get their mail
> faster, and set the rest on gmail-only. Slows down gmail and speeds up the
> rest?

No need. gmail has done that slowing down all by themselves quite handily..

VGER has a dedicated number of streams for gmail, and bigger pools for elsewhere.
The gmail parallelism pool is configured so that the daily message volume
usually makes it through in a day. I would prefer it going much faster...

If you are interested in seeing VGER's queues and monitor gauges, they are viewable
with the tools on this web page:

http://vger.kernel.org/z/

Most of what it shows is not easy to understand, and much of it needs deep internal
system knowledge to be understood at all - but the queue display is mostly
self-explanatory.

/Matti Aarnio

PS: Contact address for VGER's postmasters is: [email protected]

2007-05-07 16:29:31

by Satyam Sharma

Subject: Re: gmail is a bit too popular..

On 5/7/07, John Anthony Kazos Jr. <[email protected]> wrote:
> > In the linux-kernel -list subscribers domain popularity
> > analysis I got following results:
> >
> > 2101 gmail.com
> > 49 googlemail.com
> > 46 gmx.de
> > 41 redhat.com
> > 33 yahoo.com
> > 23 suse.de
> > 22 gmx.net
> > 21 comcast.net
> >
> >
> > The gmail is so popular, that with their somewhat rudimentary
> > inbound MTA software this kind of recipient masses take horrible
> > time to feed in... Mere 0.5-0.7 seconds per recipient, but..
> [...]
> How about some elitism here? Dedicate a certain number of streams to
> everything-except-gmail, so MTAs from the 21st century can get their mail
> faster, and set the rest on gmail-only. Slows down gmail and speeds up the
> rest?

Aargh ... what did us poor folk do to deserve this? I understand ~2000
of those Gmail users are only spectators, but most others don't have a
choice. In my case, my university doesn't allow mailboxes to go beyond
a certain limit and only a couple of weeks' worth of lkml would eat
that up. And I suspect other free web mail servers that offer 3 GB of
space would be equally slow. But I do hope someone from Google
listened to this -- half a second per recipient ... *sucks*.

2007-05-07 20:18:26

by Willy Tarreau

Subject: Re: gmail is a bit too popular..

On Mon, May 07, 2007 at 05:55:55PM +0300, Matti Aarnio wrote:
> In the linux-kernel -list subscribers domain popularity
> analysis I got following results:
>
> 2101 gmail.com
> 49 googlemail.com
> 46 gmx.de
> 41 redhat.com
> 33 yahoo.com
> 23 suse.de
> 22 gmx.net
> 21 comcast.net
>
>
> The gmail is so popular, that with their somewhat rudimentary
> inbound MTA software this kind of recipient masses take horrible
> time to feed in... Mere 0.5-0.7 seconds per recipient, but..
>
> So far we have tried to feed all recipients in one go per
> message - that is sending 2100 RCPT TO -lines in one swoop,
> and the system has taken some 15-25 minutes per message to
> feed it to gmail. We are running the delivery 20 streams in
> parallel, so it isn't quite as bad as it sounds..
>
> I do have one thing that gmail could enable to speed up the message
> delivery (a lot!) from VGER and other list delivery sources.
> That single magic needed thing is called "PIPELINING" support
> at gmail's inbound MX servers. With suitably well behaving
> smtpserver it is really trivial to implement, all real difficult
> magic is at the sending side smtp client codes.

I suspect they behave like that on purpose to fight spam. If they
implemented pipelining, it wouldn't help them. Most probably they
can add whitelists to allow some known sources to reach them with
no slowdown. I hope someone at gmail will read and forward the
information so that gmail users can still be served when the load
increases. 12k mails in March and April are somewhat higher than
the usual 9-10k.

Willy

2007-05-07 20:21:51

by Martin Bligh

Subject: Re: gmail is a bit too popular..

Satyam Sharma wrote:
> On 5/7/07, John Anthony Kazos Jr. <[email protected]> wrote:
>> > In the linux-kernel -list subscribers domain popularity
>> > analysis I got following results:
>> >
>> > 2101 gmail.com
>> > 49 googlemail.com
>> > 46 gmx.de
>> > 41 redhat.com
>> > 33 yahoo.com
>> > 23 suse.de
>> > 22 gmx.net
>> > 21 comcast.net
>> >
>> >
>> > The gmail is so popular, that with their somewhat rudimentary
>> > inbound MTA software this kind of recipient masses take horrible
>> > time to feed in... Mere 0.5-0.7 seconds per recipient, but..
>> [...]
>> How about some elitism here? Dedicate a certain number of streams to
>> everything-except-gmail, so MTAs from the 21st century can get their mail
>> faster, and set the rest on gmail-only. Slows down gmail and speeds up
>> the rest?
>
> Aargh ... what did us poor folk do to deserve this? I understand ~2000
> of those Gmail users are only spectators, but most others don't have a
> choice. In my case, my university doesn't allow mailboxes to go beyond
> a certain limit and only a couple of weeks worth of lkml would eat
> that up. And I suspect other free web mail servers that offer 3 GB of
> space too would be equally slower. But I do hope someone from Google
> listened to this -- half a second per recipient ... *sucks*.

I filed a bug on it, copying Matti's email. Whether it's easily fixable
or not, I have no idea.

M.

2007-05-07 20:44:53

by David Miller

Subject: Re: gmail is a bit too popular..

From: Matti Aarnio <[email protected]>
Date: Mon, 7 May 2007 18:35:11 +0300

> On Mon, May 07, 2007 at 11:03:54AM -0400, John Anthony Kazos Jr. wrote:
> > > In the linux-kernel -list subscribers domain popularity
> > > analysis I got following results:
> > >
> > > 2101 gmail.com
> > > 49 googlemail.com
> > > 46 gmx.de
> ....
> > How about some elitism here? Dedicate a certain number of streams to
> > everything-except-gmail, so MTAs from the 21st century can get their mail
> > faster, and set the rest on gmail-only. Slows down gmail and speeds up the
> > rest?
>
> No need. gmail has done that slowing down all by themselves quite handily..

Another thing that plays into this is that a lot of folks
think it is funny to open up a gmail account and then
subscribe it to every vger.kernel.org mailing list in order
to fill it up and slow vger down.

I've been cracking down on such malicious subscriptions lately.

2007-05-07 21:19:18

by Matti Aarnio

Subject: Re: gmail is a bit too popular..

On Mon, May 07, 2007 at 10:18:21PM +0200, Willy Tarreau wrote:
> On Mon, May 07, 2007 at 05:55:55PM +0300, Matti Aarnio wrote:
...
> > The gmail is so popular, that with their somewhat rudimentary
> > inbound MTA software this kind of recipient masses take horrible
> > time to feed in... Mere 0.5-0.7 seconds per recipient, but..
> >
> > So far we have tried to feed all recipients in one go per
> > message - that is sending 2100 RCPT TO -lines in one swoop,
> > and the system has taken some 15-25 minutes per message to
> > feed it to gmail. We are running the delivery 20 streams in
> > parallel, so it isn't quite as bad as it sounds..
> >
>
> I suspect they behave like that on purpose to fight spam.

No. VGER is on the extreme outer edge of things; only very few
legitimate systems need to send this many recipients for
each and every message. A small list with perhaps 10 subscribers
at gmail notices nothing. Not even with 100 subscribers, but soon
after that the sending list's sysadmin may notice something...

Comparing with spammers - message content analysis (a difficult
thing anyway) is _easier_ per recipient when we are sending 100
recipients for each DATA-dot body. It is 1/100th of the cost per
recipient compared to "send one RCPT for each body". We could send
all 2100 recipients if systems could negotiate such a raised limit
(there is no standardized way, just one private ad-hoc one) and do
the recipient acceptance analysis fast enough that PIPELINING mode
would really bring benefits.
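
A small sketch of that cost argument, assuming a limit of 100 recipients per DATA-dot body and one content-analysis pass per body on the receiving side. The addresses are made up and the helper name is hypothetical:

```python
from typing import Iterator, List


def batches(recipients: List[str], limit: int = 100) -> Iterator[List[str]]:
    """Split a recipient list into groups of at most `limit` addresses."""
    for start in range(0, len(recipients), limit):
        yield recipients[start:start + limit]


# Made-up addresses standing in for the 2101 gmail.com subscribers.
recipients = [f"user{i}@example.org" for i in range(2101)]
groups = list(batches(recipients))

# One message body (and so one content analysis) per group, instead of
# one per recipient as with "send one RCPT for each body".
print(f"{len(groups)} bodies to analyse instead of {len(recipients)}")
```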

> If they implemented pipelining, it wouldn't help them.

It would not hurt them either. It would just help the very few of us
who are at this extreme outer edge of things..

Lightspeed delay (ping) is about 160 ms, so there is still some
300-600 ms that the gmail system is munching somewhere per recipient..
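
The subtraction behind that estimate, as a tiny sketch. It assumes each RCPT TO costs exactly one round trip plus server processing, and uses the 0.5-0.7 s per-recipient figure from the start of the thread:

```python
RTT_MS = 160                    # measured ping time quoted above
PER_RECIPIENT_MS = (500, 700)   # observed delay per RCPT TO, low and high

# Whatever is left after the network round trip is time spent at the far end.
server_side_ms = tuple(total - RTT_MS for total in PER_RECIPIENT_MS)
print(f"{server_side_ms[0]}-{server_side_ms[1]} ms per recipient at the remote end")
```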

Anyway, VGER is now tuned so that the delivery delay usually stays
under about 1 hour per message to gmail.

....
> Willy

/Matti Aarnio

2007-05-08 06:43:21

by Adrian Bunk

Subject: Re: gmail is a bit too popular..

On Mon, May 07, 2007 at 06:35:11PM +0300, Matti Aarnio wrote:
> On Mon, May 07, 2007 at 11:03:54AM -0400, John Anthony Kazos Jr. wrote:
> > > In the linux-kernel -list subscribers domain popularity
> > > analysis I got following results:
> > >
> > > 2101 gmail.com
> > > 49 googlemail.com
> > > 46 gmx.de
> ....
> > How about some elitism here? Dedicate a certain number of streams to
> > everything-except-gmail, so MTAs from the 21st century can get their mail
> > faster, and set the rest on gmail-only. Slows down gmail and speeds up the
> > rest?
>
> No need. gmail has done that slowing down all by themselves quite handily..
>
> VGER has dedicated number of streams to gmail, and bigger pools to elsewere.
> The gmail parallelism-pool is configured so that daily message volume does
> usually make it thru in a day. I would prefer it going much faster...
>
> If you are interested to see VGER's queues and monitor gauges, they are viewable
> with tools at web-page:
>
> http://vger.kernel.org/z/
>
> Most of what it tells is not easily understandable, and much needs deep internal
> system knowledge to be understood at all - but mostly the queue display is
> self-explanatory.
>
> /Matti Aarnio
>
> PS: Contact address for VGER's postmasters is: [email protected]

BTW (not related to gmail):
Is there any news regarding the buggy 451 handling in zmailer that I
keep reporting, which regularly results in every single
linux-kernel message sent to me being delayed by up to 13 hours?

TIA
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-04 10:56:01

by David Woodhouse

Subject: Re: gmail is a bit too popular..

On Tue, 2007-05-08 at 08:43 +0200, Adrian Bunk wrote:
> BTW (not related to gmail):
> Are there any news regarding the buggy 451 handling in zmailer I'm
> reporting again and again that regularly results in every single
> linux-kernel message sent to me being delayed by up to 13 hours?

Er, you're giving it a 451 response to _every_ message? Why?

--
dwmw2

2007-06-04 12:10:42

by Matti Aarnio

Subject: Re: gmail is a bit too popular..

On Mon, Jun 04, 2007 at 11:55:32AM +0100, David Woodhouse wrote:
> On Tue, 2007-05-08 at 08:43 +0200, Adrian Bunk wrote:
> > BTW (not related to gmail):
> > Are there any news regarding the buggy 451 handling in zmailer I'm
> > reporting again and again that regularly results in every single
> > linux-kernel message sent to me being delayed by up to 13 hours?
>
> Er, you're giving it a 451 response to _every_ message? Why?

His secondary MX server:

220 mailrelay1.lrz-muenchen.de (IntraStore TurboSendmail) ESMTP Service ready

at times yields "451" responses. (It suffers from occasional resource starvation,
and the only way out of it is to temporarily give those replies.)


Indeed there was a queue timer bug at VGER: when the remote replied with
TEMPFAILs sufficiently many times, the whole destination queue for it
got pushed back far too much.

> --
> dwmw2

/Matti Aarnio

2007-06-04 12:27:21

by Adrian Bunk

Subject: Re: gmail is a bit too popular..

On Mon, Jun 04, 2007 at 11:55:32AM +0100, David Woodhouse wrote:
> On Tue, 2007-05-08 at 08:43 +0200, Adrian Bunk wrote:
> > BTW (not related to gmail):
> > Are there any news regarding the buggy 451 handling in zmailer I'm
> > reporting again and again that regularly results in every single
> > linux-kernel message sent to me being delayed by up to 13 hours?
>
> Er, you're giving it a 451 response to _every_ message? Why?

Not me, an MTA 4 hops away on the way to me.

And not on _every_ message; it is more like "randomly one out of
100 messages gets a 451 4.3.2 system not accepting network messages".

That's perfectly legal behaviour considering that RFC 3463 gives
"immanent shutdown, excessive load, or system maintenance" (which are
for the sending MTA equal to "random") as example causes for this
response.

The problem is really that Zmailer on vger under some circumstances
does not retry sending these emails for days; on one occasion the emails
bounced after 5 days without any retry.
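
For comparison, a minimal sketch of the kind of retry schedule one would expect after a temporary (4xx) failure: a growing retry interval with an upper cap, so that no message waits for days between attempts before it finally expires. This is not Zmailer's actual algorithm; the function name and all interval values are assumptions chosen for illustration, with only the 5-day expiry taken from the case above:

```python
from datetime import timedelta
from typing import List


def retry_offsets(first: timedelta, factor: float,
                  cap: timedelta, expiry: timedelta) -> List[timedelta]:
    """Times (since the first failure) at which delivery is retried."""
    offsets = []
    elapsed = timedelta(0)
    interval = first
    while elapsed + interval < expiry:
        elapsed += interval
        offsets.append(elapsed)
        # Back off, but never wait longer than `cap` between two attempts.
        interval = min(interval * factor, cap)
    return offsets


schedule = retry_offsets(first=timedelta(minutes=5), factor=2.0,
                         cap=timedelta(hours=4), expiry=timedelta(days=5))
print(f"{len(schedule)} retries, last one {schedule[-1]} after the first failure")
```

With a cap like this, a single 451 costs minutes rather than hours, and a message can only bounce after many failed attempts.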

> dwmw2

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed