2001-10-14 00:23:05

by Mika Liljeberg

Subject: TCP acking too fast

Hi all,

It seems that recent (and maybe not so recent) linux kernels have a TCP
problem that causes them to acknowledge almost every segment. While,
strictly speaking, this is not against the spec, any sane TCP only acks
every second segment in steady state. The statistics appended below
illustrate the problem.

Why do I care? Because I'm connected through a cable network with a
severe bandwidth asymmetry; the upstream is rate limited to 256 kbps,
while the downstream can theoretically yield 10 Mbps (assuming a quiet
period). Right now, the excessive ack rate seems to be a limiting factor
on peak performance.
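
To put rough numbers on that (a back-of-envelope sketch only; the ~60
bytes on the wire per pure ack and the clean 1448-byte data segments are
assumed figures, and competing upstream traffic is ignored):

/* Rough estimate of how the upstream ack rate can cap downstream
 * throughput on an asymmetric link. All constants are assumptions
 * for illustration, not measurements taken from the statistics below.
 */
#include <stdio.h>

int main(void)
{
        const double upstream_bps   = 256000.0; /* rate-limited upstream */
        const double ack_wire_bytes = 60.0;     /* assumed pure-ack size on the wire */
        const double segment_bytes  = 1448.0;   /* typical data segment */

        double acks_per_sec = upstream_bps / (ack_wire_bytes * 8.0);

        printf("sustainable acks/sec upstream:          %.0f\n", acks_per_sec);
        printf("downstream ceiling, ack every segment:  %.1f Mbit/s\n",
               acks_per_sec * segment_bytes * 8.0 / 1e6);
        printf("downstream ceiling, ack every 2nd seg:  %.1f Mbit/s\n",
               acks_per_sec * 2.0 * segment_bytes * 8.0 / 1e6);
        return 0;
}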

I've already disabled quickacks, replaced the receive MSS estimate with
advertised MSS in the ack sending policy (two places), and removed one
dubious "immediate ack" condition from send_delay_ack(). The annoying
thing is that none of this seems to make any real difference. I must be
missing something huge that's right in front of my nose, but I'm
starting to run out of steam.

Any thoughts on this?

Regards,

MikaL

Some stats from a unidirectional 6MB transfer:

c->d:                                    d->c:
total packets:          3643             total packets:          4498
ack pkts sent:          3642             ack pkts sent:          4498
pure acks sent:         3640             pure acks sent:            2
unique bytes sent:       108             unique bytes sent:   6161570
actual data pkts:          1             actual data pkts:       4494
actual data bytes:       108             actual data bytes:   6161570
rexmt data pkts:           0             rexmt data pkts:           0
rexmt data bytes:          0             rexmt data bytes:          0
outoforder pkts:           0             outoforder pkts:          10
pushed data pkts:          1             pushed data pkts:       3043
SYN/FIN pkts sent:       1/1             SYN/FIN pkts sent:       1/1
req 1323 ws/ts:          Y/Y             req 1323 ws/ts:          Y/Y
adv wind scale:            0             adv wind scale:            0
req sack:                  Y             req sack:                  Y
sacks sent:               28             sacks sent:                0
mss requested:    1460 bytes             mss requested:    1460 bytes
max segm size:     108 bytes             max segm size:    1448 bytes
min segm size:     108 bytes             min segm size:      92 bytes
avg segm size:     107 bytes             avg segm size:    1371 bytes
max win adv:     63712 bytes             max win adv:     32120 bytes
min win adv:      5840 bytes             min win adv:     32120 bytes
zero win adv:        0 times             zero win adv:        0 times
avg win adv:     63515 bytes             avg win adv:     32120 bytes
initial window:    108 bytes             initial window:    489 bytes
initial window:      1 pkts              initial window:      1 pkts
ttl stream length: 108 bytes             ttl stream length: 6161570 bytes
missed data:         0 bytes             missed data:         0 bytes
truncated data:     46 bytes             truncated data:  5882942 bytes
truncated packets:   1 pkts              truncated packets:  4494 pkts
data xmit time:  0.000 secs              data xmit time:  10.663 secs
idletime max:       98.1 ms              idletime max:       98.0 ms
throughput:           10 Bps             throughput:       572079 Bps


2001-10-14 06:40:09

by David Miller

Subject: Re: TCP acking too fast


You need to post for us a tcpdump trace of a connection you feel
exhibits bad behavior.

Otherwise we can do nothing but guess, effectively your statistics
aren't helpful at all if we have no idea what is happening on the
wire.

Franks a lot,
David S. Miller
[email protected]

2001-10-14 07:05:26

by Mika Liljeberg

Subject: Re: TCP acking too fast

"David S. Miller" wrote:
>
> You need to post for us a tcpdump trace of a connection you feel
> exhibits bad behavior.
>
> Otherwise we can do nothing but guess, effectively your statistics
> aren't helpful at all if we have no idea what is happening on the
> wire.

Fair enough, chalk it down to lack of sleep addling my brain. I also
forgot to mention my kernel version, which is 2.4.10-ac10.

I've attached a fragment of tcpdump output from the middle of steady
state transfer. Looking at the dump, it seems that most arriving
segments have the PSH bit set. This leads me to believe that the
transfer is mostly application limited at the sender side.

For some reason, this causes the receiver to ack every segment
immediately (which is not suggested by the spec as far as I know). I'm
guessing that this is some kind of optimization for HTTP (i.e., avoid
Nagle on the last runt segment by acking pushed segments immediately).
However, this seems to produce less than desirable behaviour on sender
limited bulk transfers.

However, despite appearances, I can't seem to find the bit of code that
tests the PSH flag for immediate ack. Still sleepy, I guess.

Regards,

MikaL


Attachments:
tcpdump.txt.gz (7.73 kB)

2001-10-14 07:47:42

by David Miller

Subject: Re: TCP acking too fast

From: Mika Liljeberg <[email protected]>
Date: Sun, 14 Oct 2001 10:05:33 +0300

   I've attached a fragment of tcpdump output from the middle of steady
   state transfer. Looking at the dump, it seems that most arriving
   segments have the PSH bit set. This leads me to believe that the
   transfer is mostly application limited at the sender side.

This means the application is doing many small writes. To be honest,
the only sure way to cure any performance problems from that is to
fix the application in question. What is this application?

Franks a lot,
David S. Miller
[email protected]

2001-10-14 07:50:32

by David Miller

Subject: Re: TCP acking too fast

From: Mika Liljeberg <[email protected]>
Date: Sun, 14 Oct 2001 10:05:33 +0300

   Looking at the dump, it seems that most arriving
   segments have the PSH bit set.

I know you said what is running on the receiver, but do
you have any clue what is running on the sender? It looks
_really_ broken.

The transfer looks like a bulk one but every segment (as you have
stated) has PSH set, which is completely stupid.

At least, I can guarantee you that the sender is not Linux. Or,
if it is Linux, it is running a really broken implementation of
a web server. :-)

Franks a lot,
David S. Miller
[email protected]

2001-10-14 07:51:53

by Mika Liljeberg

Subject: Re: TCP acking too fast

"David S. Miller" wrote:
>
> From: Mika Liljeberg <[email protected]>
> Date: Sun, 14 Oct 2001 10:05:33 +0300
>
> I've attached a fragment of tcpdump output from the middle of steady
> state transfer. Looking at the dump, it seems that most arriving
> segments have the PSH bit set. This leads me to believe that the
> transfer is mostly application limited at the sender side.
>
> This means the application is doing many small writes.

Nope, it simply means that the remote machine has a 100 Mbit Ethernet
card that keeps emptying the transmit queue faster than it can be
filled.

> To be honest,
> to only sure way to cure any performance problems from that is to
> fix the application in question. What is this application?

I don't control the remote machine, but it's linux (don't know which
version). I tried with both HTTP (Apache 1.3.9) and FTP. I doubt it's
the application. :-)

Regards,

MikaL

2001-10-14 07:53:15

by Mika Liljeberg

Subject: Re: TCP acking too fast

"David S. Miller" wrote:
>
> From: Mika Liljeberg <[email protected]>
> Date: Sun, 14 Oct 2001 10:05:33 +0300
>
> Looking at the dump, it seems that most arriving
> segments have the PSH bit set.
>
> I know you said what is running on the receiver, but do
> you have any clue what is running on the sender? It looks
> _really_ broken.
>
> The transfer looks like a bulk one but every segment (as you have
> stated) has PSH set, which is completely stupid.
>
> At least, I can guarentee you that the sender is not Linux. Or,
> if it is Linux, it is running a really broken implementation of
> a web server. :-)

I've got a feeling you're going to rue saying that (see my other email).
;-)

Regards,

MikaL

2001-10-14 08:12:38

by David Miller

Subject: Re: TCP acking too fast

From: Mika Liljeberg <[email protected]>
Date: Sun, 14 Oct 2001 10:51:56 +0300

   I don't control the remote machine, but it's linux (don't know which
   version). I tried with both HTTP (Apache 1.3.9) and FTP. I doubt it's
   the application. :-)

Well, the version of the kernel is pretty important.
Setting PSH all the time does sound like a possibly familiar bug.

Franks a lot,
David S. Miller
[email protected]

2001-10-14 08:39:10

by Mika Liljeberg

Subject: Re: TCP acking too fast

"David S. Miller" wrote:
> I don't control the remote machine, but it's linux (don't know which
> version). I tried with both HTTP (Apache 1.3.9) and FTP. I doubt it's
> the application. :-)
>
> Well, the version of the kernel is pretty important.

Unfortunately I have no way to ascertain that, but I do know it's
running Debian. I would venture a guess that it's a series 2.2 kernel. I
tried an nmap fingerprint, but it couldn't identify the kernel.

> Setting PSH all the time does sound like a possibly familiar bug.

You have no problem with the receiver immediately acking PSH segments?
Shouldn't we be robust against this kind of behaviour? [Otherwise a
sender can force us into a permanent quickack mode simply by setting PSH
on every segment.]

Regards,

MikaL

2001-10-14 09:03:22

by David Miller

Subject: Re: TCP acking too fast

From: Mika Liljeberg <[email protected]>
Date: Sun, 14 Oct 2001 11:39:22 +0300

   [Otherwise a sender can force us into a permanent quickack mode
   simply by setting PSH on every segment.]

"A sending TCP can send us garbage so bad that it hinders
performance."

So, your point is? :-) A sensible sending application and a sensible
TCP should not be setting PSH on every single segment. And we're not
coding up hacks to make the Linux receiver handle this case better.
You'll have much better luck convincing us to implement ECN black hole
workarounds :-)

Franks a lot,
David S. Miller
[email protected]

2001-10-14 09:15:17

by Mika Liljeberg

Subject: Re: TCP acking too fast

"David S. Miller" wrote:
>
> From: Mika Liljeberg <[email protected]>
> Date: Sun, 14 Oct 2001 11:39:22 +0300
>
> [Otherwise a sender can force us into a permanent quickack mode
> simply by setting PSH on every segment.]
>
> "A sending TCP can send us garbage so bad that it hinders
> performance."
>
> So, your point is? :-) A sensible sending application, and a sensible
> TCP should not being setting PSH every single segment.

Like apache and linux? :-)

> And we're not
> coding up hacks to make the Linux receiver handle this case better.

By the same logic we could throw away Nagle and SWS avoidance! Whatever
happened to "be conservative in what you send" (i.e. acks, in this
case)?

Frankly, I see no reason for acking PSH segments immediately. What's the
rationale for doing so? Looks like a hack to me...

I don't mean to be a pest, but it would be nice to get some technical
grounds for this behaviour, since you're obviously convinced that there
are some. Please?

> You'll have much better luck convincing us to implement ECN black hole
> workarounds :-)

Oh, no. I'm not going to be dragged into that discussion! :) [Do we have
such workarounds for PMTUD detection, I wonder...]

Cheers,

MikaL

2001-10-14 09:16:07

by David Miller

Subject: Re: TCP acking too fast

From: Mika Liljeberg <[email protected]>
Date: Sun, 14 Oct 2001 12:15:24 +0300

   Like apache and linux? :-)

"BROKEN LINUX" I suspect it's just a buggy 2.2.x that machine has.

Franks a lot,
David S. Miller
[email protected]

2001-10-14 09:24:58

by Andi Kleen

Subject: Re: TCP acking too fast

In article <[email protected]>,
"David S. Miller" <[email protected]> writes:

> So, your point is? :-) A sensible sending application, and a sensible
> TCP should not being setting PSH every single segment. And we're not
> coding up hacks to make the Linux receiver handle this case better.
> You'll have much better luck convincing us to implement ECN black hole
> workarounds :-)

Ignoring PSH completely on RX would probably not be a worse heuristic
than forcing an ACK on it. At least other stacks seem to do fine without
the force-ack-on-psh, too. I think you added it a long time ago, but I do
not remember why; but at least here is a counterexample now that may be
a good case for reconsidering it.

-Andi

2001-10-14 09:39:43

by David Miller

Subject: Re: TCP acking too fast

From: Andi Kleen <[email protected]>
Date: 14 Oct 2001 11:25:09 +0200

   but at least here is a counterexample now that may be
   a good case for reconsidering it.

A buggy 2.2.x kernel is not a good counterexample.

Franks a lot,
David S. Miller
[email protected]

2001-10-14 11:29:35

by Andi Kleen

Subject: Re: TCP acking too fast

On Sun, Oct 14, 2001 at 11:39:48AM +0200, David S. Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: 14 Oct 2001 11:25:09 +0200
>
> but at least here is a counterexample now that may be
> a good case for reconsidering it.
>
> A buggy 2.2.x kernel is not a good counterexample.

I just checked and the 2.4 kernel doesn't have the PSH quickack check
anymore, so it cannot be the cause. The original poster didn't say which
kernel version he used, but he said "recent", so I'll assume 2.4.
The only special case for PSH left on the RX side that I can see is in
rcv_mss estimation, where it assumes that a packet with PSH set is not
full sized. On further inspection, the 2.4 tcp_measure_rcv_mss will never
update rcv_mss for packets which do have PSH set, which in this case causes
random ack behaviour depending on the initial rcv_mss guess.
Not very nice; it definitely violates the "be conservative in what you
accept" rule. I'm not sure how to fix it; adding a fallback that acks
every two packets would pollute the fast path a bit.


-Andi

2001-10-14 11:50:10

by Mika Liljeberg

Subject: Re: TCP acking too fast

Andi Kleen wrote:
> The only special case for PSH in RX left I can is in rcv_mss estimation,
> where is assumes that a packet with PSH set is not full sized.

A packet without PSH should be full sized. Assuming the sender implements
SWS avoidance correctly, this should be a safe enough assumption.

> On further
> look the 2.4 tcp_measure_rcv_mss will never update rcv_mss for packets
> which do have PSH set and in this case cause random ack behaviour depending
> on the initial rcv_mss guess.
> Not very nice; definitely violates the "be conservative what you accept"
> rule. I'm not sure how to fix it, adding a fallback to every-two-packet-add
> would pollute the fast path a bit.

You're right. As far as I can see, it's not necessary to set the
TCP_ACK_PUSHED flag at all (except maybe for SYN-ACK). I'm just writing
a patch to clean this up.

> -Andi

Regards,

MikaL

2001-10-14 13:15:04

by Mika Liljeberg

Subject: [PATCH] TCP acking too fast

--- tcp_input.c.org Sat Oct 13 23:24:38 2001
+++ tcp_input.c Sun Oct 14 15:47:10 2001
@@ -126,24 +126,25 @@
          * sends good full-sized frames.
          */
         len = skb->len;
+
         if (len >= tp->ack.rcv_mss) {
                 tp->ack.rcv_mss = len;
-                /* Dubious? Rather, it is final cut. 8) */
-                if (tcp_flag_word(skb->h.th)&TCP_REMNANT)
-                        tp->ack.pending |= TCP_ACK_PUSHED;
         } else {
-                /* Otherwise, we make more careful check taking into account,
-                 * that SACKs block is variable.
+                /* If PSH is not set, packet should be full sized, assuming
+                 * that the peer implements Nagle correctly.
+                 * This observation (if it is correct 8)) allows
+                 * to handle super-low mtu links fairly.
                  *
-                 * "len" is invariant segment length, including TCP header.
+                 * However, If sender sets TCP_NODELAY, this could effectively
+                 * turn receiver side SWS algorithms off. TCP_MIN_MSS guards
+                 * against a ridiculously small rcv_mss estimate.
+                 *
+                 * We also have to be careful checking the header size, since
+                 * the SACK option is variable length. "len" is the invariant
+                 * segment length, including TCP header.
                  */
                 len += skb->data - skb->h.raw;
                 if (len >= TCP_MIN_RCVMSS + sizeof(struct tcphdr) ||
-                    /* If PSH is not set, packet should be
-                     * full sized, provided peer TCP is not badly broken.
-                     * This observation (if it is correct 8)) allows
-                     * to handle super-low mtu links fairly.
-                     */
                     (len >= TCP_MIN_MSS + sizeof(struct tcphdr) &&
                      !(tcp_flag_word(skb->h.th)&TCP_REMNANT))) {
                         /* Subtract also invariant (if peer is RFC compliant),
@@ -152,12 +153,9 @@
                          */
                         len -= tp->tcp_header_len;
                         tp->ack.last_seg_size = len;
-                        if (len == lss) {
+                        if (len == lss)
                                 tp->ack.rcv_mss = len;
-                                return;
-                        }
                 }
-                tp->ack.pending |= TCP_ACK_PUSHED;
         }
 }


Attachments:
over_ack.patch (1.78 kB)

2001-10-14 14:04:30

by Andi Kleen

Subject: Re: TCP acking too fast

On Sun, Oct 14, 2001 at 01:49:25PM +0200, Mika Liljeberg wrote:
> Andi Kleen wrote:
> > The only special case for PSH in RX left I can is in rcv_mss estimation,
> > where is assumes that a packet with PSH set is not full sized.
>
> A packet without PSH should be full size. Assuming the sender implemets
> SWS avoidance correctly, this should be a safe enough assumption.

It's not guaranteed by any spec; just common behaviour from BSD derived
stacks. SWS avoidance does not say anything about PSH flags.


>
> > On further
> > look the 2.4 tcp_measure_rcv_mss will never update rcv_mss for packets
> > which do have PSH set and in this case cause random ack behaviour depending
> > on the initial rcv_mss guess.
> > Not very nice; definitely violates the "be conservative what you accept"
> > rule. I'm not sure how to fix it, adding a fallback to every-two-packet-add
> > would pollute the fast path a bit.
>
> You're right. As far as I can see, it's not necessary to set the
> TCP_ACK_PUSHED flag at all (except maybe for SYN-ACK). I'm just writing
> a patch to clean this up.

Setting it for packets >= rcv_mss looks useful to me to catch mistakes.
Better too many acks than too few.


-Andi

2001-10-14 14:27:02

by Mika Liljeberg

Subject: Re: TCP acking too fast

Andi Kleen wrote:

> It's not guaranteed by any spec; just common behaviour from BSD derived
> stacks. SWS avoidance does not say anything about PSH flags.

True enough. This is a slightly dubious heuristic at best. Besides, if
the sender sets TCP_NODELAY and sends packets that are between
TCP_MIN_MSS and true receive MSS, the estimate is probably totally
hosed.

My solution to this would be to recalculate rcv_mss once per window.
I.e., start new_rcv_mss from 0, keep increasing it for one window width,
and then copy it to rcv_mss. No funny heuristics, and it would adjust to
a shrunken MSS within one transmission window.

> > > On further
> > > look the 2.4 tcp_measure_rcv_mss will never update rcv_mss for packets
> > > which do have PSH set and in this case cause random ack behaviour depending
> > > on the initial rcv_mss guess.
> > > Not very nice; definitely violates the "be conservative what you accept"
> > > rule. I'm not sure how to fix it, adding a fallback to every-two-packet-add
> > > would pollute the fast path a bit.
> >
> > You're right. As far as I can see, it's not necessary to set the
> > TCP_ACK_PUSHED flag at all (except maybe for SYN-ACK). I'm just writing
> > a patch to clean this up.
>
> Setting it for packets >= rcv_mss looks useful to me to catch mistakes.
> Better too many acks than to few.

Maybe so, but in that case I would only set it for packets > rcv_mss.
Otherwise, my ack-every-segment-with-PSH problem would come back.

Actually, I think it would be better to simply always ack every other
segment (except in quickack and fast recovery modes) and only use the
receive window estimation for window updates. This would guarantee
self-clocking in all cases.

> -Andi

Regards,

MikaL

2001-10-14 16:14:35

by Andi Kleen

Subject: Re: TCP acking too fast

On Sun, Oct 14, 2001 at 04:26:53PM +0200, Mika Liljeberg wrote:
> My solution to this would be to recalculate rcv_mss once per window.
> I.e., start new_rcv_mss from 0, keep increasing it for one window width,
> and then copy it to rcv_mss. No funny heuristics, and it would adjust to
> a shrunken MSS within one transmission window.

Sounds complicated. How would you implement it?

>
> > > > On further
> > > > look the 2.4 tcp_measure_rcv_mss will never update rcv_mss for packets
> > > > which do have PSH set and in this case cause random ack behaviour depending
> > > > on the initial rcv_mss guess.
> > > > Not very nice; definitely violates the "be conservative what you accept"
> > > > rule. I'm not sure how to fix it, adding a fallback to every-two-packet-add
> > > > would pollute the fast path a bit.
> > >
> > > You're right. As far as I can see, it's not necessary to set the
> > > TCP_ACK_PUSHED flag at all (except maybe for SYN-ACK). I'm just writing
> > > a patch to clean this up.
> >
> > Setting it for packets >= rcv_mss looks useful to me to catch mistakes.
> > Better too many acks than to few.
>
> Maybe so, but in that case I would only set it for packets > rcv_mss.
> Otherwise, my ack-every-segment-with-PSH problem would come back.

Yes > rcv_mss. Sorry for the typo.
>
> Actually, I think it would be better to simply to always ack every other
> segment (except in quickack and fast recovery modes) and only use the
> receive window estimation for window updates. This would guarantee
> self-clocking in all cases.

The original "ack after 2*mss" rule had been carefully tuned to work well
with slow PPP links in all cases, after some bad experiences. It came
together with the variable-length delayed ack.

The rcv_mss stuff was added later to fix some performance problems
on very big MTU links like HIPPI (where you have an MSS of 64k, but
stacks often send smaller packets like 48k; the ack-after-2*mss check
only triggered every third packet, causing bad performance).

If nobody used slow PPP links anymore, it would probably be ok
to go back to the simpler "ack every other packet" rule; but I'm afraid
that's not the case yet.
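
For reference, a simplified sketch of the byte-counting rule being
described (illustrative only, not the actual kernel code): with a fixed
64k MSS but 48k segments on the wire, the 2*MSS threshold is crossed only
every third segment, while tracking the observed segment size restores an
ack roughly every second segment.

/* Simplified sketch of a byte-counting delayed-ack policy; the names
 * are made up for the example.
 */
struct rx_state {
        unsigned int unacked_bytes; /* data received since the last ack sent */
        unsigned int rcv_mss;       /* estimate of the peer's segment size */
};

/* Returns nonzero if an ack should be sent now rather than delayed. */
static int ack_now(struct rx_state *rx, unsigned int seg_len)
{
        rx->unacked_bytes += seg_len;
        if (rx->unacked_bytes >= 2 * rx->rcv_mss) {
                rx->unacked_bytes = 0;
                return 1;       /* roughly every second segment in steady state */
        }
        return 0;               /* otherwise leave it to the delayed-ack timer */
}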

-Andi

2001-10-14 16:36:34

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> I just checked and the 2.4 kernel doesn't have the PSH quickack check
> anymore,

Right, it was removed because all the PSHed packets are acked as soon
as rcvbuf is completely drained and the window is fully open.

See? That is the reason for the "too frequent" ACKs, and I daresay they
are not too frequent and it is impossible to do anything about this.
These ACKs are an _absolute_ demand, and delaying them by some small time
helps nothing and destroys performance instead.

Well, it is the place commented with "Dubious? ... final cut."
It is enough to delete it to avoid "too frequent" ACKs and to return
to too-rare ACKs instead.

Alexey

2001-10-14 16:55:57

by Mika Liljeberg

Subject: Re: TCP acking too fast

Andi Kleen wrote:
>
> On Sun, Oct 14, 2001 at 04:26:53PM +0200, Mika Liljeberg wrote:
> > My solution to this would be to recalculate rcv_mss once per window.
> > I.e., start new_rcv_mss from 0, keep increasing it for one window width,
> > and then copy it to rcv_mss. No funny heuristics, and it would adjust to
> > a shrunken MSS within one transmission window.
>
> Sounds complicated. How would you implement it?

Not very hard at all. It could be done easily with a couple of extra
state variables. The following is rough pseudocode (it ignores
initialization of the state variables):

if (seg.len > rcv.new_mss)
        rcv.new_mss = seg.len;
if (rcv.nxt >= rcv.mss_seq || rcv.new_mss > rcv.mss) {
        rcv.mss = max(rcv.new_mss, TCP_MIN_MSS);
        rcv.new_mss = 0;
        rcv.mss_seq = rcv.nxt + measurement_window;
}

The basic property is that you can balance the time required to detect a
decreased receive MSS against the reliability of the estimate by tuning
the measurement window. Increased receive MSS would be detected
immediately. Of course, I'm not claiming that there might not be a
better algorithm somewhere that doesn't require the two state
variables.
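
Fleshed out as a self-contained sketch (the names, the TCP_MIN_MSS floor
value and the use of the receive window as the measurement interval are
illustrative choices here, not taken from any kernel source):

#include <stdint.h>

#define TCP_MIN_MSS 88u         /* floor value chosen for the example */

struct rcv_mss_est {
        uint32_t mss;           /* current estimate used by the ack policy */
        uint32_t new_mss;       /* largest segment seen in the current window */
        uint32_t mss_seq;       /* sequence number at which the estimate rolls over */
};

static uint32_t umax(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Called for every in-order data segment. rcv_nxt is the next expected
 * sequence number, seg_len the payload size, window the current receive
 * window (used here as the measurement interval).
 */
static void update_rcv_mss(struct rcv_mss_est *e, uint32_t rcv_nxt,
                           uint32_t seg_len, uint32_t window)
{
        if (seg_len > e->new_mss)
                e->new_mss = seg_len;

        /* Roll over once per window, or immediately if the estimate grew. */
        if ((int32_t)(rcv_nxt - e->mss_seq) >= 0 || e->new_mss > e->mss) {
                e->mss = umax(e->new_mss, TCP_MIN_MSS);
                e->new_mss = 0;
                e->mss_seq = rcv_nxt + window;
        }
}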

> > Actually, I think it would be better to simply to always ack every other
> > segment (except in quickack and fast recovery modes) and only use the
> > receive window estimation for window updates. This would guarantee
> > self-clocking in all cases.
>
> The original "ack after 2*mss" had been carefully tuned to work with well
> slow PPP links in all case; after some bad experiences. It came
> together with the variable length delayed ack.
>
> The rcv_mss stuff was added later to fix some performance problems
> on very big MTU links like HIPPI (where you have a MSS of 64k, but
> often stacks send smaller packets like 48k; the ack after 2*mss check
> only triggered every third packet, causing bad peroformance)
>
> Now if nobody used slow PPP links anymore it would be probably ok
> to go back to the simpler "ack every other packet" rule; but I'm afraid
> that's not the case yet.

Why would PPP links perform badly with ack-every-other? That isn't the
case in my experience, at least.

> -Andi

Regards,

MikaL

2001-10-14 17:08:11

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> Not very hard at all. It could be done easily with a couple of extra
> state variables.

Does the current heuristic not work? :-)


> state variables. The following is a rough pseudo code (ignores
> initialization of state variables):

You missed one crucial point: the stream may consist of remnants
for a long time or even forever. It is a normal case. And rcv_mss is used
not only (and mostly not) for ACKing; it is used in really important places
(SWS avoidance et al.), where the specs propose to use your advertised MSS,
which does not work at all when you talk over high MTU interfaces.
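
As a rough sketch of the kind of check meant here, in the spirit of RFC
1122's receiver-side SWS avoidance (illustrative only, not the kernel's
actual code); the point is that the threshold depends on the segment size
the peer actually sends, which is why rcv_mss matters beyond deciding
when to ack:

static unsigned int umin(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

/* Return nonzero if the window advertised to the peer may be opened:
 * the free space must cover at least one observed segment (rcv_mss) or
 * half of the receive buffer, whichever is smaller.
 */
static int may_raise_window(unsigned int free_space, unsigned int rcv_mss,
                            unsigned int rcv_buf)
{
        return free_space >= umin(rcv_mss, rcv_buf / 2);
}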

The approach (invented by Andi?) provided the necessary robustness,
checking for two segments in a row and suppressing MSS drops below 536.
The check for PSH-less segments allows really low MTUs to be detected reliably.

Alexey

2001-10-14 17:26:21

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
>
> Hello!
>
> > Not very hard at all. It could be done easily with a couple of extra
> > state variables.
>
> Does current heuristics not work? :-)

Well, you should read the preceding messages to understand how we got
here.

Andi had some reservations and I tend to agree. The current heuristic
assumes specific TCP behaviour, which is left as an implementation issue
in specifications. Conclusion: it works if you're lucky.

But it's true I can't show you any data to the contrary, either. This is
not the issue that started this thread.

> > state variables. The following is a rough pseudo code (ignores
> > initialization of state variables):
>
> You missed one crucial moment: stream may consist of remnants
> for long time or even forever. It is normal case. And rcv_mss is used
> not only and mostly not for ACKing, it is used in really important places
> (SWS avoidance et al), where specs propose to use your advertised MSS,
> which does not work at all when you talk over high MTU interfaces.

I don't think I missed that point.

> The approach (invented by Andi?) provided necessary robustness,
> checking for two segments in row and suppressing MSS drops below 536.
> Check for PSHless segments allows to detect really low mtu reliably.

When you say "reliably", you should recognize the underlying assumptions
as well.

> Alexey

Regards,

MikaL

2001-10-14 17:35:21

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> Well, you should read the preceding messages to understand how we got
> here.

I am reading it now, and so far I have not found why the problem of
calculating rcv_mss was raised at all. :-)

You nicely understood the reason for the problem, and
it is surely not related to rcv_mss in any way. :-)


> When you say "reliably", you should recognize the underlying assumptions
> as well.

The assumptions are so conservative that they are not worth mentioning.

The heuristic does not predict a fall of rcv_mss below 536 when the sender
sets PSH on each frame. And it is pretty evident that such a prediction
is theoretically impossible in this sad case. All that we can do is
to cry, to hold rcv_mss at 536, and to ack every 4th segment on links
with an MTU of 256.

Alexey

2001-10-14 17:56:13

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
>
> Hello!
>
> > Well, you should read the preceding messages to understand how we got
> > here.
>
> I am reading now and until now I did not find why problem of calculating
> rcv_mss raised at all. :-)

I think Andi brought it up. I was actually saying that it probably works
most of the time.

> You nicely understood the reason of the problem and
> it is surely not related to rcv_mss in any way. :-)
>
> > When you say "reliably", you should recognize the underlying assumptions
> > as well.
>
> The assumptions are so conservative, that it is not worth to tell about them.

The assumption is that the peer is implemented the way you expect and
that the application doesn't toy with TCP_NODELAY.

> Heuristics does not predict fall of rcv_mss below 536 when sender
> sets PSH on each frame. And it is pretty evident that such prediction
> is impossible theoretically in this sad case. All that we can do is
> to cry and to hold rcv_mss at 536 and to ack each 4th segment on
> with mtu of 256.

Not really. You could do one of two things: either ack every second
segment and leave rcv estimation only for window calculations, or use an
algorithm like the one I outlined. Either approach would work, I think,
and not produce stretch acks.

Regards,

MikaL

2001-10-14 18:20:38

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> The assumption is that the peer is implemented the way you expect and
> that the application doesn't toy with TCP_NODELAY.

Sorry??

It is the most important _exactly_ for TCP_NODELAY, which
generates lots of remnants.


> Not really. You could do one of two things: either ack every second
> segment

I do not worry about this _at_ _all_. See?
"each other", "each two mss" --- all this is red herring.

I do understand your problem, which is not related to rcv_mss.
When the bandwidths in the two directions differ by more than 20 times,
stretch ACKs are even preferred. Look into the tcplw work; using stretch
ACKs is even considered to be something normal.

I really commiserate and think that removing the "final cut" clause
will help you. But sending an ACK on buffer drain, at least for short
packets, is a real demand which cannot be relaxed.
Actually, it is better not to remove "final cut" either, but the case
when it is required is probabilistically marginal.

Alexey

2001-10-14 18:48:53

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
>
> Hello!
>
> > The assumption is that the peer is implemented the way you expect and
> > that the application doesn't toy with TCP_NODELAY.
>
> Sorry??
>
> It is the most important _exactly_ for TCP_NODELAY, which
> generates lots of remnants.

I simply meant that with the application in control of the packet size,
you can't make a reliable estimate of the maximum receive MSS, unless our
assumption holds that only maximum-sized segments are sent without PSH.

> > Not really. You could do one of two things: either ack every second
> > segment
>
> I do not worry about this _at_ _all_. See?
> "each other", "each two mss" --- all this is red herring.

Whatever.

> I do understand your problem, which is not related to rcv_mss.

I know.

> When bandwidth in different directions differ more than 20 times,
> stretch ACKs are even preferred. Look into tcplw work, using stretch ACKs
> is even considered as something normal.

I know. It's a difficult tradeoff between saving bandwidth on the return
path, trying to maintain self clocking, and avoiding bursts caused by
ack compression.

> I really commiserate and think that removing "final cut" clause
> will help you.

Yes.

> But sending ACK on buffer drain at least for short
> packets is real demand, which cannot be relaxed.

Why? This one has me stumped.

> "final cut" is also better not to remove actually, but the case
> when it is required is probabilistically marginal.
>
> Alexey

Regards,

MikaL

2001-10-14 19:12:47

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> > But sending ACK on buffer drain at least for short
> > packets is real demand, which cannot be relaxed.
>
> Why? This one has me stumped.

To remove sick delays with nagling transfers (1), and to remove
deadlocks due to starvation on rcvbuf at the receiver (2) and on sndbuf
at the sender (3).

Actually, (2) is solved nowadays with the compressing queue. (3) can be
solved by acking every other segment. But (1) remains. The solution used
in 2.2, where the delack timeout was reduced to a short value on short
packets with PSH set, worked with a probability of 50% on very slow links,
i.e. in the case where a wrong delay is not important at all, and it did
not cover the cases where the absence of long gaps is really important.

Actually, any alternative idea how to solve this could be very useful.

Alexey

2001-10-14 19:33:13

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
>
> Hello!
>
> > > But sending ACK on buffer drain at least for short
> > > packets is real demand, which cannot be relaxed.
> >
> > Why? This one has me stumped.
>
> To remove sick delays with nagling transfers (1) and to remove
> deadlocks due to starvation on rcvbuf (2) at receiver and on sndbuf
> at sender (3).
>
> Actually, (2) is solved nowadays with compressing queue. (3) can be solved
> acking each other segment. But (1) remains.
>
> Actually, any alternative idea how to solve this could be very useful.

And why (1) is a problem is precisely what I don't understand. Nagle is
*supposed* to prevent you from sending multiple remnants. If you don't
like it, you disable it in the sender! However:

The only awkward Nagle-related delay I know of appears with e.g. HTTP,
when the last undersized segment cannot be sent before everything else
is acked. This can be solved using an idea from Greg Minshall, which I
thought was quite cool.

The normal Nagle rule goes:

- You cannot send a remnant if there are any unacknowledged segments
outstanding

Minshall's version goes:

- You cannot send a remnant if there is already one unacknowledged
remnant outstanding

This fixes the trailing remnant problem with HTTP and similar
request-reply protocols, while adhering to the spirit of Nagle. There
was even an I-D at some point but for some reason it has not been
updated.
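
As a sketch of the difference (structure and field names are made up for
illustration, this is not kernel code; a "remnant" here is a segment
smaller than the MSS):

struct snd_state {
        unsigned int packets_out;  /* unacknowledged segments in flight */
        unsigned int remnants_out; /* unacknowledged sub-MSS segments in flight */
};

/* Classic Nagle: a remnant may not be sent while anything is unacked. */
static int nagle_may_send_remnant(const struct snd_state *s)
{
        return s->packets_out == 0;
}

/* Minshall's variant: a remnant may not be sent while another remnant
 * is unacked; full-sized segments in flight do not block it.
 */
static int minshall_may_send_remnant(const struct snd_state *s)
{
        return s->remnants_out == 0;
}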

> Alexey

Regards,

MikaL

2001-10-14 19:40:21

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> And why (1) is a problem is precisely what I don't understand. Nagle is
> *supposed* to prevent you from sending multiple remnants.

It is not supposed to add a delayed-ack timeout's worth of delay between
sends. Nagle did not know about the brain damage his great idea would
cause when used together with delayed acks. :-)


> is acked. This can be solved using an idea from Greg Minshall, which I
> thought was quite cool.

It is the approach used in 2.4. :-)

It does help when the sender is also linux-2.4. :-)

Alexey

2001-10-14 20:06:07

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
> > And why (1) is a problem is precisely what I don't understand. Nagle is
> > *supposed* to prevent you from sending multiple remnants.
>
> It is not supposed to delay between sends for delack timeout.
> Nagle did not know about brain damages which his great idea
> will cause when used together with delaying acks. :-)

Well, I think this "problem" is way overstated. With a low latency path
the delay ack estimator should already take care of this. With a high
latency path you're out of luck in any case.

Besides, as I said, you can always disable Nagle in an interactive
application. I suppose it would be nice to have a socket option to
disable delayack as well, just for completeness.
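
For completeness, disabling Nagle from an application looks like this
(TCP_NODELAY is standard; the receive-side "disable delayed ack" option
wished for above is hypothetical and not shown):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a connected TCP socket. */
static int disable_nagle(int sock)
{
        int one = 1;

        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}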

> > is acked. This can be solved using an idea from Greg Minshall, which I
> > thought was quite cool.
>
> It is approach used in 2.4. :-)

Cool. :)

> It does help when sender is also linux-2.4. :-)
>
> Alexey

Regards,

MikaL

2001-10-15 18:41:07

by Alexey Kuznetsov

Subject: Re: TCP acking too fast

Hello!

> Well, I think this "problem" is way overstated.

Understated. :-)

Actually, the people who designed all this engine always kept in mind
only two cases: ftp and telnet. Who cared that some funny protocols of
the smtp sort worked a thousand times slower than they could? Nobody,
until the time when mail agents started to push really large amounts of
mail.

> Besides, as I said, you can always disable Nagle

And you will finish with Nagle enabled only on ftp-data. I do not know of
any other standard protocol which is not broken by delack+nagle. :-)

This is sad but it is already the truth: apache, samba, etc., even ssh(!),
each of them disables nagle by default, even though they are able
to cure this problem with less damage.

Well, I have answered the question "tcp is slow!" with "Guy, you forgot
to enable TCP_NODELAY. TCP is not supposed to work well in your case
without this" so many times that I have started to suspect that nagling
should be disabled by default. It would cause less trouble. :-)

Alexey

2001-10-15 19:15:44

by Mika Liljeberg

Subject: Re: TCP acking too fast

[email protected] wrote:
> > Well, I think this "problem" is way overstated.
>
> Understated. :-)
>
> Actually, people who designed all this engine always kept in the mind
> only two cases: ftp and telnet. Who did care that some funny
> protocols sort of smtp work thousand times slower than they could?

Well, if you ask me, it's smtp that is a prime example of braindead
protocol design. It's a wonder we're still using it. If you put that
many request-reply interactions into a protocol that could easily be
done in one, you're simply begging for a bloody nose. Nagle or not, smtp
sucks. :)

Anyway, Minshall's version of Nagle is ok with smtp as long as the smtp
implementation isn't stupid enough to emit two remnants in one go (yeah,
right).

Anyway, it would be interesting to try an (even more) relaxed version of
Nagle that would allow a maximum of two remnants in flight. This would
basically cover all TCP request/reply cases (leading AND trailing
remnant). Coupled with a large initial window to get rid of small-cwnd
interactions, it might almost be all right.

Assuming the above, we wouldn't need your ack-every-pushed-remnant
policy, except for the following pathological bidirectional case:

A and B send two remnants to each other at the same time. Then both
block waiting for an ack, until finally one of them sends a delayed ack. You
could break this deadlock by using the following rule:

- if we're blocked on Nagle (two remnants out) and the received segment
has PSH, send ACK immediately

In other cases you wouldn't need to ack pushed segments. What do you
think? :-)

> Alexey

Regards,

MikaL

2001-10-15 19:38:17

by Mika Liljeberg

Subject: Re: TCP acking too fast

Mika Liljeberg wrote:
> Anyway, it would be interesting to try a (even more) relaxed version of
> Nagle that would allow a maximum of two remnants in flight. This would
> basically cover all TCP request/reply cases (leading AND trailing
> remnant). Coupled with large initial window to get rid of small-cwnd
> interactions, it might be almost be all right.

Oops, bad idea. You can quench the objections, I already figured out it
won't work. :-(

I guess we're stuck with the current status quo: braindead application
protocols will perform badly no matter what we do. All we can really do
is prevent them harming the network.

Regards,

MikaL

2001-10-15 20:59:00

by Bill Davidsen

Subject: Re: TCP acking too fast

In article <[email protected]> [email protected] wrote:

>I've already disabled quickacks, replaced the receive MSS estimate with
>advertised MSS in the ack sending policy (two places), and removed one
>dubious "immediate ack" condition from send_delay_ack(). The annoying
>thing is that none of this seem to make any real difference. I must be
>missing something huge that's right in front of my nose, but I'm
>starting to run out of steam.
>
>Any thoughts on this?

The discussion has been most complete. I guess at this point, if you
can't fix the sender to stop this anti-social behaviour, you might try
using iptables to "mangle" the PSH off from this host, or rate limit the
ACKs, or some other hack. None of these is a "solution," just some
interesting things to try.

As noted, the core problem is that TCP doesn't like really asymmetric
bandwidth.

--
bill davidsen <[email protected]>
"If I were a diplomat, in the best case I'd go hungry. In the worst
case, people would die."
-- Robert Lipe