2006-03-06 22:34:33

by Michael S. Tsirkin

[permalink] [raw]
Subject: TSO and IPoIB performance degradation

Hello, Dave!
As you might know, the TSO patches merged into mainline kernel
since 2.6.11 have hurt performance for the simple (non-TSO)
high-speed netdevice that is IPoIB driver.

This was discussed at length here
http://openib.org/pipermail/openib-general/2005-October/012271.html

I'm trying to figure out what can be done to improve the situation.
In partucular, I'm looking at the Super TSO patch
http://oss.sgi.com/archives/netdev/2005-05/msg00889.html

merged into mainline here

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=314324121f9b94b2ca657a494cf2b9cb0e4a28cc

There, you said:

When we do ucopy receive (ie. copying directly to userspace
during tcp input processing) we attempt to delay the ACK
until cleanup_rbuf() is invoked. Most of the time this
technique works very well, and we emit one ACK advertising
the largest window.

But this explodes if the ucopy prequeue is large enough.
When the receiver is cpu limited and TSO frames are large,
the receiver is inundated with ucopy processing, such that
the ACK comes out very late. Often, this is so late that
by the time the sender gets the ACK the window has emptied
too much to be kept full by the sender.

The existing TSO code mostly avoided this by keeping the
TSO packets no larger than 1/8 of the available window.
But with the new code we can get much larger TSO frames.

So I'm trying to get a handle on it: could a solution be to simply
look at the frame size, and call tcp_send_delayed_ack from
if the frame size is no larger than 1/8?

Does this make sense?

Thanks,


--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


2006-03-06 22:40:33

by David Miller

[permalink] [raw]
Subject: Re: TSO and IPoIB performance degradation

From: "Michael S. Tsirkin" <[email protected]>
Date: Tue, 7 Mar 2006 00:34:38 +0200

> So I'm trying to get a handle on it: could a solution be to simply
> look at the frame size, and call tcp_send_delayed_ack from
> if the frame size is no larger than 1/8?
>
> Does this make sense?

The comment you mention is very old, and no longer applies.

Get full packet traces from the kernel TSO code in the 2.6.x
kernel, analyze is, and post here what you think is occuring
that is causing the performance problems.

One thing to note is that the newer TSO code really needs to
have large socket buffers, so you can experiment with that.

2006-03-06 22:51:05

by Stephen Hemminger

[permalink] [raw]
Subject: Re: TSO and IPoIB performance degradation

On Tue, 7 Mar 2006 00:34:38 +0200
"Michael S. Tsirkin" <[email protected]> wrote:

> Hello, Dave!
> As you might know, the TSO patches merged into mainline kernel
> since 2.6.11 have hurt performance for the simple (non-TSO)
> high-speed netdevice that is IPoIB driver.
>
> This was discussed at length here
> http://openib.org/pipermail/openib-general/2005-October/012271.html
>
> I'm trying to figure out what can be done to improve the situation.
> In partucular, I'm looking at the Super TSO patch
> http://oss.sgi.com/archives/netdev/2005-05/msg00889.html
>
> merged into mainline here
>
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=314324121f9b94b2ca657a494cf2b9cb0e4a28cc
>
> There, you said:
>
> When we do ucopy receive (ie. copying directly to userspace
> during tcp input processing) we attempt to delay the ACK
> until cleanup_rbuf() is invoked. Most of the time this
> technique works very well, and we emit one ACK advertising
> the largest window.
>
> But this explodes if the ucopy prequeue is large enough.
> When the receiver is cpu limited and TSO frames are large,
> the receiver is inundated with ucopy processing, such that
> the ACK comes out very late. Often, this is so late that
> by the time the sender gets the ACK the window has emptied
> too much to be kept full by the sender.
>
> The existing TSO code mostly avoided this by keeping the
> TSO packets no larger than 1/8 of the available window.
> But with the new code we can get much larger TSO frames.
>
> So I'm trying to get a handle on it: could a solution be to simply
> look at the frame size, and call tcp_send_delayed_ack from
> if the frame size is no larger than 1/8?
>
> Does this make sense?
>
> Thanks,
>
>


More likely you are getting hit by the fact that TSO prevents the congestion
window from increasing properly. This was fixed in 2.6.15 (around mid of Nov 2005).