2010-11-25 14:18:23

by nisse

Subject: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

[ This is a slightly updated repost of an October 21 mail to the
linux-net list. Any hints or advice appreciated. /Niels ]

I have been observing large Ethernet packets when generating TCP traffic
over a local Ethernet network, up to a bit over 20000 bytes, even though the
interface MTU is 1500 bytes.

Furthermore, I tried to use setsockopt with TCP_MAXSEG to limit the TCP
segment size further, to 1000 bytes, and that didn't have any effect.
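For reference, a minimal sketch of the setsockopt() call described above, assuming Linux. The helper names make_socket_with_mss() and get_mss() are hypothetical. The option is set before connect() because, per the tcp(7) man page, setting it before connection establishment also changes the MSS announced to the peer in the SYN.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket whose MSS is clamped to `mss`.
 * Must be called before connect() for the value to be announced in the SYN.
 * Returns the socket fd, or -1 on error. */
int make_socket_with_mss(int mss)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof mss) < 0) {
        perror("setsockopt(TCP_MAXSEG)");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read the option back. Before the connection is
 * established, what is reported depends on the kernel version; recent
 * Linux kernels return the value set with TCP_MAXSEG. -1 on error. */
int get_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof mss;
    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}
```

As the rest of the thread explains, this value is enforced by the TCP layer, not by the driver.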

When reporting a related problem to the Debian kernel maintainers, I
was told that the behaviour may be linked to the use of TCP segmentation
offload (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=600286).

Disabling TSO and GSO using ethtool solves both problems: generated
packets are now limited in size by both the interface MTU and the
segment size set with setsockopt. (The exception is the atl1c driver, where
ethtool -K eth0 tso off only results in "Cannot set device tcp
segmentation offload settings: Operation not supported".)

Before I try to write proper bug reports on specific network drivers (I
have seen problems with several network drivers on different machines,
unfortunately using different Linux versions), I would like to know:

1. Is TCP_MAXSEG supposed to work at all with network drivers that do
TCP segmentation offload?

2. If it is supposed to work, can someone give a rough sketch of how the
per-socket segment size, set with setsockopt(... TCP_MAXSEG,...), is
passed down to the driver and to the network hardware? I suspect it
ought to be passed with each "pseudo-packet" to be transmitted.

I have spent some time searching the documentation and the net for
answers, without result, hence I'm posting to this list. I'm not
subscribed, so please cc any replies.

(Regarding packets larger than the interface MTU, that seems clearly
buggy to me, and I think I already know enough to be able to file proper
bug reports. And in the atl1c driver, it appears to have been fixed
between 1.0.0.1-NAPI and 1.0.1.0-NAPI).

Best regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


2010-11-25 14:27:41

by Eric Dumazet

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

On Thursday, 25 November 2010 at 14:44 +0100, Niels Möller wrote:
> [ This is a slightly updated repost of an October 21 mail to the
> linux-net list. Any hints or advice appreciated. /Niels ]
>

CC netdev

GSO is a software technique. Same for GRO.

Physical frames on the wire do respect the 1500-byte MTU (on regular Ethernet links).

On the transmit path, tcpdump gives you the high-level view, before
segmentation is done at lower levels (by the NIC itself or in the Linux stack).
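To make the transmit-side picture concrete: with TSO/GSO active, tcpdump on the sender sees one large packet, while the wire carries ceil(payload / MSS) frames, each holding at most one MSS of TCP payload. A minimal sketch of that arithmetic; the function names are hypothetical and header bytes are ignored.

```c
#include <stddef.h>

/* Number of wire segments produced when one large TCP payload
 * (as seen by tcpdump on the sender) is cut into MSS-sized pieces. */
size_t wire_segments(size_t payload_len, size_t mss)
{
    if (mss == 0)
        return 0;                          /* invalid MSS: report nothing */
    return (payload_len + mss - 1) / mss;  /* ceiling division */
}

/* Payload carried by the last (possibly shorter) segment. */
size_t last_segment_len(size_t payload_len, size_t mss)
{
    if (mss == 0 || payload_len == 0)
        return 0;
    size_t rem = payload_len % mss;
    return rem ? rem : mss;
}
```

For example, a 20000-byte payload with an MSS of 1460 leaves the sender as 14 frames, the last one carrying 1020 bytes of payload.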

We also have GRO on the receive path, which can coalesce several 1500-byte
frames of the same TCP flow into a single one, so that per-packet overhead
in the stack is lowered (netfilter, IP stack, TCP stack, bridge, routing ...).
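The receive-side coalescing described above can be pictured with a toy model. This is not the kernel's GRO code: struct seg and toy_gro() are hypothetical, and real GRO also compares TCP flags, options and headers before merging anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a received TCP segment. */
struct seg {
    uint32_t flow_id;  /* stands in for the address/port 4-tuple */
    uint32_t seq;      /* first sequence number carried */
    uint32_t len;      /* payload length in bytes */
};

/* Coalesce consecutive, in-order segments of the same flow in place,
 * as long as the merged payload stays within max_len.
 * Returns the new number of segments in s[]. */
size_t toy_gro(struct seg *s, size_t n, uint32_t max_len)
{
    if (n == 0)
        return 0;
    size_t out = 0;  /* s[out] is the segment currently being grown */
    for (size_t i = 1; i < n; i++) {
        struct seg *cur = &s[out];
        bool mergeable = s[i].flow_id == cur->flow_id &&
                         s[i].seq == cur->seq + cur->len &&
                         (uint64_t)cur->len + s[i].len <= max_len;
        if (mergeable)
            cur->len += s[i].len;  /* extend the coalesced segment */
        else
            s[++out] = s[i];       /* start a new output segment */
    }
    return out + 1;
}
```

Fed three back-to-back 1448-byte segments of one flow, toy_gro() yields a single 4344-byte segment, which is the kind of oversized "packet" a receive-side tcpdump can show.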

So... there is no 'bug', unless you trust tcpdump output too much.


2010-11-25 15:09:51

by nisse

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

Eric Dumazet writes:

> So... there is no 'bug', unless you trust tcpdump output too much.

I really expected tcpdump -e to display the actual values in the link
layer header, including the correct frame size. It's more than a bit
confusing if that is not the case...

In the future, I will try to remember to always run tcpdump on a network
node which (i) is different from the sending one, and (ii) has GRO
disabled (and hence will discard packets if it has trouble processing
them all, rather than coalescing them).

What about the TCP_MAXSEG socket option, should that work? From a quick
look at driver source code, I could only see the handling of the
per-interface MTU, but no per-socket segment size.

Thanks for the quick reply,
/Niels

2010-11-25 15:18:38

by Eric Dumazet

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

On Thursday, 25 November 2010 at 16:09 +0100, Niels Möller wrote:
> Eric Dumazet writes:
>
> > So... there is no 'bug', unless you trust tcpdump output too much.
>
> I really expected tcpdump -e to display the actual values in the link
> layer header, including the correct frame size. It's more than a bit
> confusing if that is not the case...
>
> In the future, I will try to remember to always run tcpdump on a network
> node which (i) is different from the sending one, and (ii) has GRO
> disabled (and hence will discard packets if it has trouble processing
> them all, rather than coalescing them).
>

Just disable GSO and TSO on the sending machine; tcpdump will then show you
individual frames.


> What about the TCP_MAXSEG socket option, should that work? From a quick
> look at driver source code, I could only see the handling of the
> per-interface MTU, but no per-socket segment size.

TCP_MAXSEG is certainly not handled in the driver layer, but in the TCP layer:

	/* If user gave his TCP_MAXSEG, record it to clamp */
	if (tp->rx_opt.user_mss)
		tp->rx_opt.mss_clamp = tp->rx_opt.user_mss;

I believe TCP_MAXSEG is working fine, but GRO/GSO don't care at all:
they coalesce frames whatever their size.
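A rough model of what the quoted kernel lines imply: the user's TCP_MAXSEG value lowers the clamp applied to the peer's advertised MSS, and the connection never sends segments larger than the MTU-derived MSS. This is a simplified sketch, not kernel code; effective_mss() is hypothetical and ignores TCP options such as timestamps, which reduce the payload per segment further.

```c
#include <stdint.h>

/* Simplified model (not kernel code) of the MSS a connection sends with.
 *   mtu_mss:  MSS derived from the path/device MTU (e.g. MTU - 40, IPv4)
 *   peer_mss: MSS advertised by the peer in its SYN or SYN-ACK
 *   user_mss: value set with TCP_MAXSEG, or 0 if the option was not used */
uint32_t effective_mss(uint32_t mtu_mss, uint32_t peer_mss, uint32_t user_mss)
{
    uint32_t clamp = peer_mss;                  /* upper bound from the peer */
    if (user_mss && user_mss < clamp)
        clamp = user_mss;                       /* TCP_MAXSEG lowers the clamp */
    return mtu_mss < clamp ? mtu_mss : clamp;   /* never exceed the MTU-derived MSS */
}
```

With a 1500-byte MTU (MSS 1460) and TCP_MAXSEG set to 1000, the model gives 1000, which matches the segment size Niels observed once TSO and GSO were disabled.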


2010-11-25 15:57:03

by Ben Gamari

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

> I have spent some time searching the documentation and the net for
> answers, without result, hence I'm posting to this list. I'm not
> subscribed, so please cc any replies.
>
You may find that responses are more forthcoming if you include the
maintainers of the TCP subsystem and of a few of the relevant
drivers. This will ensure that they see your message. LKML is a
pretty high-traffic list, and messages frequently simply
aren't seen. Also, I would definitely start by filing bug reports
against individual drivers; as you said, producing packets larger than
the maximum segment size definitely seems like buggy behavior. A bug
report will ensure that the problem doesn't fall through the cracks.
Good luck,

- Ben

2010-11-25 16:25:45

by nisse

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

Eric Dumazet writes:

> I believe TCP_MAXSEG is working fine, but GRO/GSO don't care at all:
> they coalesce frames whatever their size.

I was under the impression that TSO (and maybe GSO) implied more
cleverness in the network card: that the network card more or less gets
to decide by itself how to divide a TCP stream into segments. For
example, in the atl1c driver, which I looked into a bit, this was what the
REG_MTU register was for. It seems I have gotten this totally wrong.

Maybe Documentation/networking/netdevices.txt could clarify how it
works. Currently, it says:

: Segmentation Offload (GSO, TSO) is an exception to this rule. The
: upper layer protocol may pass a large socket buffer to the device
: transmit routine, and the device will break that up into separate
: packets based on the current MTU.

Regards, and thanks for your patience,
/Niels

2010-11-25 16:44:52

by Eric Dumazet

Subject: Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

On Thursday, 25 November 2010 at 17:25 +0100, Niels Möller wrote:
>
> I was under the impression that TSO (and maybe GSO) implied more
> cleverness in the network card: that the network card more or less gets
> to decide by itself how to divide a TCP stream into segments. For
> example, in the atl1c driver, which I looked into a bit, this was what the
> REG_MTU register was for. It seems I have gotten this totally wrong.
>

You were not totally wrong, but the device does not use its own MTU to
perform the split: we give it the MSS of the flow.

You can have multiple flows in parallel, each with its own MSS, while
the device has a single MTU.
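The point that the MSS travels with each packet, rather than being a device setting, can be sketched as follows. struct big_pkt and split_by_gso_size() are hypothetical; the only real name here is the kernel field skb_shinfo(skb)->gso_size, which is where Linux records the segment size of a GSO/TSO packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a large TSO/GSO packet. In Linux the
 * per-packet segment size is carried in skb_shinfo(skb)->gso_size. */
struct big_pkt {
    uint32_t payload_len;  /* TCP payload in this "super packet" */
    uint16_t gso_size;     /* MSS of the flow that produced it */
};

/* Split a super packet into segment lengths using the packet's own
 * gso_size, not the device MTU. Writes at most max_out lengths to out[]
 * and returns the segment count (0 on invalid input or if out[] is full). */
size_t split_by_gso_size(const struct big_pkt *p, uint32_t *out, size_t max_out)
{
    if (p->gso_size == 0)
        return 0;
    size_t n = 0;
    for (uint64_t off = 0; off < p->payload_len; off += p->gso_size) {
        if (n == max_out)
            return 0;
        uint32_t left = (uint32_t)(p->payload_len - off);
        out[n++] = left < p->gso_size ? left : p->gso_size;
    }
    return n;
}
```

Two flows sharing one 1500-byte-MTU device, one with an MSS of 1448 and one clamped to 1000 with TCP_MAXSEG, are thus split differently, because each packet carries its own gso_size.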

> Maybe Documentation/networking/netdevices.txt could clarify how it
> works. Currently, it says:
>
> : Segmentation Offload (GSO, TSO) is an exception to this rule. The
> : upper layer protocol may pass a large socket buffer to the device
> : transmit routine, and the device will break that up into separate
> : packets based on the current MTU.


MTU means maximum transmission unit. But each layer has its own :)

In this context, the protocol is TCP, so the MSS is what should be taken into account.

By default, the MSS is derived from the device MTU (IPv4 without options:
MSS = MTU - 40), but the user can change it with TCP_MAXSEG.
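The MTU - 40 rule above is just the IPv4 header (20 bytes) plus the TCP header (20 bytes), both without options; for IPv6 the network header is 40 bytes. A minimal sketch, with mss_from_mtu() as a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Default MSS derived from an MTU, assuming no IP options and no TCP
 * options: a 20-byte IPv4 (or 40-byte IPv6) header plus a 20-byte TCP
 * header. Returns 0 if the MTU is too small to carry any payload. */
uint32_t mss_from_mtu(uint32_t mtu, bool ipv6)
{
    uint32_t overhead = (ipv6 ? 40u : 20u) + 20u;
    return mtu > overhead ? mtu - overhead : 0;
}
```

For a 1500-byte Ethernet MTU this gives 1460 for IPv4 and 1440 for IPv6. With TCP timestamps enabled, each segment also carries 12 bytes of TCP options, so an MSS of 1460 typically yields 1448 bytes of payload per segment.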