2008-11-21 12:18:59

by Petr Tesařík

[permalink] [raw]
Subject: [PATCH] Do not use TSO/GSO when there is urgent data

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014

Since most (if not all) implementations of TSO and even the in-kernel
software GSO do not update the urgent pointer when splitting a large
segment, it is necessary to turn off TSO/GSO for all outgoing traffic
with the URG pointer set.

Looking at tcp_current_mss (and the preceding comment) I even think
this was the original intention. However, this approach is insufficient,
because TSO/GSO is turned off only for newly created frames, not for
frames which were already pending at the arrival of a message with
MSG_OOB set. These frames were created when TSO/GSO was enabled,
so they may be large, and they will have the urgent pointer set
in tcp_transmit_skb().

With this patch, such large packets will be fragmented again before
going to the transmit routine.

As a side note, at least the following NICs are known to screw up
the urgent pointer in the TCP header when doing TSO:

Intel 82566MM (PCI ID 8086:1049)
Intel 82566DC (PCI ID 8086:104b)
Intel 82541GI (PCI ID 8086:1076)
Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)

Signed-off-by: Petr Tesarik <[email protected]>
CC: Jan Sembera <[email protected]>
CC: Ilpo Jarvinen <[email protected]>

--
tcp_output.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff
*skb)
static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
unsigned int mss_now)
{
- if (skb->len <= mss_now || !sk_can_gso(sk)) {
+ if (skb->len <= mss_now || !sk_can_gso(sk) ||
+ tcp_urg_mode(tcp_sk(sk))) {
/* Avoid the costly divide in the normal
* non-TSO case.
*/
@@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk, struct
sk_buff *skb,
{
int tso_segs = tcp_skb_pcount(skb);

- if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
+ if (!tso_segs ||
+ (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
+ tcp_urg_mode(tcp_sk(sk))))) {
tcp_set_skb_tso_segs(sk, skb, mss_now);
tso_segs = tcp_skb_pcount(skb);
}


2008-11-21 13:07:49

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH] tcp: fix potential corner case issue in segmentation (Was: Re: [PATCH] Do not use TSO/GSO when there is urgent data)

On Fri, 21 Nov 2008, Petr Tesarik wrote:

> This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
>
> Since most (if not all) implementations of TSO and even the in-kernel
> software GSO do not update the urgent pointer when splitting a large
> segment, it is necessary to turn off TSO/GSO for all outgoing traffic
> with the URG pointer set.

Good observation, I totally missed this possibility of T/GSO while
looking.

> Looking at tcp_current_mss (and the preceding comment) I even think
> this was the original intention. However, this approach is insufficient,
> because TSO/GSO is turned off only for newly created frames, not for
> frames which were already pending at the arrival of a message with
> MSG_OOB set. These frames were created when TSO/GSO was enabled,
> so they may be large, and they will have the urgent pointer set
> in tcp_transmit_skb().
>
> With this patch, such large packets will be fragmented again before
> going to the transmit routine.

I wonder if there's some corner case which still fails to fragment
in tcp_retransmit_xmit's in skb->len <= cur_mss case if cur_mss
grew very recently (and therefore skb-len now fits to a single segment).

Patch below, changed subject is due to patchwork as requested by DaveM
(if you'd find that strange).

> As a side note, at least the following NICs are known to screw up
> the urgent pointer in the TCP header when doing TSO:
>
> Intel 82566MM (PCI ID 8086:1049)
> Intel 82566DC (PCI ID 8086:104b)
> Intel 82541GI (PCI ID 8086:1076)
> Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)

Heh, it might already be longer than the list we would get from the
working ones. Not very likely too many people consider urg too
seriously to get it right.

> Signed-off-by: Petr Tesarik <[email protected]>
> CC: Jan Sembera <[email protected]>
> CC: Ilpo Jarvinen <[email protected]>
>
> --
> tcp_output.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff
> *skb)
> static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
> unsigned int mss_now)
> {
> - if (skb->len <= mss_now || !sk_can_gso(sk)) {
> + if (skb->len <= mss_now || !sk_can_gso(sk) ||
> + tcp_urg_mode(tcp_sk(sk))) {
> /* Avoid the costly divide in the normal
> * non-TSO case.
> */
> @@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk, struct
> sk_buff *skb,
> {
> int tso_segs = tcp_skb_pcount(skb);
>
> - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> + if (!tso_segs ||
> + (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
> + tcp_urg_mode(tcp_sk(sk))))) {
> tcp_set_skb_tso_segs(sk, skb, mss_now);
> tso_segs = tcp_skb_pcount(skb);
> }

It's a bit intrusive but I couldn't immediately come up with alternative
that would have worked (came up with some not working ones :-)).

Acked-by: Ilpo J?rvinen <[email protected]>


--
i.


--

[PATCH] tcp: fix potential corner case issue in segmentation

If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment, it might get send up in
parts unless we calculate # of segments again. Whether it actually
would break something more than that I don't know. But this would
create a more significant problem with the urg t/gso problem that
was found by Petr Tesarik <[email protected]>.

Compile tested only.

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
net/ipv4/tcp_output.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a524627..129f46b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1944,6 +1944,8 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
if (skb->len > cur_mss) {
if (tcp_fragment(sk, skb, cur_mss, cur_mss))
return -ENOMEM; /* We'll try again later. */
+ } else {
+ tcp_init_tso_segs(sk, skb, cur_mss);
}

/* Collapse two adjacent packets if worthwhile and we can. */
--
1.5.2.2

2008-11-21 15:51:10

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH] tcp: fix potential corner case issue in segmentation (Was: Re: [PATCH] Do not use TSO/GSO when there is urgent data)

Dne Friday 21 of November 2008 14:07:32 Ilpo Järvinen napsal(a):
> On Fri, 21 Nov 2008, Petr Tesarik wrote:
> > This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
> >
> > Since most (if not all) implementations of TSO and even the in-kernel
> > software GSO do not update the urgent pointer when splitting a large
> > segment, it is necessary to turn off TSO/GSO for all outgoing traffic
> > with the URG pointer set.
>
> Good observation, I totally missed this possibility of T/GSO while
> looking.
>
> > Looking at tcp_current_mss (and the preceding comment) I even think
> > this was the original intention. However, this approach is insufficient,
> > because TSO/GSO is turned off only for newly created frames, not for
> > frames which were already pending at the arrival of a message with
> > MSG_OOB set. These frames were created when TSO/GSO was enabled,
> > so they may be large, and they will have the urgent pointer set
> > in tcp_transmit_skb().
> >
> > With this patch, such large packets will be fragmented again before
> > going to the transmit routine.
>
> I wonder if there's some corner case which still fails to fragment
> in tcp_retransmit_xmit's in skb->len <= cur_mss case if cur_mss
> grew very recently (and therefore skb-len now fits to a single segment).

This shouldn't be a problem, because TSO only applies to packets which are
larger than MSS, so the problematic case is when cur_mss gets smaller, not
when it grows. In other words, the original implementation of
tcp_retransmit_xmit() could never make use of TSO/GSO, anyway...

>[...]
> > Signed-off-by: Petr Tesarik <[email protected]>
> > CC: Jan Sembera <[email protected]>
> > CC: Ilpo Jarvinen <[email protected]>
> >
> > --
> > tcp_output.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct
> > sk_buff *skb)
> > static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
> > unsigned int mss_now)
> > {
> > - if (skb->len <= mss_now || !sk_can_gso(sk)) {
> > + if (skb->len <= mss_now || !sk_can_gso(sk) ||
> > + tcp_urg_mode(tcp_sk(sk))) {
> > /* Avoid the costly divide in the normal
> > * non-TSO case.
> > */
> > @@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk,
> > struct sk_buff *skb,
> > {
> > int tso_segs = tcp_skb_pcount(skb);
> >
> > - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> > + if (!tso_segs ||
> > + (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
> > + tcp_urg_mode(tcp_sk(sk))))) {
> > tcp_set_skb_tso_segs(sk, skb, mss_now);
> > tso_segs = tcp_skb_pcount(skb);
> > }
>
> It's a bit intrusive but I couldn't immediately come up with alternative
> that would have worked (came up with some not working ones :-)).

Yes, I also noticed that. We could add some more code to tcp_mark_urg(), e.g.
walk sk_write_queue and adjust the pending SKBs there...

Is it OK to simply set all skb->gso_segs to zero, and let the next call to
tcp_init_tso_segs redo them? I mean, will tcp_init_tso_segs() be always
called on all SKBs which are in the write queue at the time tcp_mark_urg() is
called?

Petr

2008-11-21 17:49:23

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH] tcp: fix potential corner case issue in segmentation (Was: Re: [PATCH] Do not use TSO/GSO when there is urgent data)

On Fri, 21 Nov 2008, Petr Tesarik wrote:

> Dne Friday 21 of November 2008 14:07:32 Ilpo J?rvinen napsal(a):
> > On Fri, 21 Nov 2008, Petr Tesarik wrote:
> > > This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
> > >
> > > Since most (if not all) implementations of TSO and even the in-kernel
> > > software GSO do not update the urgent pointer when splitting a large
> > > segment, it is necessary to turn off TSO/GSO for all outgoing traffic
> > > with the URG pointer set.
> >
> > Good observation, I totally missed this possibility of T/GSO while
> > looking.
> >
> > > Looking at tcp_current_mss (and the preceding comment) I even think
> > > this was the original intention. However, this approach is insufficient,
> > > because TSO/GSO is turned off only for newly created frames, not for
> > > frames which were already pending at the arrival of a message with
> > > MSG_OOB set. These frames were created when TSO/GSO was enabled,
> > > so they may be large, and they will have the urgent pointer set
> > > in tcp_transmit_skb().
> > >
> > > With this patch, such large packets will be fragmented again before
> > > going to the transmit routine.
> >
> > I wonder if there's some corner case which still fails to fragment
> > in tcp_retransmit_xmit's in skb->len <= cur_mss case if cur_mss
> > grew very recently (and therefore skb-len now fits to a single segment).
>
> This shouldn't be a problem, because TSO only applies to packets which are
> larger than MSS, so the problematic case is when cur_mss gets smaller, not
> when it grows. In other words, the original implementation of
> tcp_retransmit_xmit() could never make use of TSO/GSO, anyway...

I disagree:

1. mss == x
2. skb->len == 2*x => Send using TSO/GSO
3. mss = 2*x (e.g, mtu probe completes)
4. rexmit skb, nothing resets TSO/GSO fields though now skb->len <= mss,
thus it will get sent as two, smaller than what's necessary, packets using
TSO/GSO and will not get into that fixed place of yours.

Or did I miss something?

> >[...]
> > > Signed-off-by: Petr Tesarik <[email protected]>
> > > CC: Jan Sembera <[email protected]>
> > > CC: Ilpo Jarvinen <[email protected]>
> > >
> > > --
> > > tcp_output.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct
> > > sk_buff *skb)
> > > static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
> > > unsigned int mss_now)
> > > {
> > > - if (skb->len <= mss_now || !sk_can_gso(sk)) {
> > > + if (skb->len <= mss_now || !sk_can_gso(sk) ||
> > > + tcp_urg_mode(tcp_sk(sk))) {
> > > /* Avoid the costly divide in the normal
> > > * non-TSO case.
> > > */
> > > @@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk,
> > > struct sk_buff *skb,
> > > {
> > > int tso_segs = tcp_skb_pcount(skb);
> > >
> > > - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> > > + if (!tso_segs ||
> > > + (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
> > > + tcp_urg_mode(tcp_sk(sk))))) {
> > > tcp_set_skb_tso_segs(sk, skb, mss_now);
> > > tso_segs = tcp_skb_pcount(skb);
> > > }
> >
> > It's a bit intrusive but I couldn't immediately come up with alternative
> > that would have worked (came up with some not working ones :-)).
>
> Yes, I also noticed that. We could add some more code to tcp_mark_urg(), e.g.
> walk sk_write_queue and adjust the pending SKBs there...
>
> Is it OK to simply set all skb->gso_segs to zero, and let the next call to
> tcp_init_tso_segs redo them?

If we walk backwards we could consider short-circuit the walk at 16-bit
urg field limit. I wouldn't mind if users of such obscure feature pay the
price but the final decision is up to Dave of course.

> I mean, will tcp_init_tso_segs() be always
> called on all SKBs which are in the write queue at the time tcp_mark_urg() is
> called?

I realized there is more breakage: That tcp_current_mss tcp_urg_mode
is too late as well, it won't work either until we're already past the
relevant seqno range... It basically starts working at snd_up upwards
rather than working on snd_up-2^16..snd_up region.

In addition, it's quite impossible to tso/gso successfully past
ceil_to_mss(snd_up - 2^16) boundary anyway because we don't have enough
bits in the urgent field to tell at which point the fields should get set
(unless there would be some out-of-band communication channel for the
urgent sequence number).


--
i.

2008-11-21 18:21:22

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH] tcp: fix potential corner case issue in segmentation (Was: Re: [PATCH] Do not use TSO/GSO when there is urgent data)

On Fri, 21 Nov 2008, Ilpo J?rvinen wrote:

> On Fri, 21 Nov 2008, Petr Tesarik wrote:
>
> > > It's a bit intrusive but I couldn't immediately come up with alternative
> > > that would have worked (came up with some not working ones :-)).
> >
> > Yes, I also noticed that. We could add some more code to tcp_mark_urg(), e.g.
> > walk sk_write_queue and adjust the pending SKBs there...
> >
> > Is it OK to simply set all skb->gso_segs to zero, and let the next call to
> > tcp_init_tso_segs redo them?
>
> If we walk backwards we could consider short-circuit the walk at 16-bit
> urg field limit. I wouldn't mind if users of such obscure feature pay the
> price but the final decision is up to Dave of course.

On second though, it won't work since those fields get initialized later,
ie., at the send time so it would undo the effort.

--
i.

2008-11-22 00:43:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Do not use TSO/GSO when there is urgent data

From: Petr Tesarik <[email protected]>
Date: Fri, 21 Nov 2008 13:19:45 +0100

> This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
>
> Since most (if not all) implementations of TSO and even the in-kernel
> software GSO do not update the urgent pointer when splitting a large
> segment, it is necessary to turn off TSO/GSO for all outgoing traffic
> with the URG pointer set.
>
> Looking at tcp_current_mss (and the preceding comment) I even think
> this was the original intention. However, this approach is insufficient,
> because TSO/GSO is turned off only for newly created frames, not for
> frames which were already pending at the arrival of a message with
> MSG_OOB set. These frames were created when TSO/GSO was enabled,
> so they may be large, and they will have the urgent pointer set
> in tcp_transmit_skb().
>
> With this patch, such large packets will be fragmented again before
> going to the transmit routine.
>
> As a side note, at least the following NICs are known to screw up
> the urgent pointer in the TCP header when doing TSO:
>
> Intel 82566MM (PCI ID 8086:1049)
> Intel 82566DC (PCI ID 8086:104b)
> Intel 82541GI (PCI ID 8086:1076)
> Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
>
> Signed-off-by: Petr Tesarik <[email protected]>

Applied, thanks Petr.

BTW, your email client screwed up the patch by breaking up long lines,
for example:

> @@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff
> *skb)

And:

> @@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk, struct
> sk_buff *skb,

I fixed it up this time, but I will not in the future.