2021-04-29 10:22:40

by Dongseok Yi

[permalink] [raw]
Subject: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

tcp_gso_segment check for the size of GROed payload if it is bigger
than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
bigger than the size of GROed payload unexpectedly if data_len is not
big enough.

Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
would increse the gso_size to 1392. tcp_gso_segment will get an error
with 1380 <= 1392.

Check for the size of GROed payload if it is really bigger than target
mss when increase mss.

Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
Signed-off-by: Dongseok Yi <[email protected]>
---
net/core/filter.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9323d34..3f79e3c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}

/* Due to IPv4 header, MSS can be upgraded. */
- skb_increase_gso_size(shinfo, len_diff);
+ if (skb->data_len > len_diff)
+ skb_increase_gso_size(shinfo, len_diff);
+
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
--
2.7.4


2021-05-05 21:22:24

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On 4/29/21 12:08 PM, Dongseok Yi wrote:
> tcp_gso_segment check for the size of GROed payload if it is bigger
> than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> bigger than the size of GROed payload unexpectedly if data_len is not
> big enough.
>
> Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
> would increse the gso_size to 1392. tcp_gso_segment will get an error
> with 1380 <= 1392.
>
> Check for the size of GROed payload if it is really bigger than target
> mss when increase mss.
>
> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> Signed-off-by: Dongseok Yi <[email protected]>
> ---
> net/core/filter.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9323d34..3f79e3c 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> }
>
> /* Due to IPv4 header, MSS can be upgraded. */
> - skb_increase_gso_size(shinfo, len_diff);
> + if (skb->data_len > len_diff)

Could you elaborate some more on what this has to do with data_len specifically
here? I'm not sure I follow exactly your above commit description. Are you saying
that you're hitting in tcp_gso_segment():

[...]
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
[...]

Please provide more context on the bug, thanks!

> + skb_increase_gso_size(shinfo, len_diff);
> +
> /* Header must be checked, and gso_segs recomputed. */
> shinfo->gso_type |= SKB_GSO_DODGY;
> shinfo->gso_segs = 0;
>

2021-05-06 00:52:22

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Wed, May 05, 2021 at 10:55:10PM +0200, Daniel Borkmann wrote:
> On 4/29/21 12:08 PM, Dongseok Yi wrote:
> > tcp_gso_segment check for the size of GROed payload if it is bigger
> > than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> > bigger than the size of GROed payload unexpectedly if data_len is not
> > big enough.
> >
> > Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
> > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > with 1380 <= 1392.
> >
> > Check for the size of GROed payload if it is really bigger than target
> > mss when increase mss.
> >
> > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > Signed-off-by: Dongseok Yi <[email protected]>
> > ---
> > net/core/filter.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 9323d34..3f79e3c 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > }
> >
> > /* Due to IPv4 header, MSS can be upgraded. */
> > - skb_increase_gso_size(shinfo, len_diff);
> > + if (skb->data_len > len_diff)
>
> Could you elaborate some more on what this has to do with data_len specifically
> here? I'm not sure I follow exactly your above commit description. Are you saying
> that you're hitting in tcp_gso_segment():
>
> [...]
> mss = skb_shinfo(skb)->gso_size;
> if (unlikely(skb->len <= mss))
> goto out;
> [...]

Yes, right

>
> Please provide more context on the bug, thanks!

tcp_gso_segment():
[...]
__skb_pull(skb, thlen);

mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
[...]

skb->len will have total GROed TCP payload size after __skb_pull.
skb->len <= mss will not be happened in a normal GROed situation. But
bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
hit an error condition.

We should ensure the following condition.
total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)

Due to
total GROed TCP payload = the original mss + skb->data_len
IPv6 size - IPv4 size = len_diff

Finally, we can get the condition.
skb->data_len > len_diff

>
> > + skb_increase_gso_size(shinfo, len_diff);
> > +
> > /* Header must be checked, and gso_segs recomputed. */
> > shinfo->gso_type |= SKB_GSO_DODGY;
> > shinfo->gso_segs = 0;
> >


2021-05-06 01:48:19

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Wed, May 5, 2021 at 8:45 PM Dongseok Yi <[email protected]> wrote:
>
> On Wed, May 05, 2021 at 10:55:10PM +0200, Daniel Borkmann wrote:
> > On 4/29/21 12:08 PM, Dongseok Yi wrote:
> > > tcp_gso_segment check for the size of GROed payload if it is bigger
> > > than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> > > bigger than the size of GROed payload unexpectedly if data_len is not
> > > big enough.
> > >
> > > Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4

Is this a typo and is this intended to read skb->data_len = 1380?

The issue is that payload length (1380) is greater than mss with ipv6
(1372), but less than mss with ipv4 (1392).

I don't understand data_len = 8 or why the patch compares
skb->data_len to len_diff (20).

One simple solution if this packet no longer needs to be segmented
might be to reset the gso_type completely.

In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
converting from IPv6 to IPv4, fixed gso will end up building packets
that are slightly below the MTU. That opportunity cost is negligible
(especially with TSO). Unfortunately, I see that that flag is
available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.


> > > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > > with 1380 <= 1392.
> > >
> > > Check for the size of GROed payload if it is really bigger than target
> > > mss when increase mss.
> > >
> > > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > > Signed-off-by: Dongseok Yi <[email protected]>
> > > ---
> > > net/core/filter.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 9323d34..3f79e3c 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > > }
> > >
> > > /* Due to IPv4 header, MSS can be upgraded. */
> > > - skb_increase_gso_size(shinfo, len_diff);
> > > + if (skb->data_len > len_diff)
> >
> > Could you elaborate some more on what this has to do with data_len specifically
> > here? I'm not sure I follow exactly your above commit description. Are you saying
> > that you're hitting in tcp_gso_segment():
> >
> > [...]
> > mss = skb_shinfo(skb)->gso_size;
> > if (unlikely(skb->len <= mss))
> > goto out;
> > [...]
>
> Yes, right
>
> >
> > Please provide more context on the bug, thanks!
>
> tcp_gso_segment():
> [...]
> __skb_pull(skb, thlen);
>
> mss = skb_shinfo(skb)->gso_size;
> if (unlikely(skb->len <= mss))
> [...]
>
> skb->len will have total GROed TCP payload size after __skb_pull.
> skb->len <= mss will not be happened in a normal GROed situation. But
> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> hit an error condition.
>
> We should ensure the following condition.
> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
>
> Due to
> total GROed TCP payload = the original mss + skb->data_len
> IPv6 size - IPv4 size = len_diff
>
> Finally, we can get the condition.
> skb->data_len > len_diff
>
> >
> > > + skb_increase_gso_size(shinfo, len_diff);
> > > +
> > > /* Header must be checked, and gso_segs recomputed. */
> > > shinfo->gso_type |= SKB_GSO_DODGY;
> > > shinfo->gso_segs = 0;
> > >
>
>

2021-05-06 02:30:07

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Wed, May 05, 2021 at 09:45:37PM -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 8:45 PM Dongseok Yi <[email protected]> wrote:
> >
> > On Wed, May 05, 2021 at 10:55:10PM +0200, Daniel Borkmann wrote:
> > > On 4/29/21 12:08 PM, Dongseok Yi wrote:
> > > > tcp_gso_segment check for the size of GROed payload if it is bigger
> > > > than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> > > > bigger than the size of GROed payload unexpectedly if data_len is not
> > > > big enough.
> > > >
> > > > Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
>
> Is this a typo and is this intended to read skb->data_len = 1380?

This is not a typo. I intended skb->data_len = 8.

>
> The issue is that payload length (1380) is greater than mss with ipv6
> (1372), but less than mss with ipv4 (1392).
>
> I don't understand data_len = 8 or why the patch compares
> skb->data_len to len_diff (20).

skb_gro_receive():
unsigned int len = skb_gro_len(skb);
[...]
done:
NAPI_GRO_CB(p)->count++;
p->data_len += len;

head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
data_len could be 8 if server sent a small size packet and it is GROed
to head_skb.

Please let me know if I am missing something.

>
> One simple solution if this packet no longer needs to be segmented
> might be to reset the gso_type completely.

I am not sure gso_type can be cleared even when GSO is needed.

>
> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> converting from IPv6 to IPv4, fixed gso will end up building packets
> that are slightly below the MTU. That opportunity cost is negligible
> (especially with TSO). Unfortunately, I see that that flag is
> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
>
>
> > > > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > > > with 1380 <= 1392.
> > > >
> > > > Check for the size of GROed payload if it is really bigger than target
> > > > mss when increase mss.
> > > >
> > > > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > > > Signed-off-by: Dongseok Yi <[email protected]>
> > > > ---
> > > > net/core/filter.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 9323d34..3f79e3c 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > > > }
> > > >
> > > > /* Due to IPv4 header, MSS can be upgraded. */
> > > > - skb_increase_gso_size(shinfo, len_diff);
> > > > + if (skb->data_len > len_diff)
> > >
> > > Could you elaborate some more on what this has to do with data_len specifically
> > > here? I'm not sure I follow exactly your above commit description. Are you saying
> > > that you're hitting in tcp_gso_segment():
> > >
> > > [...]
> > > mss = skb_shinfo(skb)->gso_size;
> > > if (unlikely(skb->len <= mss))
> > > goto out;
> > > [...]
> >
> > Yes, right
> >
> > >
> > > Please provide more context on the bug, thanks!
> >
> > tcp_gso_segment():
> > [...]
> > __skb_pull(skb, thlen);
> >
> > mss = skb_shinfo(skb)->gso_size;
> > if (unlikely(skb->len <= mss))
> > [...]
> >
> > skb->len will have total GROed TCP payload size after __skb_pull.
> > skb->len <= mss will not be happened in a normal GROed situation. But
> > bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> > hit an error condition.
> >
> > We should ensure the following condition.
> > total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> >
> > Due to
> > total GROed TCP payload = the original mss + skb->data_len
> > IPv6 size - IPv4 size = len_diff
> >
> > Finally, we can get the condition.
> > skb->data_len > len_diff
> >
> > >
> > > > + skb_increase_gso_size(shinfo, len_diff);
> > > > +
> > > > /* Header must be checked, and gso_segs recomputed. */
> > > > shinfo->gso_type |= SKB_GSO_DODGY;
> > > > shinfo->gso_segs = 0;
> > > >
> >
> >

2021-05-06 19:49:10

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Wed, May 5, 2021 at 10:27 PM Dongseok Yi <[email protected]> wrote:
>
> On Wed, May 05, 2021 at 09:45:37PM -0400, Willem de Bruijn wrote:
> > On Wed, May 5, 2021 at 8:45 PM Dongseok Yi <[email protected]> wrote:
> > >
> > > On Wed, May 05, 2021 at 10:55:10PM +0200, Daniel Borkmann wrote:
> > > > On 4/29/21 12:08 PM, Dongseok Yi wrote:
> > > > > tcp_gso_segment check for the size of GROed payload if it is bigger
> > > > > than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> > > > > bigger than the size of GROed payload unexpectedly if data_len is not
> > > > > big enough.
> > > > >
> > > > > Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
> >
> > Is this a typo and is this intended to read skb->data_len = 1380?
>
> This is not a typo. I intended skb->data_len = 8.
>
> >
> > The issue is that payload length (1380) is greater than mss with ipv6
> > (1372), but less than mss with ipv4 (1392).
> >
> > I don't understand data_len = 8 or why the patch compares
> > skb->data_len to len_diff (20).
>
> skb_gro_receive():
> unsigned int len = skb_gro_len(skb);
> [...]
> done:
> NAPI_GRO_CB(p)->count++;
> p->data_len += len;
>
> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> data_len could be 8 if server sent a small size packet and it is GROed
> to head_skb.
>
> Please let me know if I am missing something.

This is my understanding of the data path. This is a forwarding path
for TCP traffic.

GRO is enabled and will coalesce multiple segments into a single large
packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
+ 20.

Somewhere between GRO and GSO you have a BPF program that converts the
IPv6 address to IPv4.

There is no concept of head_skb at the time of this BPF program. It is
a single SKB, with an skb linear part and multiple data items in the
frags (no frag_list).

When entering the GSO stack, this single skb now has a payload length
< MSS. So it would just make a valid TCP packet on its own?

skb_gro_len is only relevant inside the GRO stack. It internally casts
the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
reused for other purposes later by other layers of the datapath. It is
not safe to read this inside bpf_skb_proto_6_to_4.


> >
> > One simple solution if this packet no longer needs to be segmented
> > might be to reset the gso_type completely.
>
> I am not sure gso_type can be cleared even when GSO is needed.
>
> >
> > In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> > converting from IPv6 to IPv4, fixed gso will end up building packets
> > that are slightly below the MTU. That opportunity cost is negligible
> > (especially with TSO). Unfortunately, I see that that flag is
> > available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> >
> >
> > > > > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > > > > with 1380 <= 1392.
> > > > >
> > > > > Check for the size of GROed payload if it is really bigger than target
> > > > > mss when increase mss.
> > > > >
> > > > > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > > > > Signed-off-by: Dongseok Yi <[email protected]>
> > > > > ---
> > > > > net/core/filter.c | 4 +++-
> > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 9323d34..3f79e3c 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > > > > }
> > > > >
> > > > > /* Due to IPv4 header, MSS can be upgraded. */
> > > > > - skb_increase_gso_size(shinfo, len_diff);
> > > > > + if (skb->data_len > len_diff)
> > > >
> > > > Could you elaborate some more on what this has to do with data_len specifically
> > > > here? I'm not sure I follow exactly your above commit description. Are you saying
> > > > that you're hitting in tcp_gso_segment():
> > > >
> > > > [...]
> > > > mss = skb_shinfo(skb)->gso_size;
> > > > if (unlikely(skb->len <= mss))
> > > > goto out;
> > > > [...]
> > >
> > > Yes, right
> > >
> > > >
> > > > Please provide more context on the bug, thanks!
> > >
> > > tcp_gso_segment():
> > > [...]
> > > __skb_pull(skb, thlen);
> > >
> > > mss = skb_shinfo(skb)->gso_size;
> > > if (unlikely(skb->len <= mss))
> > > [...]
> > >
> > > skb->len will have total GROed TCP payload size after __skb_pull.
> > > skb->len <= mss will not be happened in a normal GROed situation. But
> > > bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> > > hit an error condition.
> > >
> > > We should ensure the following condition.
> > > total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> > >
> > > Due to
> > > total GROed TCP payload = the original mss + skb->data_len
> > > IPv6 size - IPv4 size = len_diff
> > >
> > > Finally, we can get the condition.
> > > skb->data_len > len_diff
> > >
> > > >
> > > > > + skb_increase_gso_size(shinfo, len_diff);
> > > > > +
> > > > > /* Header must be checked, and gso_segs recomputed. */
> > > > > shinfo->gso_type |= SKB_GSO_DODGY;
> > > > > shinfo->gso_segs = 0;
> > > > >
> > >
> > >
>

2021-05-07 01:36:22

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Thu, May 06, 2021 at 02:21:37PM -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 10:27 PM Dongseok Yi <[email protected]> wrote:
> >
> > On Wed, May 05, 2021 at 09:45:37PM -0400, Willem de Bruijn wrote:
> > > On Wed, May 5, 2021 at 8:45 PM Dongseok Yi <[email protected]> wrote:
> > > >
> > > > On Wed, May 05, 2021 at 10:55:10PM +0200, Daniel Borkmann wrote:
> > > > > On 4/29/21 12:08 PM, Dongseok Yi wrote:
> > > > > > tcp_gso_segment check for the size of GROed payload if it is bigger
> > > > > > than the mss. bpf_skb_proto_6_to_4 increases mss, but the mss can be
> > > > > > bigger than the size of GROed payload unexpectedly if data_len is not
> > > > > > big enough.
> > > > > >
> > > > > > Assume that skb gso_size = 1372 and data_len = 8. bpf_skb_proto_6_to_4
> > >
> > > Is this a typo and is this intended to read skb->data_len = 1380?
> >
> > This is not a typo. I intended skb->data_len = 8.
> >
> > >
> > > The issue is that payload length (1380) is greater than mss with ipv6
> > > (1372), but less than mss with ipv4 (1392).
> > >
> > > I don't understand data_len = 8 or why the patch compares
> > > skb->data_len to len_diff (20).
> >
> > skb_gro_receive():
> > unsigned int len = skb_gro_len(skb);
> > [...]
> > done:
> > NAPI_GRO_CB(p)->count++;
> > p->data_len += len;
> >
> > head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> > data_len could be 8 if server sent a small size packet and it is GROed
> > to head_skb.
> >
> > Please let me know if I am missing something.
>
> This is my understanding of the data path. This is a forwarding path
> for TCP traffic.
>
> GRO is enabled and will coalesce multiple segments into a single large
> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> + 20.
>
> Somewhere between GRO and GSO you have a BPF program that converts the
> IPv6 address to IPv4.

Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
GSO.

>
> There is no concept of head_skb at the time of this BPF program. It is
> a single SKB, with an skb linear part and multiple data items in the
> frags (no frag_list).

Sorry for the confusion. head_skb what I mentioned was a skb linear
part. I'm considering a single SKB with frags too.

>
> When entering the GSO stack, this single skb now has a payload length
> < MSS. So it would just make a valid TCP packet on its own?
>
> skb_gro_len is only relevant inside the GRO stack. It internally casts
> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> reused for other purposes later by other layers of the datapath. It is
> not safe to read this inside bpf_skb_proto_6_to_4.

The condition what I made uses skb->data_len not skb_gro_len. Does
skb->data_len have a different meaning on each layer? As I know,
data_len indicates the amount of frags or frag_list. skb->data_len
should be > 20 in the sample case because the payload size of the skb
linear part is the same with mss.

We can modify netif_needs_gso as another option to hit
skb_needs_linearize in validate_xmit_skb. But I think we should compare
skb->gso_size and skb->data_len too to check if mss exceed a payload
size.

>
>
> > >
> > > One simple solution if this packet no longer needs to be segmented
> > > might be to reset the gso_type completely.
> >
> > I am not sure gso_type can be cleared even when GSO is needed.
> >
> > >
> > > In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> > > converting from IPv6 to IPv4, fixed gso will end up building packets
> > > that are slightly below the MTU. That opportunity cost is negligible
> > > (especially with TSO). Unfortunately, I see that that flag is
> > > available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> > >
> > >
> > > > > > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > > > > > with 1380 <= 1392.
> > > > > >
> > > > > > Check for the size of GROed payload if it is really bigger than target
> > > > > > mss when increase mss.
> > > > > >
> > > > > > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > > > > > Signed-off-by: Dongseok Yi <[email protected]>
> > > > > > ---
> > > > > > net/core/filter.c | 4 +++-
> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > > index 9323d34..3f79e3c 100644
> > > > > > --- a/net/core/filter.c
> > > > > > +++ b/net/core/filter.c
> > > > > > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > > > > > }
> > > > > >
> > > > > > /* Due to IPv4 header, MSS can be upgraded. */
> > > > > > - skb_increase_gso_size(shinfo, len_diff);
> > > > > > + if (skb->data_len > len_diff)
> > > > >
> > > > > Could you elaborate some more on what this has to do with data_len specifically
> > > > > here? I'm not sure I follow exactly your above commit description. Are you saying
> > > > > that you're hitting in tcp_gso_segment():
> > > > >
> > > > > [...]
> > > > > mss = skb_shinfo(skb)->gso_size;
> > > > > if (unlikely(skb->len <= mss))
> > > > > goto out;
> > > > > [...]
> > > >
> > > > Yes, right
> > > >
> > > > >
> > > > > Please provide more context on the bug, thanks!
> > > >
> > > > tcp_gso_segment():
> > > > [...]
> > > > __skb_pull(skb, thlen);
> > > >
> > > > mss = skb_shinfo(skb)->gso_size;
> > > > if (unlikely(skb->len <= mss))
> > > > [...]
> > > >
> > > > skb->len will have total GROed TCP payload size after __skb_pull.
> > > > skb->len <= mss will not be happened in a normal GROed situation. But
> > > > bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> > > > hit an error condition.
> > > >
> > > > We should ensure the following condition.
> > > > total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> > > >
> > > > Due to
> > > > total GROed TCP payload = the original mss + skb->data_len
> > > > IPv6 size - IPv4 size = len_diff
> > > >
> > > > Finally, we can get the condition.
> > > > skb->data_len > len_diff
> > > >
> > > > >
> > > > > > + skb_increase_gso_size(shinfo, len_diff);
> > > > > > +
> > > > > > /* Header must be checked, and gso_segs recomputed. */
> > > > > > shinfo->gso_type |= SKB_GSO_DODGY;
> > > > > > shinfo->gso_segs = 0;
> > > > > >
> > > >
> > > >
> >

2021-05-07 01:58:35

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

> > > head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> > > data_len could be 8 if server sent a small size packet and it is GROed
> > > to head_skb.
> > >
> > > Please let me know if I am missing something.
> >
> > This is my understanding of the data path. This is a forwarding path
> > for TCP traffic.
> >
> > GRO is enabled and will coalesce multiple segments into a single large
> > packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> > + 20.
> >
> > Somewhere between GRO and GSO you have a BPF program that converts the
> > IPv6 address to IPv4.
>
> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> GSO.
>
> >
> > There is no concept of head_skb at the time of this BPF program. It is
> > a single SKB, with an skb linear part and multiple data items in the
> > frags (no frag_list).
>
> Sorry for the confusion. head_skb what I mentioned was a skb linear
> part. I'm considering a single SKB with frags too.
>
> >
> > When entering the GSO stack, this single skb now has a payload length
> > < MSS. So it would just make a valid TCP packet on its own?
> >
> > skb_gro_len is only relevant inside the GRO stack. It internally casts
> > the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> > reused for other purposes later by other layers of the datapath. It is
> > not safe to read this inside bpf_skb_proto_6_to_4.
>
> The condition what I made uses skb->data_len not skb_gro_len. Does
> skb->data_len have a different meaning on each layer? As I know,
> data_len indicates the amount of frags or frag_list. skb->data_len
> should be > 20 in the sample case because the payload size of the skb
> linear part is the same with mss.

Ah, got it.

data_len is the length of the skb minus the length in the skb linear
section (as seen in skb_headlen).

So this gso skb consists of two segments, the first one entirely
linear, the payload of the second is in skb_shinfo(skb)->frags[0].

It is not guaranteed that gso skbs built from two individual skbs end
up looking like that. Only protocol headers in the linear segment and
the payload of both in frags is common.

> We can modify netif_needs_gso as another option to hit
> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> skb->gso_size and skb->data_len too to check if mss exceed a payload
> size.

The rest of the stack does not build such gso packets with payload len
< mss, so we should not have to add workarounds in the gso hot path
for this.

Also no need to linearize this skb. I think that if the bpf program
would just clear the gso type, the packet would be sent correctly.
Unless I'm missing something.

But I don't mean to argue that it should do that in production.
Instead, not playing mss games would solve this and stay close to the
original datapath if no bpf program had been present. Including
maintaining the GSO invariant of sending out the same chain of packets
as received (bar the IPv6 to IPv4 change).

This could be achieved by adding support for the flag
BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
And similar to bpf_skb_net_shrink:

/* Due to header shrink, MSS can be upgraded. */
if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
skb_increase_gso_size(shinfo, len_diff);

The other case, from IPv4 to IPv6 is more difficult to address, as not
reducing the MSS will result in packets exceeding MTU. That calls for
workarounds like MSS clamping. Anyway, that is out of scope here.



> >
> >
> > > >
> > > > One simple solution if this packet no longer needs to be segmented
> > > > might be to reset the gso_type completely.
> > >
> > > I am not sure gso_type can be cleared even when GSO is needed.
> > >
> > > >
> > > > In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> > > > converting from IPv6 to IPv4, fixed gso will end up building packets
> > > > that are slightly below the MTU. That opportunity cost is negligible
> > > > (especially with TSO). Unfortunately, I see that that flag is
> > > > available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> > > >
> > > >
> > > > > > > would increse the gso_size to 1392. tcp_gso_segment will get an error
> > > > > > > with 1380 <= 1392.
> > > > > > >
> > > > > > > Check for the size of GROed payload if it is really bigger than target
> > > > > > > mss when increase mss.
> > > > > > >
> > > > > > > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > > > > > > Signed-off-by: Dongseok Yi <[email protected]>
> > > > > > > ---
> > > > > > > net/core/filter.c | 4 +++-
> > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > > > index 9323d34..3f79e3c 100644
> > > > > > > --- a/net/core/filter.c
> > > > > > > +++ b/net/core/filter.c
> > > > > > > @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > > > > > > }
> > > > > > >
> > > > > > > /* Due to IPv4 header, MSS can be upgraded. */
> > > > > > > - skb_increase_gso_size(shinfo, len_diff);
> > > > > > > + if (skb->data_len > len_diff)
> > > > > >
> > > > > > Could you elaborate some more on what this has to do with data_len specifically
> > > > > > here? I'm not sure I follow exactly your above commit description. Are you saying
> > > > > > that you're hitting in tcp_gso_segment():
> > > > > >
> > > > > > [...]
> > > > > > mss = skb_shinfo(skb)->gso_size;
> > > > > > if (unlikely(skb->len <= mss))
> > > > > > goto out;
> > > > > > [...]
> > > > >
> > > > > Yes, right
> > > > >
> > > > > >
> > > > > > Please provide more context on the bug, thanks!
> > > > >
> > > > > tcp_gso_segment():
> > > > > [...]
> > > > > __skb_pull(skb, thlen);
> > > > >
> > > > > mss = skb_shinfo(skb)->gso_size;
> > > > > if (unlikely(skb->len <= mss))
> > > > > [...]
> > > > >
> > > > > skb->len will have total GROed TCP payload size after __skb_pull.
> > > > > skb->len <= mss will not be happened in a normal GROed situation. But
> > > > > bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> > > > > hit an error condition.
> > > > >
> > > > > We should ensure the following condition.
> > > > > total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> > > > >
> > > > > Due to
> > > > > total GROed TCP payload = the original mss + skb->data_len
> > > > > IPv6 size - IPv4 size = len_diff
> > > > >
> > > > > Finally, we can get the condition.
> > > > > skb->data_len > len_diff
> > > > >
> > > > > >
> > > > > > > + skb_increase_gso_size(shinfo, len_diff);
> > > > > > > +
> > > > > > > /* Header must be checked, and gso_segs recomputed. */
> > > > > > > shinfo->gso_type |= SKB_GSO_DODGY;
> > > > > > > shinfo->gso_segs = 0;
> > > > > > >
> > > > >
> > > > >
> > >
>

2021-05-07 02:15:32

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On 2021/5/7 9:25, Willem de Bruijn wrote:
>>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
>>>> data_len could be 8 if server sent a small size packet and it is GROed
>>>> to head_skb.
>>>>
>>>> Please let me know if I am missing something.
>>>
>>> This is my understanding of the data path. This is a forwarding path
>>> for TCP traffic.
>>>
>>> GRO is enabled and will coalesce multiple segments into a single large
>>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
>>> + 20.
>>>
>>> Somewhere between GRO and GSO you have a BPF program that converts the
>>> IPv6 address to IPv4.
>>
>> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
>> GSO.
>>
>>>
>>> There is no concept of head_skb at the time of this BPF program. It is
>>> a single SKB, with an skb linear part and multiple data items in the
>>> frags (no frag_list).
>>
>> Sorry for the confusion. head_skb what I mentioned was a skb linear
>> part. I'm considering a single SKB with frags too.
>>
>>>
>>> When entering the GSO stack, this single skb now has a payload length
>>> < MSS. So it would just make a valid TCP packet on its own?
>>>
>>> skb_gro_len is only relevant inside the GRO stack. It internally casts
>>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
>>> reused for other purposes later by other layers of the datapath. It is
>>> not safe to read this inside bpf_skb_proto_6_to_4.
>>
>> The condition what I made uses skb->data_len not skb_gro_len. Does
>> skb->data_len have a different meaning on each layer? As I know,
>> data_len indicates the amount of frags or frag_list. skb->data_len
>> should be > 20 in the sample case because the payload size of the skb
>> linear part is the same with mss.
>
> Ah, got it.
>
> data_len is the length of the skb minus the length in the skb linear
> section (as seen in skb_headlen).
>
> So this gso skb consists of two segments, the first one entirely
> linear, the payload of the second is in skb_shinfo(skb)->frags[0].
>
> It is not guaranteed that gso skbs built from two individual skbs end
> up looking like that. Only protocol headers in the linear segment and
> the payload of both in frags is common.
>
>> We can modify netif_needs_gso as another option to hit
>> skb_needs_linearize in validate_xmit_skb. But I think we should compare
>> skb->gso_size and skb->data_len too to check if mss exceed a payload
>> size.
>
> The rest of the stack does not build such gso packets with payload len
> < mss, so we should not have to add workarounds in the gso hot path
> for this.
>
> Also no need to linearize this skb. I think that if the bpf program
> would just clear the gso type, the packet would be sent correctly.
> Unless I'm missing something.

Does the checksum/len field in ip and tcp/udp header need adjusting
before clearing gso type as the packet has became bigger?

Also, instead of testing skb->data_len, may test the skb->len?

skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff

>
> But I don't mean to argue that it should do that in production.
> Instead, not playing mss games would solve this and stay close to the
> original datapath if no bpf program had been present. Including
> maintaining the GSO invariant of sending out the same chain of packets
> as received (bar the IPv6 to IPv4 change).
>
> This could be achieved by adding support for the flag
> BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
> And similar to bpf_skb_net_shrink:
>
> /* Due to header shrink, MSS can be upgraded. */
> if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> skb_increase_gso_size(shinfo, len_diff);
>
> The other case, from IPv4 to IPv6 is more difficult to address, as not
> reducing the MSS will result in packets exceeding MTU. That calls for
> workarounds like MSS clamping. Anyway, that is out of scope here.
>
>
>
>>>
>>>
>>>>>
>>>>> One simple solution if this packet no longer needs to be segmented
>>>>> might be to reset the gso_type completely.
>>>>
>>>> I am not sure gso_type can be cleared even when GSO is needed.
>>>>
>>>>>
>>>>> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
>>>>> converting from IPv6 to IPv4, fixed gso will end up building packets
>>>>> that are slightly below the MTU. That opportunity cost is negligible
>>>>> (especially with TSO). Unfortunately, I see that that flag is
>>>>> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
>>>>>
>>>>>
>>>>>>>> would increse the gso_size to 1392. tcp_gso_segment will get an error
>>>>>>>> with 1380 <= 1392.
>>>>>>>>
>>>>>>>> Check for the size of GROed payload if it is really bigger than target
>>>>>>>> mss when increase mss.
>>>>>>>>
>>>>>>>> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
>>>>>>>> Signed-off-by: Dongseok Yi <[email protected]>
>>>>>>>> ---
>>>>>>>> net/core/filter.c | 4 +++-
>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>>>> index 9323d34..3f79e3c 100644
>>>>>>>> --- a/net/core/filter.c
>>>>>>>> +++ b/net/core/filter.c
>>>>>>>> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
>>>>>>>> }
>>>>>>>>
>>>>>>>> /* Due to IPv4 header, MSS can be upgraded. */
>>>>>>>> - skb_increase_gso_size(shinfo, len_diff);
>>>>>>>> + if (skb->data_len > len_diff)
>>>>>>>
>>>>>>> Could you elaborate some more on what this has to do with data_len specifically
>>>>>>> here? I'm not sure I follow exactly your above commit description. Are you saying
>>>>>>> that you're hitting in tcp_gso_segment():
>>>>>>>
>>>>>>> [...]
>>>>>>> mss = skb_shinfo(skb)->gso_size;
>>>>>>> if (unlikely(skb->len <= mss))
>>>>>>> goto out;
>>>>>>> [...]
>>>>>>
>>>>>> Yes, right
>>>>>>
>>>>>>>
>>>>>>> Please provide more context on the bug, thanks!
>>>>>>
>>>>>> tcp_gso_segment():
>>>>>> [...]
>>>>>> __skb_pull(skb, thlen);
>>>>>>
>>>>>> mss = skb_shinfo(skb)->gso_size;
>>>>>> if (unlikely(skb->len <= mss))
>>>>>> [...]
>>>>>>
>>>>>> skb->len will have total GROed TCP payload size after __skb_pull.
>>>>>> skb->len <= mss will not be happened in a normal GROed situation. But
>>>>>> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
>>>>>> hit an error condition.
>>>>>>
>>>>>> We should ensure the following condition.
>>>>>> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
>>>>>>
>>>>>> Due to
>>>>>> total GROed TCP payload = the original mss + skb->data_len
>>>>>> IPv6 size - IPv4 size = len_diff
>>>>>>
>>>>>> Finally, we can get the condition.
>>>>>> skb->data_len > len_diff
>>>>>>
>>>>>>>
>>>>>>>> + skb_increase_gso_size(shinfo, len_diff);
>>>>>>>> +
>>>>>>>> /* Header must be checked, and gso_segs recomputed. */
>>>>>>>> shinfo->gso_type |= SKB_GSO_DODGY;
>>>>>>>> shinfo->gso_segs = 0;
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>
> .
>

2021-05-07 06:24:57

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
>
> On 2021/5/7 9:25, Willem de Bruijn wrote:
> >>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> >>>> data_len could be 8 if server sent a small size packet and it is GROed
> >>>> to head_skb.
> >>>>
> >>>> Please let me know if I am missing something.
> >>>
> >>> This is my understanding of the data path. This is a forwarding path
> >>> for TCP traffic.
> >>>
> >>> GRO is enabled and will coalesce multiple segments into a single large
> >>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> >>> + 20.
> >>>
> >>> Somewhere between GRO and GSO you have a BPF program that converts the
> >>> IPv6 address to IPv4.
> >>
> >> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> >> GSO.
> >>
> >>>
> >>> There is no concept of head_skb at the time of this BPF program. It is
> >>> a single SKB, with an skb linear part and multiple data items in the
> >>> frags (no frag_list).
> >>
> >> Sorry for the confusion. head_skb what I mentioned was a skb linear
> >> part. I'm considering a single SKB with frags too.
> >>
> >>>
> >>> When entering the GSO stack, this single skb now has a payload length
> >>> < MSS. So it would just make a valid TCP packet on its own?
> >>>
> >>> skb_gro_len is only relevant inside the GRO stack. It internally casts
> >>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> >>> reused for other purposes later by other layers of the datapath. It is
> >>> not safe to read this inside bpf_skb_proto_6_to_4.
> >>
> >> The condition what I made uses skb->data_len not skb_gro_len. Does
> >> skb->data_len have a different meaning on each layer? As I know,
> >> data_len indicates the amount of frags or frag_list. skb->data_len
> >> should be > 20 in the sample case because the payload size of the skb
> >> linear part is the same with mss.
> >
> > Ah, got it.
> >
> > data_len is the length of the skb minus the length in the skb linear
> > section (as seen in skb_headlen).
> >
> > So this gso skb consists of two segments, the first one entirely
> > linear, the payload of the second is in skb_shinfo(skb)->frags[0].
> >
> > It is not guaranteed that gso skbs built from two individual skbs end
> > up looking like that. Only protocol headers in the linear segment and
> > the payload of both in frags is common.
> >
> >> We can modify netif_needs_gso as another option to hit
> >> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> >> skb->gso_size and skb->data_len too to check if mss exceed a payload
> >> size.
> >
> > The rest of the stack does not build such gso packets with payload len
> > < mss, so we should not have to add workarounds in the gso hot path
> > for this.
> >
> > Also no need to linearize this skb. I think that if the bpf program
> > would just clear the gso type, the packet would be sent correctly.
> > Unless I'm missing something.
>
> Does the checksum/len field in ip and tcp/udp header need adjusting
> before clearing gso type as the packet has became bigger?

gro takes care of this. see for instance inet_gro_complete for updates
to the ip header.

> Also, instead of testing skb->data_len, may test the skb->len?
>
> skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff

Yes. Essentially doing the same calculation as the gso code that is
causing the packet to be dropped.

> >
> > But I don't mean to argue that it should do that in production.
> > Instead, not playing mss games would solve this and stay close to the
> > original datapath if no bpf program had been present. Including
> > maintaining the GSO invariant of sending out the same chain of packets
> > as received (bar the IPv6 to IPv4 change).
> >
> > This could be achieved by adding support for the flag
> > BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
> > And similar to bpf_skb_net_shrink:
> >
> > /* Due to header shrink, MSS can be upgraded. */
> > if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> > skb_increase_gso_size(shinfo, len_diff);
> >
> > The other case, from IPv4 to IPv6 is more difficult to address, as not
> > reducing the MSS will result in packets exceeding MTU. That calls for
> > workarounds like MSS clamping. Anyway, that is out of scope here.
> >
> >
> >
> >>>
> >>>
> >>>>>
> >>>>> One simple solution if this packet no longer needs to be segmented
> >>>>> might be to reset the gso_type completely.
> >>>>
> >>>> I am not sure gso_type can be cleared even when GSO is needed.
> >>>>
> >>>>>
> >>>>> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> >>>>> converting from IPv6 to IPv4, fixed gso will end up building packets
> >>>>> that are slightly below the MTU. That opportunity cost is negligible
> >>>>> (especially with TSO). Unfortunately, I see that that flag is
> >>>>> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> >>>>>
> >>>>>
> >>>>>>>> would increse the gso_size to 1392. tcp_gso_segment will get an error
> >>>>>>>> with 1380 <= 1392.
> >>>>>>>>
> >>>>>>>> Check for the size of GROed payload if it is really bigger than target
> >>>>>>>> mss when increase mss.
> >>>>>>>>
> >>>>>>>> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> >>>>>>>> Signed-off-by: Dongseok Yi <[email protected]>
> >>>>>>>> ---
> >>>>>>>> net/core/filter.c | 4 +++-
> >>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>>>> index 9323d34..3f79e3c 100644
> >>>>>>>> --- a/net/core/filter.c
> >>>>>>>> +++ b/net/core/filter.c
> >>>>>>>> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> /* Due to IPv4 header, MSS can be upgraded. */
> >>>>>>>> - skb_increase_gso_size(shinfo, len_diff);
> >>>>>>>> + if (skb->data_len > len_diff)
> >>>>>>>
> >>>>>>> Could you elaborate some more on what this has to do with data_len specifically
> >>>>>>> here? I'm not sure I follow exactly your above commit description. Are you saying
> >>>>>>> that you're hitting in tcp_gso_segment():
> >>>>>>>
> >>>>>>> [...]
> >>>>>>> mss = skb_shinfo(skb)->gso_size;
> >>>>>>> if (unlikely(skb->len <= mss))
> >>>>>>> goto out;
> >>>>>>> [...]
> >>>>>>
> >>>>>> Yes, right
> >>>>>>
> >>>>>>>
> >>>>>>> Please provide more context on the bug, thanks!
> >>>>>>
> >>>>>> tcp_gso_segment():
> >>>>>> [...]
> >>>>>> __skb_pull(skb, thlen);
> >>>>>>
> >>>>>> mss = skb_shinfo(skb)->gso_size;
> >>>>>> if (unlikely(skb->len <= mss))
> >>>>>> [...]
> >>>>>>
> >>>>>> skb->len will have total GROed TCP payload size after __skb_pull.
> >>>>>> skb->len <= mss will not be happened in a normal GROed situation. But
> >>>>>> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> >>>>>> hit an error condition.
> >>>>>>
> >>>>>> We should ensure the following condition.
> >>>>>> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> >>>>>>
> >>>>>> Due to
> >>>>>> total GROed TCP payload = the original mss + skb->data_len
> >>>>>> IPv6 size - IPv4 size = len_diff
> >>>>>>
> >>>>>> Finally, we can get the condition.
> >>>>>> skb->data_len > len_diff
> >>>>>>
> >>>>>>>
> >>>>>>>> + skb_increase_gso_size(shinfo, len_diff);
> >>>>>>>> +
> >>>>>>>> /* Header must be checked, and gso_segs recomputed. */
> >>>>>>>> shinfo->gso_type |= SKB_GSO_DODGY;
> >>>>>>>> shinfo->gso_segs = 0;
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >
> > .
> >
>

2021-05-07 11:41:31

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Thu, May 06, 2021 at 09:53:45PM -0400, Willem de Bruijn wrote:
> On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
> >
> > On 2021/5/7 9:25, Willem de Bruijn wrote:
> > >>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> > >>>> data_len could be 8 if server sent a small size packet and it is GROed
> > >>>> to head_skb.
> > >>>>
> > >>>> Please let me know if I am missing something.
> > >>>
> > >>> This is my understanding of the data path. This is a forwarding path
> > >>> for TCP traffic.
> > >>>
> > >>> GRO is enabled and will coalesce multiple segments into a single large
> > >>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> > >>> + 20.
> > >>>
> > >>> Somewhere between GRO and GSO you have a BPF program that converts the
> > >>> IPv6 address to IPv4.
> > >>
> > >> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> > >> GSO.
> > >>
> > >>>
> > >>> There is no concept of head_skb at the time of this BPF program. It is
> > >>> a single SKB, with an skb linear part and multiple data items in the
> > >>> frags (no frag_list).
> > >>
> > >> Sorry for the confusion. head_skb what I mentioned was a skb linear
> > >> part. I'm considering a single SKB with frags too.
> > >>
> > >>>
> > >>> When entering the GSO stack, this single skb now has a payload length
> > >>> < MSS. So it would just make a valid TCP packet on its own?
> > >>>
> > >>> skb_gro_len is only relevant inside the GRO stack. It internally casts
> > >>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> > >>> reused for other purposes later by other layers of the datapath. It is
> > >>> not safe to read this inside bpf_skb_proto_6_to_4.
> > >>
> > >> The condition what I made uses skb->data_len not skb_gro_len. Does
> > >> skb->data_len have a different meaning on each layer? As I know,
> > >> data_len indicates the amount of frags or frag_list. skb->data_len
> > >> should be > 20 in the sample case because the payload size of the skb
> > >> linear part is the same with mss.
> > >
> > > Ah, got it.
> > >
> > > data_len is the length of the skb minus the length in the skb linear
> > > section (as seen in skb_headlen).
> > >
> > > So this gso skb consists of two segments, the first one entirely
> > > linear, the payload of the second is in skb_shinfo(skb)->frags[0].
> > >
> > > It is not guaranteed that gso skbs built from two individual skbs end
> > > up looking like that. Only protocol headers in the linear segment and
> > > the payload of both in frags is common.
> > >
> > >> We can modify netif_needs_gso as another option to hit
> > >> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> > >> skb->gso_size and skb->data_len too to check if mss exceed a payload
> > >> size.
> > >
> > > The rest of the stack does not build such gso packets with payload len
> > > < mss, so we should not have to add workarounds in the gso hot path
> > > for this.
> > >
> > > Also no need to linearize this skb. I think that if the bpf program
> > > would just clear the gso type, the packet would be sent correctly.
> > > Unless I'm missing something.
> >
> > Does the checksum/len field in ip and tcp/udp header need adjusting
> > before clearing gso type as the packet has became bigger?
>
> gro takes care of this. see for instance inet_gro_complete for updates
> to the ip header.

I think clearing the gso type will get an error at tcp4_gso_segment
because netif_needs_gso returns true in validate_xmit_skb.

>
> > Also, instead of testing skb->data_len, may test the skb->len?
> >
> > skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
>
> Yes. Essentially doing the same calculation as the gso code that is
> causing the packet to be dropped.

BPF program is usually out of control. Can we take a general approach?
The below 2 cases has no issue when mss upgrading.
1) skb->data_len > mss + 20
2) skb->data_len < mss && skb->data_len > 20
The corner case is when
3) skb->data_len > mss && skb->data_len < mss + 20

But to cover #3 case, we should check the condition Yunsheng Lin said.
What if we do mss upgrading for both #1 and #2 cases only?

+ unsigned short off_len = skb->data_len > shinfo->gso_size ?
+ shinfo->gso_size : 0;
[...]
/* Due to IPv4 header, MSS can be upgraded. */
- skb_increase_gso_size(shinfo, len_diff);
+ if (skb->data_len - off_len > len_diff)
+ skb_increase_gso_size(shinfo, len_diff);

>
> > >
> > > But I don't mean to argue that it should do that in production.
> > > Instead, not playing mss games would solve this and stay close to the
> > > original datapath if no bpf program had been present. Including
> > > maintaining the GSO invariant of sending out the same chain of packets
> > > as received (bar the IPv6 to IPv4 change).
> > >
> > > This could be achieved by adding support for the flag
> > > BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
> > > And similar to bpf_skb_net_shrink:
> > >
> > > /* Due to header shrink, MSS can be upgraded. */
> > > if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> > > skb_increase_gso_size(shinfo, len_diff);
> > >
> > > The other case, from IPv4 to IPv6 is more difficult to address, as not
> > > reducing the MSS will result in packets exceeding MTU. That calls for
> > > workarounds like MSS clamping. Anyway, that is out of scope here.
> > >
> > >
> > >
> > >>>
> > >>>
> > >>>>>
> > >>>>> One simple solution if this packet no longer needs to be segmented
> > >>>>> might be to reset the gso_type completely.
> > >>>>
> > >>>> I am not sure gso_type can be cleared even when GSO is needed.
> > >>>>
> > >>>>>
> > >>>>> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> > >>>>> converting from IPv6 to IPv4, fixed gso will end up building packets
> > >>>>> that are slightly below the MTU. That opportunity cost is negligible
> > >>>>> (especially with TSO). Unfortunately, I see that that flag is
> > >>>>> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> > >>>>>
> > >>>>>
> > >>>>>>>> would increse the gso_size to 1392. tcp_gso_segment will get an error
> > >>>>>>>> with 1380 <= 1392.
> > >>>>>>>>
> > >>>>>>>> Check for the size of GROed payload if it is really bigger than target
> > >>>>>>>> mss when increase mss.
> > >>>>>>>>
> > >>>>>>>> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > >>>>>>>> Signed-off-by: Dongseok Yi <[email protected]>
> > >>>>>>>> ---
> > >>>>>>>> net/core/filter.c | 4 +++-
> > >>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> > >>>>>>>> index 9323d34..3f79e3c 100644
> > >>>>>>>> --- a/net/core/filter.c
> > >>>>>>>> +++ b/net/core/filter.c
> > >>>>>>>> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>> /* Due to IPv4 header, MSS can be upgraded. */
> > >>>>>>>> - skb_increase_gso_size(shinfo, len_diff);
> > >>>>>>>> + if (skb->data_len > len_diff)
> > >>>>>>>
> > >>>>>>> Could you elaborate some more on what this has to do with data_len specifically
> > >>>>>>> here? I'm not sure I follow exactly your above commit description. Are you saying
> > >>>>>>> that you're hitting in tcp_gso_segment():
> > >>>>>>>
> > >>>>>>> [...]
> > >>>>>>> mss = skb_shinfo(skb)->gso_size;
> > >>>>>>> if (unlikely(skb->len <= mss))
> > >>>>>>> goto out;
> > >>>>>>> [...]
> > >>>>>>
> > >>>>>> Yes, right
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Please provide more context on the bug, thanks!
> > >>>>>>
> > >>>>>> tcp_gso_segment():
> > >>>>>> [...]
> > >>>>>> __skb_pull(skb, thlen);
> > >>>>>>
> > >>>>>> mss = skb_shinfo(skb)->gso_size;
> > >>>>>> if (unlikely(skb->len <= mss))
> > >>>>>> [...]
> > >>>>>>
> > >>>>>> skb->len will have total GROed TCP payload size after __skb_pull.
> > >>>>>> skb->len <= mss will not be happened in a normal GROed situation. But
> > >>>>>> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> > >>>>>> hit an error condition.
> > >>>>>>
> > >>>>>> We should ensure the following condition.
> > >>>>>> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> > >>>>>>
> > >>>>>> Due to
> > >>>>>> total GROed TCP payload = the original mss + skb->data_len
> > >>>>>> IPv6 size - IPv4 size = len_diff
> > >>>>>>
> > >>>>>> Finally, we can get the condition.
> > >>>>>> skb->data_len > len_diff
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> + skb_increase_gso_size(shinfo, len_diff);
> > >>>>>>>> +
> > >>>>>>>> /* Header must be checked, and gso_segs recomputed. */
> > >>>>>>>> shinfo->gso_type |= SKB_GSO_DODGY;
> > >>>>>>>> shinfo->gso_segs = 0;
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>
> > >
> > > .
> > >
> >

2021-05-07 12:45:45

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On 2021/5/7 16:25, Dongseok Yi wrote:
> On Thu, May 06, 2021 at 09:53:45PM -0400, Willem de Bruijn wrote:
>> On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
>>>
>>> On 2021/5/7 9:25, Willem de Bruijn wrote:
>>>>>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
>>>>>>> data_len could be 8 if server sent a small size packet and it is GROed
>>>>>>> to head_skb.
>>>>>>>
>>>>>>> Please let me know if I am missing something.
>>>>>>
>>>>>> This is my understanding of the data path. This is a forwarding path
>>>>>> for TCP traffic.
>>>>>>
>>>>>> GRO is enabled and will coalesce multiple segments into a single large
>>>>>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
>>>>>> + 20.
>>>>>>
>>>>>> Somewhere between GRO and GSO you have a BPF program that converts the
>>>>>> IPv6 address to IPv4.
>>>>>
>>>>> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
>>>>> GSO.
>>>>>
>>>>>>
>>>>>> There is no concept of head_skb at the time of this BPF program. It is
>>>>>> a single SKB, with an skb linear part and multiple data items in the
>>>>>> frags (no frag_list).
>>>>>
>>>>> Sorry for the confusion. head_skb what I mentioned was a skb linear
>>>>> part. I'm considering a single SKB with frags too.
>>>>>
>>>>>>
>>>>>> When entering the GSO stack, this single skb now has a payload length
>>>>>> < MSS. So it would just make a valid TCP packet on its own?
>>>>>>
>>>>>> skb_gro_len is only relevant inside the GRO stack. It internally casts
>>>>>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
>>>>>> reused for other purposes later by other layers of the datapath. It is
>>>>>> not safe to read this inside bpf_skb_proto_6_to_4.
>>>>>
>>>>> The condition what I made uses skb->data_len not skb_gro_len. Does
>>>>> skb->data_len have a different meaning on each layer? As I know,
>>>>> data_len indicates the amount of frags or frag_list. skb->data_len
>>>>> should be > 20 in the sample case because the payload size of the skb
>>>>> linear part is the same with mss.
>>>>
>>>> Ah, got it.
>>>>
>>>> data_len is the length of the skb minus the length in the skb linear
>>>> section (as seen in skb_headlen).
>>>>
>>>> So this gso skb consists of two segments, the first one entirely
>>>> linear, the payload of the second is in skb_shinfo(skb)->frags[0].
>>>>
>>>> It is not guaranteed that gso skbs built from two individual skbs end
>>>> up looking like that. Only protocol headers in the linear segment and
>>>> the payload of both in frags is common.
>>>>
>>>>> We can modify netif_needs_gso as another option to hit
>>>>> skb_needs_linearize in validate_xmit_skb. But I think we should compare
>>>>> skb->gso_size and skb->data_len too to check if mss exceed a payload
>>>>> size.
>>>>
>>>> The rest of the stack does not build such gso packets with payload len
>>>> < mss, so we should not have to add workarounds in the gso hot path
>>>> for this.
>>>>
>>>> Also no need to linearize this skb. I think that if the bpf program
>>>> would just clear the gso type, the packet would be sent correctly.
>>>> Unless I'm missing something.
>>>
>>> Does the checksum/len field in ip and tcp/udp header need adjusting
>>> before clearing gso type as the packet has became bigger?
>>
>> gro takes care of this. see for instance inet_gro_complete for updates
>> to the ip header.
>
> I think clearing the gso type will get an error at tcp4_gso_segment
> because netif_needs_gso returns true in validate_xmit_skb.

So the bpf_skb_proto_6_to_4() is called after validate_xmit_skb() and
before tcp4_gso_segment()?
If Yes, clearing the gso type here does not seem to help.

>
>>
>>> Also, instead of testing skb->data_len, may test the skb->len?
>>>
>>> skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
>>
>> Yes. Essentially doing the same calculation as the gso code that is
>> causing the packet to be dropped.
>
> BPF program is usually out of control. Can we take a general approach?
> The below 2 cases has no issue when mss upgrading.
> 1) skb->data_len > mss + 20
> 2) skb->data_len < mss && skb->data_len > 20
> The corner case is when
> 3) skb->data_len > mss && skb->data_len < mss + 20

As my understanding:

Usually skb_headlen(skb) >= (mac header + ip/ipv6 header + udp/tcp header),
other than that, there is no other guarantee as long as:
skb->len = skb_headlen(skb) + skb->data_len

So the cases should be:
1. skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
2. skb->len - (mac header + ip/ipv6 header + udp/tcp header) <= mss + len_diff

The corner case is case 2.

>
> But to cover #3 case, we should check the condition Yunsheng Lin said.
> What if we do mss upgrading for both #1 and #2 cases only?
>
> + unsigned short off_len = skb->data_len > shinfo->gso_size ?
> + shinfo->gso_size : 0;
> [...]
> /* Due to IPv4 header, MSS can be upgraded. */
> - skb_increase_gso_size(shinfo, len_diff);
> + if (skb->data_len - off_len > len_diff)
> + skb_increase_gso_size(shinfo, len_diff);
>
>>
>>>>
>>>> But I don't mean to argue that it should do that in production.
>>>> Instead, not playing mss games would solve this and stay close to the
>>>> original datapath if no bpf program had been present. Including
>>>> maintaining the GSO invariant of sending out the same chain of packets
>>>> as received (bar the IPv6 to IPv4 change).
>>>>
>>>> This could be achieved by adding support for the flag
>>>> BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
>>>> And similar to bpf_skb_net_shrink:
>>>>
>>>> /* Due to header shrink, MSS can be upgraded. */
>>>> if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
>>>> skb_increase_gso_size(shinfo, len_diff);
>>>>
>>>> The other case, from IPv4 to IPv6 is more difficult to address, as not
>>>> reducing the MSS will result in packets exceeding MTU. That calls for
>>>> workarounds like MSS clamping. Anyway, that is out of scope here.
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> One simple solution if this packet no longer needs to be segmented
>>>>>>>> might be to reset the gso_type completely.
>>>>>>>
>>>>>>> I am not sure gso_type can be cleared even when GSO is needed.
>>>>>>>
>>>>>>>>
>>>>>>>> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
>>>>>>>> converting from IPv6 to IPv4, fixed gso will end up building packets
>>>>>>>> that are slightly below the MTU. That opportunity cost is negligible
>>>>>>>> (especially with TSO). Unfortunately, I see that that flag is
>>>>>>>> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> would increse the gso_size to 1392. tcp_gso_segment will get an error
>>>>>>>>>>> with 1380 <= 1392.
>>>>>>>>>>>
>>>>>>>>>>> Check for the size of GROed payload if it is really bigger than target
>>>>>>>>>>> mss when increase mss.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
>>>>>>>>>>> Signed-off-by: Dongseok Yi <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>> net/core/filter.c | 4 +++-
>>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>>>>>>> index 9323d34..3f79e3c 100644
>>>>>>>>>>> --- a/net/core/filter.c
>>>>>>>>>>> +++ b/net/core/filter.c
>>>>>>>>>>> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> /* Due to IPv4 header, MSS can be upgraded. */
>>>>>>>>>>> - skb_increase_gso_size(shinfo, len_diff);
>>>>>>>>>>> + if (skb->data_len > len_diff)
>>>>>>>>>>
>>>>>>>>>> Could you elaborate some more on what this has to do with data_len specifically
>>>>>>>>>> here? I'm not sure I follow exactly your above commit description. Are you saying
>>>>>>>>>> that you're hitting in tcp_gso_segment():
>>>>>>>>>>
>>>>>>>>>> [...]
>>>>>>>>>> mss = skb_shinfo(skb)->gso_size;
>>>>>>>>>> if (unlikely(skb->len <= mss))
>>>>>>>>>> goto out;
>>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> Yes, right
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please provide more context on the bug, thanks!
>>>>>>>>>
>>>>>>>>> tcp_gso_segment():
>>>>>>>>> [...]
>>>>>>>>> __skb_pull(skb, thlen);
>>>>>>>>>
>>>>>>>>> mss = skb_shinfo(skb)->gso_size;
>>>>>>>>> if (unlikely(skb->len <= mss))
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> skb->len will have total GROed TCP payload size after __skb_pull.
>>>>>>>>> skb->len <= mss will not be happened in a normal GROed situation. But
>>>>>>>>> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
>>>>>>>>> hit an error condition.
>>>>>>>>>
>>>>>>>>> We should ensure the following condition.
>>>>>>>>> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
>>>>>>>>>
>>>>>>>>> Due to
>>>>>>>>> total GROed TCP payload = the original mss + skb->data_len
>>>>>>>>> IPv6 size - IPv4 size = len_diff
>>>>>>>>>
>>>>>>>>> Finally, we can get the condition.
>>>>>>>>> skb->data_len > len_diff
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> + skb_increase_gso_size(shinfo, len_diff);
>>>>>>>>>>> +
>>>>>>>>>>> /* Header must be checked, and gso_segs recomputed. */
>>>>>>>>>>> shinfo->gso_type |= SKB_GSO_DODGY;
>>>>>>>>>>> shinfo->gso_segs = 0;
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>> .
>>>>
>>>
>
>
> .
>

2021-05-07 15:20:30

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Fri, May 07, 2021 at 05:11:20PM +0800, Yunsheng Lin wrote:
> On 2021/5/7 16:25, Dongseok Yi wrote:
> > On Thu, May 06, 2021 at 09:53:45PM -0400, Willem de Bruijn wrote:
> >> On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
> >>>
> >>> On 2021/5/7 9:25, Willem de Bruijn wrote:
> >>>>>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> >>>>>>> data_len could be 8 if server sent a small size packet and it is GROed
> >>>>>>> to head_skb.
> >>>>>>>
> >>>>>>> Please let me know if I am missing something.
> >>>>>>
> >>>>>> This is my understanding of the data path. This is a forwarding path
> >>>>>> for TCP traffic.
> >>>>>>
> >>>>>> GRO is enabled and will coalesce multiple segments into a single large
> >>>>>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> >>>>>> + 20.
> >>>>>>
> >>>>>> Somewhere between GRO and GSO you have a BPF program that converts the
> >>>>>> IPv6 address to IPv4.
> >>>>>
> >>>>> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> >>>>> GSO.
> >>>>>
> >>>>>>
> >>>>>> There is no concept of head_skb at the time of this BPF program. It is
> >>>>>> a single SKB, with an skb linear part and multiple data items in the
> >>>>>> frags (no frag_list).
> >>>>>
> >>>>> Sorry for the confusion. head_skb what I mentioned was a skb linear
> >>>>> part. I'm considering a single SKB with frags too.
> >>>>>
> >>>>>>
> >>>>>> When entering the GSO stack, this single skb now has a payload length
> >>>>>> < MSS. So it would just make a valid TCP packet on its own?
> >>>>>>
> >>>>>> skb_gro_len is only relevant inside the GRO stack. It internally casts
> >>>>>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> >>>>>> reused for other purposes later by other layers of the datapath. It is
> >>>>>> not safe to read this inside bpf_skb_proto_6_to_4.
> >>>>>
> >>>>> The condition what I made uses skb->data_len not skb_gro_len. Does
> >>>>> skb->data_len have a different meaning on each layer? As I know,
> >>>>> data_len indicates the amount of frags or frag_list. skb->data_len
> >>>>> should be > 20 in the sample case because the payload size of the skb
> >>>>> linear part is the same with mss.
> >>>>
> >>>> Ah, got it.
> >>>>
> >>>> data_len is the length of the skb minus the length in the skb linear
> >>>> section (as seen in skb_headlen).
> >>>>
> >>>> So this gso skb consists of two segments, the first one entirely
> >>>> linear, the payload of the second is in skb_shinfo(skb)->frags[0].
> >>>>
> >>>> It is not guaranteed that gso skbs built from two individual skbs end
> >>>> up looking like that. Only protocol headers in the linear segment and
> >>>> the payload of both in frags is common.
> >>>>
> >>>>> We can modify netif_needs_gso as another option to hit
> >>>>> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> >>>>> skb->gso_size and skb->data_len too to check if mss exceed a payload
> >>>>> size.
> >>>>
> >>>> The rest of the stack does not build such gso packets with payload len
> >>>> < mss, so we should not have to add workarounds in the gso hot path
> >>>> for this.
> >>>>
> >>>> Also no need to linearize this skb. I think that if the bpf program
> >>>> would just clear the gso type, the packet would be sent correctly.
> >>>> Unless I'm missing something.
> >>>
> >>> Does the checksum/len field in ip and tcp/udp header need adjusting
> >>> before clearing gso type as the packet has became bigger?
> >>
> >> gro takes care of this. see for instance inet_gro_complete for updates
> >> to the ip header.
> >
> > I think clearing the gso type will get an error at tcp4_gso_segment
> > because netif_needs_gso returns true in validate_xmit_skb.
>
> So the bpf_skb_proto_6_to_4() is called after validate_xmit_skb() and
> before tcp4_gso_segment()?
> If Yes, clearing the gso type here does not seem to help.

The order what I checked is bpf_skb_proto_6_to_4() ->
validate_xmit_skb() -> tcp4_gso_segment().

>
> >
> >>
> >>> Also, instead of testing skb->data_len, may test the skb->len?
> >>>
> >>> skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
> >>
> >> Yes. Essentially doing the same calculation as the gso code that is
> >> causing the packet to be dropped.
> >
> > BPF program is usually out of control. Can we take a general approach?
> > The below 2 cases has no issue when mss upgrading.
> > 1) skb->data_len > mss + 20
> > 2) skb->data_len < mss && skb->data_len > 20
> > The corner case is when
> > 3) skb->data_len > mss && skb->data_len < mss + 20
>
> As my understanding:
>
> Usually skb_headlen(skb) >= (mac header + ip/ipv6 header + udp/tcp header),
> other than that, there is no other guarantee as long as:
> skb->len = skb_headlen(skb) + skb->data_len
>
> So the cases should be:
> 1. skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
> 2. skb->len - (mac header + ip/ipv6 header + udp/tcp header) <= mss + len_diff
>
> The corner case is case 2.

I agree. In addition,
skbs which hits skb_increase_gso_size in bpf_skb_proto_6_to_4 are all
IPv6 + TCP by (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) condition. So
(mac header + ip/ipv6 header + udp/tcp header) can be
(mac header + ipv6 header + tcp header). But I thick Willem de Bruijn
would not want to check such network payloads in the BPF step.

>
> >
> > But to cover #3 case, we should check the condition Yunsheng Lin said.
> > What if we do mss upgrading for both #1 and #2 cases only?
> >
> > + unsigned short off_len = skb->data_len > shinfo->gso_size ?
> > + shinfo->gso_size : 0;
> > [...]
> > /* Due to IPv4 header, MSS can be upgraded. */
> > - skb_increase_gso_size(shinfo, len_diff);
> > + if (skb->data_len - off_len > len_diff)
> > + skb_increase_gso_size(shinfo, len_diff);
> >
> >>
> >>>>
> >>>> But I don't mean to argue that it should do that in production.
> >>>> Instead, not playing mss games would solve this and stay close to the
> >>>> original datapath if no bpf program had been present. Including
> >>>> maintaining the GSO invariant of sending out the same chain of packets
> >>>> as received (bar the IPv6 to IPv4 change).
> >>>>
> >>>> This could be achieved by adding support for the flag
> >>>> BPF_F_ADJ_ROOM_FIXED_GSO in the flags field of bpf_skb_change_proto.
> >>>> And similar to bpf_skb_net_shrink:
> >>>>
> >>>> /* Due to header shrink, MSS can be upgraded. */
> >>>> if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> >>>> skb_increase_gso_size(shinfo, len_diff);
> >>>>
> >>>> The other case, from IPv4 to IPv6 is more difficult to address, as not
> >>>> reducing the MSS will result in packets exceeding MTU. That calls for
> >>>> workarounds like MSS clamping. Anyway, that is out of scope here.
> >>>>
> >>>>
> >>>>
> >>>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> One simple solution if this packet no longer needs to be segmented
> >>>>>>>> might be to reset the gso_type completely.
> >>>>>>>
> >>>>>>> I am not sure gso_type can be cleared even when GSO is needed.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> In general, I would advocate using BPF_F_ADJ_ROOM_FIXED_GSO. When
> >>>>>>>> converting from IPv6 to IPv4, fixed gso will end up building packets
> >>>>>>>> that are slightly below the MTU. That opportunity cost is negligible
> >>>>>>>> (especially with TSO). Unfortunately, I see that that flag is
> >>>>>>>> available for bpf_skb_adjust_room but not for bpf_skb_proto_6_to_4.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>> would increse the gso_size to 1392. tcp_gso_segment will get an error
> >>>>>>>>>>> with 1380 <= 1392.
> >>>>>>>>>>>
> >>>>>>>>>>> Check for the size of GROed payload if it is really bigger than target
> >>>>>>>>>>> mss when increase mss.
> >>>>>>>>>>>
> >>>>>>>>>>> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> >>>>>>>>>>> Signed-off-by: Dongseok Yi <[email protected]>
> >>>>>>>>>>> ---
> >>>>>>>>>>> net/core/filter.c | 4 +++-
> >>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>>>>>>> index 9323d34..3f79e3c 100644
> >>>>>>>>>>> --- a/net/core/filter.c
> >>>>>>>>>>> +++ b/net/core/filter.c
> >>>>>>>>>>> @@ -3308,7 +3308,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> /* Due to IPv4 header, MSS can be upgraded. */
> >>>>>>>>>>> - skb_increase_gso_size(shinfo, len_diff);
> >>>>>>>>>>> + if (skb->data_len > len_diff)
> >>>>>>>>>>
> >>>>>>>>>> Could you elaborate some more on what this has to do with data_len specifically
> >>>>>>>>>> here? I'm not sure I follow exactly your above commit description. Are you saying
> >>>>>>>>>> that you're hitting in tcp_gso_segment():
> >>>>>>>>>>
> >>>>>>>>>> [...]
> >>>>>>>>>> mss = skb_shinfo(skb)->gso_size;
> >>>>>>>>>> if (unlikely(skb->len <= mss))
> >>>>>>>>>> goto out;
> >>>>>>>>>> [...]
> >>>>>>>>>
> >>>>>>>>> Yes, right
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please provide more context on the bug, thanks!
> >>>>>>>>>
> >>>>>>>>> tcp_gso_segment():
> >>>>>>>>> [...]
> >>>>>>>>> __skb_pull(skb, thlen);
> >>>>>>>>>
> >>>>>>>>> mss = skb_shinfo(skb)->gso_size;
> >>>>>>>>> if (unlikely(skb->len <= mss))
> >>>>>>>>> [...]
> >>>>>>>>>
> >>>>>>>>> skb->len will have total GROed TCP payload size after __skb_pull.
> >>>>>>>>> skb->len <= mss will not be happened in a normal GROed situation. But
> >>>>>>>>> bpf_skb_proto_6_to_4 would upgrade MSS by increasing gso_size, it can
> >>>>>>>>> hit an error condition.
> >>>>>>>>>
> >>>>>>>>> We should ensure the following condition.
> >>>>>>>>> total GROed TCP payload > the original mss + (IPv6 size - IPv4 size)
> >>>>>>>>>
> >>>>>>>>> Due to
> >>>>>>>>> total GROed TCP payload = the original mss + skb->data_len
> >>>>>>>>> IPv6 size - IPv4 size = len_diff
> >>>>>>>>>
> >>>>>>>>> Finally, we can get the condition.
> >>>>>>>>> skb->data_len > len_diff
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> + skb_increase_gso_size(shinfo, len_diff);
> >>>>>>>>>>> +
> >>>>>>>>>>> /* Header must be checked, and gso_segs recomputed. */
> >>>>>>>>>>> shinfo->gso_type |= SKB_GSO_DODGY;
> >>>>>>>>>>> shinfo->gso_segs = 0;
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>> .
> >>>>
> >>>
> >
> >
> > .
> >


2021-05-07 17:52:10

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Fri, May 7, 2021 at 4:25 AM Dongseok Yi <[email protected]> wrote:
>
> On Thu, May 06, 2021 at 09:53:45PM -0400, Willem de Bruijn wrote:
> > On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
> > >
> > > On 2021/5/7 9:25, Willem de Bruijn wrote:
> > > >>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> > > >>>> data_len could be 8 if server sent a small size packet and it is GROed
> > > >>>> to head_skb.
> > > >>>>
> > > >>>> Please let me know if I am missing something.
> > > >>>
> > > >>> This is my understanding of the data path. This is a forwarding path
> > > >>> for TCP traffic.
> > > >>>
> > > >>> GRO is enabled and will coalesce multiple segments into a single large
> > > >>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> > > >>> + 20.
> > > >>>
> > > >>> Somewhere between GRO and GSO you have a BPF program that converts the
> > > >>> IPv6 address to IPv4.
> > > >>
> > > >> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> > > >> GSO.
> > > >>
> > > >>>
> > > >>> There is no concept of head_skb at the time of this BPF program. It is
> > > >>> a single SKB, with an skb linear part and multiple data items in the
> > > >>> frags (no frag_list).
> > > >>
> > > >> Sorry for the confusion. head_skb what I mentioned was a skb linear
> > > >> part. I'm considering a single SKB with frags too.
> > > >>
> > > >>>
> > > >>> When entering the GSO stack, this single skb now has a payload length
> > > >>> < MSS. So it would just make a valid TCP packet on its own?
> > > >>>
> > > >>> skb_gro_len is only relevant inside the GRO stack. It internally casts
> > > >>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> > > >>> reused for other purposes later by other layers of the datapath. It is
> > > >>> not safe to read this inside bpf_skb_proto_6_to_4.
> > > >>
> > > >> The condition what I made uses skb->data_len not skb_gro_len. Does
> > > >> skb->data_len have a different meaning on each layer? As I know,
> > > >> data_len indicates the amount of frags or frag_list. skb->data_len
> > > >> should be > 20 in the sample case because the payload size of the skb
> > > >> linear part is the same with mss.
> > > >
> > > > Ah, got it.
> > > >
> > > > data_len is the length of the skb minus the length in the skb linear
> > > > section (as seen in skb_headlen).
> > > >
> > > > So this gso skb consists of two segments, the first one entirely
> > > > linear, the payload of the second is in skb_shinfo(skb)->frags[0].
> > > >
> > > > It is not guaranteed that gso skbs built from two individual skbs end
> > > > up looking like that. Only protocol headers in the linear segment and
> > > > the payload of both in frags is common.
> > > >
> > > >> We can modify netif_needs_gso as another option to hit
> > > >> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> > > >> skb->gso_size and skb->data_len too to check if mss exceed a payload
> > > >> size.
> > > >
> > > > The rest of the stack does not build such gso packets with payload len
> > > > < mss, so we should not have to add workarounds in the gso hot path
> > > > for this.
> > > >
> > > > Also no need to linearize this skb. I think that if the bpf program
> > > > would just clear the gso type, the packet would be sent correctly.
> > > > Unless I'm missing something.
> > >
> > > Does the checksum/len field in ip and tcp/udp header need adjusting
> > > before clearing gso type as the packet has became bigger?
> >
> > gro takes care of this. see for instance inet_gro_complete for updates
> > to the ip header.
>
> I think clearing the gso type will get an error at tcp4_gso_segment
> because netif_needs_gso returns true in validate_xmit_skb.

Oh right. Whether a packet is gso is defined by gso_size being
non-zero, not by gso_type.

> >
> > > Also, instead of testing skb->data_len, may test the skb->len?
> > >
> > > skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
> >
> > Yes. Essentially doing the same calculation as the gso code that is
> > causing the packet to be dropped.
>
> BPF program is usually out of control. Can we take a general approach?
> The below 2 cases has no issue when mss upgrading.
> 1) skb->data_len > mss + 20
> 2) skb->data_len < mss && skb->data_len > 20
> The corner case is when
> 3) skb->data_len > mss && skb->data_len < mss + 20

Again, you cannot use skb->data_len alone to make inferences about the
size of the second packet.

>
> But to cover #3 case, we should check the condition Yunsheng Lin said.
> What if we do mss upgrading for both #1 and #2 cases only?
>
> + unsigned short off_len = skb->data_len > shinfo->gso_size ?
> + shinfo->gso_size : 0;
> [...]
> /* Due to IPv4 header, MSS can be upgraded. */
> - skb_increase_gso_size(shinfo, len_diff);
> + if (skb->data_len - off_len > len_diff)
> + skb_increase_gso_size(shinfo, len_diff);

That generates TCP packets with different MSS within the same stream.

My suggestion remains to just not change MSS at all. But this has to
be a new flag to avoid changing established behavior.

2021-05-10 02:25:55

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Fri, May 07, 2021 at 09:50:03AM -0400, Willem de Bruijn wrote:
> On Fri, May 7, 2021 at 4:25 AM Dongseok Yi <[email protected]> wrote:
> >
> > On Thu, May 06, 2021 at 09:53:45PM -0400, Willem de Bruijn wrote:
> > > On Thu, May 6, 2021 at 9:45 PM Yunsheng Lin <[email protected]> wrote:
> > > >
> > > > On 2021/5/7 9:25, Willem de Bruijn wrote:
> > > > >>>> head_skb's data_len is the sum of skb_gro_len for each skb of the frags.
> > > > >>>> data_len could be 8 if server sent a small size packet and it is GROed
> > > > >>>> to head_skb.
> > > > >>>>
> > > > >>>> Please let me know if I am missing something.
> > > > >>>
> > > > >>> This is my understanding of the data path. This is a forwarding path
> > > > >>> for TCP traffic.
> > > > >>>
> > > > >>> GRO is enabled and will coalesce multiple segments into a single large
> > > > >>> packet. In bad cases, the coalesced packet payload is > MSS, but < MSS
> > > > >>> + 20.
> > > > >>>
> > > > >>> Somewhere between GRO and GSO you have a BPF program that converts the
> > > > >>> IPv6 address to IPv4.
> > > > >>
> > > > >> Your understanding is right. The data path is GRO -> BPF 6 to 4 ->
> > > > >> GSO.
> > > > >>
> > > > >>>
> > > > >>> There is no concept of head_skb at the time of this BPF program. It is
> > > > >>> a single SKB, with an skb linear part and multiple data items in the
> > > > >>> frags (no frag_list).
> > > > >>
> > > > >> Sorry for the confusion. head_skb what I mentioned was a skb linear
> > > > >> part. I'm considering a single SKB with frags too.
> > > > >>
> > > > >>>
> > > > >>> When entering the GSO stack, this single skb now has a payload length
> > > > >>> < MSS. So it would just make a valid TCP packet on its own?
> > > > >>>
> > > > >>> skb_gro_len is only relevant inside the GRO stack. It internally casts
> > > > >>> the skb->cb[] to NAPI_GRO_CB. This field is a scratch area that may be
> > > > >>> reused for other purposes later by other layers of the datapath. It is
> > > > >>> not safe to read this inside bpf_skb_proto_6_to_4.
> > > > >>
> > > > >> The condition what I made uses skb->data_len not skb_gro_len. Does
> > > > >> skb->data_len have a different meaning on each layer? As I know,
> > > > >> data_len indicates the amount of frags or frag_list. skb->data_len
> > > > >> should be > 20 in the sample case because the payload size of the skb
> > > > >> linear part is the same with mss.
> > > > >
> > > > > Ah, got it.
> > > > >
> > > > > data_len is the length of the skb minus the length in the skb linear
> > > > > section (as seen in skb_headlen).
> > > > >
> > > > > So this gso skb consists of two segments, the first one entirely
> > > > > linear, the payload of the second is in skb_shinfo(skb)->frags[0].
> > > > >
> > > > > It is not guaranteed that gso skbs built from two individual skbs end
> > > > > up looking like that. Only protocol headers in the linear segment and
> > > > > the payload of both in frags is common.
> > > > >
> > > > >> We can modify netif_needs_gso as another option to hit
> > > > >> skb_needs_linearize in validate_xmit_skb. But I think we should compare
> > > > >> skb->gso_size and skb->data_len too to check if mss exceed a payload
> > > > >> size.
> > > > >
> > > > > The rest of the stack does not build such gso packets with payload len
> > > > > < mss, so we should not have to add workarounds in the gso hot path
> > > > > for this.
> > > > >
> > > > > Also no need to linearize this skb. I think that if the bpf program
> > > > > would just clear the gso type, the packet would be sent correctly.
> > > > > Unless I'm missing something.
> > > >
> > > > Does the checksum/len field in ip and tcp/udp header need adjusting
> > > > before clearing gso type as the packet has became bigger?
> > >
> > > gro takes care of this. see for instance inet_gro_complete for updates
> > > to the ip header.
> >
> > I think clearing the gso type will get an error at tcp4_gso_segment
> > because netif_needs_gso returns true in validate_xmit_skb.
>
> Oh right. Whether a packet is gso is defined by gso_size being
> non-zero, not by gso_type.
>
> > >
> > > > Also, instead of testing skb->data_len, may test the skb->len?
> > > >
> > > > skb->len - (mac header + ip/ipv6 header + udp/tcp header) > mss + len_diff
> > >
> > > Yes. Essentially doing the same calculation as the gso code that is
> > > causing the packet to be dropped.
> >
> > BPF program is usually out of control. Can we take a general approach?
> > The below 2 cases has no issue when mss upgrading.
> > 1) skb->data_len > mss + 20
> > 2) skb->data_len < mss && skb->data_len > 20
> > The corner case is when
> > 3) skb->data_len > mss && skb->data_len < mss + 20
>
> Again, you cannot use skb->data_len alone to make inferences about the
> size of the second packet.

This approach is oriented a general way that does not make inferences
about the size of the second packet.

We can obviously increase the mss size when
1) skb->data_len > mss + 20
The issue will be fixed even if we consider the #1 condition.

But there is a precondition that mss < skb payload. If skb->data_len <
mss then skb_headlen(skb) contains the size of mss. So, we can check
the #2 condition too.
2) skb->data_len < mss && skb->data_len > 20

>
> >
> > But to cover #3 case, we should check the condition Yunsheng Lin said.
> > What if we do mss upgrading for both #1 and #2 cases only?
> >
> > + unsigned short off_len = skb->data_len > shinfo->gso_size ?
> > + shinfo->gso_size : 0;
> > [...]
> > /* Due to IPv4 header, MSS can be upgraded. */
> > - skb_increase_gso_size(shinfo, len_diff);
> > + if (skb->data_len - off_len > len_diff)
> > + skb_increase_gso_size(shinfo, len_diff);
>
> That generates TCP packets with different MSS within the same stream.
>
> My suggestion remains to just not change MSS at all. But this has to
> be a new flag to avoid changing established behavior.

I don't understand why the mss size should be kept in GSO step. Will
there be any issue with different mss?

In general, upgrading mss make sense when 6 to 4. The new flag would be
set by user to not change mss. What happened if user does not set the
flag? I still think we should fix the issue with a general approach. Or
can we remove the skb_increase_gso_size line?

2021-05-10 13:51:26

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

> > That generates TCP packets with different MSS within the same stream.
> >
> > My suggestion remains to just not change MSS at all. But this has to
> > be a new flag to avoid changing established behavior.
>
> I don't understand why the mss size should be kept in GSO step. Will
> there be any issue with different mss?

This issue has come up before and that has been the feedback from
TCP experts at one point.

> In general, upgrading mss make sense when 6 to 4. The new flag would be
> set by user to not change mss. What happened if user does not set the
> flag? I still think we should fix the issue with a general approach. Or
> can we remove the skb_increase_gso_size line?

Admins that insert such BPF packets should be aware of these issues.
And likely be using clamping. This is a known issue.

We arrived that the flag approach in bpf_skb_net_shrink. Extending
that to bpf_skb_change_proto would be consistent.

2021-05-10 14:11:33

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Mon, May 10, 2021 at 9:19 AM Willem de Bruijn
<[email protected]> wrote:
>
> > > That generates TCP packets with different MSS within the same stream.
> > >
> > > My suggestion remains to just not change MSS at all. But this has to
> > > be a new flag to avoid changing established behavior.
> >
> > I don't understand why the mss size should be kept in GSO step. Will
> > there be any issue with different mss?
>
> This issue has come up before and that has been the feedback from
> TCP experts at one point.
>
> > In general, upgrading mss make sense when 6 to 4. The new flag would be
> > set by user to not change mss. What happened if user does not set the
> > flag? I still think we should fix the issue with a general approach. Or
> > can we remove the skb_increase_gso_size line?
>
> Admins that insert such BPF packets should be aware of these issues.
> And likely be using clamping. This is a known issue.
>
> We arrived that the flag approach in bpf_skb_net_shrink. Extending
> that to bpf_skb_change_proto would be consistent.

As for more generic approach: does downgrading to non-TSO by clearing
gso_size work for this edge case?

2021-05-11 01:12:56

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Mon, May 10, 2021 at 09:46:25AM -0400, Willem de Bruijn wrote:
> On Mon, May 10, 2021 at 9:19 AM Willem de Bruijn
> <[email protected]> wrote:
> >
> > > > That generates TCP packets with different MSS within the same stream.
> > > >
> > > > My suggestion remains to just not change MSS at all. But this has to
> > > > be a new flag to avoid changing established behavior.
> > >
> > > I don't understand why the mss size should be kept in GSO step. Will
> > > there be any issue with different mss?
> >
> > This issue has come up before and that has been the feedback from
> > TCP experts at one point.
> >
> > > In general, upgrading mss make sense when 6 to 4. The new flag would be
> > > set by user to not change mss. What happened if user does not set the
> > > flag? I still think we should fix the issue with a general approach. Or
> > > can we remove the skb_increase_gso_size line?
> >
> > Admins that insert such BPF packets should be aware of these issues.
> > And likely be using clamping. This is a known issue.
> >
> > We arrived that the flag approach in bpf_skb_net_shrink. Extending
> > that to bpf_skb_change_proto would be consistent.
>
> As for more generic approach: does downgrading to non-TSO by clearing
> gso_size work for this edge case?

It can hit __skb_linearize in validate_xmit_skb and frags will be
copied to a linear part. The linear part size can exceed the MTU of
skb->dev unexpectedly.

I will make another patch with the flag approach.

2021-05-11 06:52:49

by Dongseok Yi

[permalink] [raw]
Subject: [PATCH bpf v2] bpf: check BPF_F_ADJ_ROOM_FIXED_GSO when upgrading mss in 6 to 4

In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.
bpf_skb_proto_6_to_4 will increase the MSS and it can be > the payload
length. After then tcp_gso_segment checks for the payload length if it
is <= MSS. The condition is causing the packet to be dropped.

tcp_gso_segment():
[...]
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
[...]

Allow to increase MSS when BPF_F_ADJ_ROOM_FIXED_GSO is not set.

Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
Signed-off-by: Dongseok Yi <[email protected]>
---
net/core/filter.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

v2:
per Willem de Bruijn request,
checked the flag instead of a generic approach.

diff --git a/net/core/filter.c b/net/core/filter.c
index cae56d0..a98b28d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3276,7 +3276,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
return 0;
}

-static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
+static int bpf_skb_proto_6_to_4(struct sk_buff *skb, u64 flags)
{
const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
u32 off = skb_mac_header_len(skb);
@@ -3305,7 +3305,8 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}

/* Due to IPv4 header, MSS can be upgraded. */
- skb_increase_gso_size(shinfo, len_diff);
+ if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ skb_increase_gso_size(shinfo, len_diff);
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
@@ -3317,7 +3318,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
return 0;
}

-static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
+static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto, u64 flags)
{
__be16 from_proto = skb->protocol;

@@ -3327,7 +3328,7 @@ static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)

if (from_proto == htons(ETH_P_IPV6) &&
to_proto == htons(ETH_P_IP))
- return bpf_skb_proto_6_to_4(skb);
+ return bpf_skb_proto_6_to_4(skb, flags);

return -ENOTSUPP;
}
@@ -3337,7 +3338,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
{
int ret;

- if (unlikely(flags))
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO)))
return -EINVAL;

/* General idea is that this helper does the basic groundwork
@@ -3357,7 +3358,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
* that. For offloads, we mark packet as dodgy, so that headers
* need to be verified first.
*/
- ret = bpf_skb_proto_xlat(skb, proto);
+ ret = bpf_skb_proto_xlat(skb, proto, flags);
bpf_compute_data_pointers(skb);
return ret;
}
--
2.7.4

2021-05-11 17:42:25

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Mon, May 10, 2021 at 9:11 PM Dongseok Yi <[email protected]> wrote:
>
> On Mon, May 10, 2021 at 09:46:25AM -0400, Willem de Bruijn wrote:
> > On Mon, May 10, 2021 at 9:19 AM Willem de Bruijn
> > <[email protected]> wrote:
> > >
> > > > > That generates TCP packets with different MSS within the same stream.
> > > > >
> > > > > My suggestion remains to just not change MSS at all. But this has to
> > > > > be a new flag to avoid changing established behavior.
> > > >
> > > > I don't understand why the mss size should be kept in GSO step. Will
> > > > there be any issue with different mss?
> > >
> > > This issue has come up before and that has been the feedback from
> > > TCP experts at one point.
> > >
> > > > In general, upgrading mss make sense when 6 to 4. The new flag would be
> > > > set by user to not change mss. What happened if user does not set the
> > > > flag? I still think we should fix the issue with a general approach. Or
> > > > can we remove the skb_increase_gso_size line?
> > >
> > > Admins that insert such BPF packets should be aware of these issues.
> > > And likely be using clamping. This is a known issue.
> > >
> > > We arrived that the flag approach in bpf_skb_net_shrink. Extending
> > > that to bpf_skb_change_proto would be consistent.
> >
> > As for more generic approach: does downgrading to non-TSO by clearing
> > gso_size work for this edge case?
>
> It can hit __skb_linearize in validate_xmit_skb and frags will be
> copied to a linear part. The linear part size can exceed the MTU of
> skb->dev unexpectedly.

When does skb_needs_linearize return true here (besides lack of
scatter-gather support, which would also preclude TSO)?

> I will make another patch with the flag approach.
>

2021-05-11 17:46:15

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf v2] bpf: check BPF_F_ADJ_ROOM_FIXED_GSO when upgrading mss in 6 to 4

On Tue, May 11, 2021 at 2:51 AM Dongseok Yi <[email protected]> wrote:
>
> In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
> coalesced packet payload can be > MSS, but < MSS + 20.
> bpf_skb_proto_6_to_4 will increase the MSS and it can be > the payload
> length. After then tcp_gso_segment checks for the payload length if it
> is <= MSS. The condition is causing the packet to be dropped.
>
> tcp_gso_segment():
> [...]
> mss = skb_shinfo(skb)->gso_size;
> if (unlikely(skb->len <= mss))
> goto out;
> [...]
>
> Allow to increase MSS when BPF_F_ADJ_ROOM_FIXED_GSO is not set.
>
> Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> Signed-off-by: Dongseok Yi <[email protected]>
>
> ---

Thanks. Note that this feature does not preclude the alternatives
discussed, of converting the packet to non-TSO (by clearing gso_size)
or optionally modifying MSS (but that should get okay from TCP
experts).

I would target this for bpf-next and drop the Fixes. But that is
admittedly debatable.

> net/core/filter.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> v2:
> per Willem de Bruijn request,
> checked the flag instead of a generic approach.
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index cae56d0..a98b28d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3276,7 +3276,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
> return 0;
> }
>
> -static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> +static int bpf_skb_proto_6_to_4(struct sk_buff *skb, u64 flags)
> {
> const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
> u32 off = skb_mac_header_len(skb);
> @@ -3305,7 +3305,8 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> }
>
> /* Due to IPv4 header, MSS can be upgraded. */
> - skb_increase_gso_size(shinfo, len_diff);
> + if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> + skb_increase_gso_size(shinfo, len_diff);
> /* Header must be checked, and gso_segs recomputed. */
> shinfo->gso_type |= SKB_GSO_DODGY;
> shinfo->gso_segs = 0;
> @@ -3317,7 +3318,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> return 0;
> }
>
> -static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
> +static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto, u64 flags)
> {
> __be16 from_proto = skb->protocol;
>
> @@ -3327,7 +3328,7 @@ static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
>
> if (from_proto == htons(ETH_P_IPV6) &&
> to_proto == htons(ETH_P_IP))
> - return bpf_skb_proto_6_to_4(skb);
> + return bpf_skb_proto_6_to_4(skb, flags);
>
> return -ENOTSUPP;
> }
> @@ -3337,7 +3338,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> {
> int ret;
>
> - if (unlikely(flags))
> + if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO)))
> return -EINVAL;

Once allowing this flag, please immediately support it for both
bpf_skb_proto_6_to_4 and bpf_skb_4_to_6.

We cannot do that later if we ignore the second case now.


> /* General idea is that this helper does the basic groundwork
> @@ -3357,7 +3358,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> * that. For offloads, we mark packet as dodgy, so that headers
> * need to be verified first.
> */
> - ret = bpf_skb_proto_xlat(skb, proto);
> + ret = bpf_skb_proto_xlat(skb, proto, flags);
> bpf_compute_data_pointers(skb);
> return ret;
> }
> --
> 2.7.4
>

2021-05-12 00:47:02

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4

On Tue, May 11, 2021 at 01:38:41PM -0400, Willem de Bruijn wrote:
> On Mon, May 10, 2021 at 9:11 PM Dongseok Yi <[email protected]> wrote:
> >
> > On Mon, May 10, 2021 at 09:46:25AM -0400, Willem de Bruijn wrote:
> > > On Mon, May 10, 2021 at 9:19 AM Willem de Bruijn
> > > <[email protected]> wrote:
> > > >
> > > > > > That generates TCP packets with different MSS within the same stream.
> > > > > >
> > > > > > My suggestion remains to just not change MSS at all. But this has to
> > > > > > be a new flag to avoid changing established behavior.
> > > > >
> > > > > I don't understand why the mss size should be kept in GSO step. Will
> > > > > there be any issue with different mss?
> > > >
> > > > This issue has come up before and that has been the feedback from
> > > > TCP experts at one point.
> > > >
> > > > > In general, upgrading mss make sense when 6 to 4. The new flag would be
> > > > > set by user to not change mss. What happened if user does not set the
> > > > > flag? I still think we should fix the issue with a general approach. Or
> > > > > can we remove the skb_increase_gso_size line?
> > > >
> > > > Admins that insert such BPF packets should be aware of these issues.
> > > > And likely be using clamping. This is a known issue.
> > > >
> > > > We arrived that the flag approach in bpf_skb_net_shrink. Extending
> > > > that to bpf_skb_change_proto would be consistent.
> > >
> > > As for more generic approach: does downgrading to non-TSO by clearing
> > > gso_size work for this edge case?
> >
> > It can hit __skb_linearize in validate_xmit_skb and frags will be
> > copied to a linear part. The linear part size can exceed the MTU of
> > skb->dev unexpectedly.
>
> When does skb_needs_linearize return true here (besides lack of
> scatter-gather support, which would also preclude TSO)?

As I know not every netdev support NETIF_F_SG. TSO requires SG.

/* TSO requires that SG is present as well. */
if ((features & NETIF_F_ALL_TSO) && !(features & NETIF_F_SG)) {
netdev_dbg(dev, "Dropping TSO features since no SG feature.\n");
features &= ~NETIF_F_ALL_TSO;
}

>
> > I will make another patch with the flag approach.
> >

2021-05-12 06:57:45

by Dongseok Yi

[permalink] [raw]
Subject: RE: [PATCH bpf v2] bpf: check BPF_F_ADJ_ROOM_FIXED_GSO when upgrading mss in 6 to 4

On Tue, May 11, 2021 at 01:42:46PM -0400, Willem de Bruijn wrote:
> On Tue, May 11, 2021 at 2:51 AM Dongseok Yi <[email protected]> wrote:
> >
> > In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
> > coalesced packet payload can be > MSS, but < MSS + 20.
> > bpf_skb_proto_6_to_4 will increase the MSS and it can be > the payload
> > length. After then tcp_gso_segment checks for the payload length if it
> > is <= MSS. The condition is causing the packet to be dropped.
> >
> > tcp_gso_segment():
> > [...]
> > mss = skb_shinfo(skb)->gso_size;
> > if (unlikely(skb->len <= mss))
> > goto out;
> > [...]
> >
> > Allow to increase MSS when BPF_F_ADJ_ROOM_FIXED_GSO is not set.
> >
> > Fixes: 6578171a7ff0 (bpf: add bpf_skb_change_proto helper)
> > Signed-off-by: Dongseok Yi <[email protected]>
> >
> > ---
>
> Thanks. Note that this feature does not preclude the alternatives
> discussed, of converting the packet to non-TSO (by clearing gso_size)
> or optionally modifying MSS (but that should get okay from TCP
> experts).
>
> I would target this for bpf-next and drop the Fixes. But that is
> admittedly debatable.

No problem. We can make a better decision under bpf-next.

>
> > net/core/filter.c | 13 +++++++------
> > 1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > v2:
> > per Willem de Bruijn request,
> > checked the flag instead of a generic approach.
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index cae56d0..a98b28d 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3276,7 +3276,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
> > return 0;
> > }
> >
> > -static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > +static int bpf_skb_proto_6_to_4(struct sk_buff *skb, u64 flags)
> > {
> > const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
> > u32 off = skb_mac_header_len(skb);
> > @@ -3305,7 +3305,8 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > }
> >
> > /* Due to IPv4 header, MSS can be upgraded. */
> > - skb_increase_gso_size(shinfo, len_diff);
> > + if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
> > + skb_increase_gso_size(shinfo, len_diff);
> > /* Header must be checked, and gso_segs recomputed. */
> > shinfo->gso_type |= SKB_GSO_DODGY;
> > shinfo->gso_segs = 0;
> > @@ -3317,7 +3318,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> > return 0;
> > }
> >
> > -static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
> > +static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto, u64 flags)
> > {
> > __be16 from_proto = skb->protocol;
> >
> > @@ -3327,7 +3328,7 @@ static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
> >
> > if (from_proto == htons(ETH_P_IPV6) &&
> > to_proto == htons(ETH_P_IP))
> > - return bpf_skb_proto_6_to_4(skb);
> > + return bpf_skb_proto_6_to_4(skb, flags);
> >
> > return -ENOTSUPP;
> > }
> > @@ -3337,7 +3338,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> > {
> > int ret;
> >
> > - if (unlikely(flags))
> > + if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO)))
> > return -EINVAL;
>
> Once allowing this flag, please immediately support it for both
> bpf_skb_proto_6_to_4 and bpf_skb_4_to_6.
>
> We cannot do that later if we ignore the second case now.

I will make v3 for both 6_to_4 and 4_to_6.

>
>
> > /* General idea is that this helper does the basic groundwork
> > @@ -3357,7 +3358,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> > * that. For offloads, we mark packet as dodgy, so that headers
> > * need to be verified first.
> > */
> > - ret = bpf_skb_proto_xlat(skb, proto);
> > + ret = bpf_skb_proto_xlat(skb, proto, flags);
> > bpf_compute_data_pointers(skb);
> > return ret;
> > }
> > --
> > 2.7.4
> >

2021-05-12 07:42:22

by Dongseok Yi

[permalink] [raw]
Subject: [PATCH bpf-next v3] bpf: check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto

In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.
bpf_skb_proto_6_to_4 will upgrade the MSS and it can be > the payload
length. After then tcp_gso_segment checks for the payload length if it
is <= MSS. The condition is causing the packet to be dropped.

tcp_gso_segment():
[...]
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
[...]

Allow to upgrade/downgrade MSS only when BPF_F_ADJ_ROOM_FIXED_GSO is
not set.

Signed-off-by: Dongseok Yi <[email protected]>
---
net/core/filter.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

v2:
per Willem de Bruijn request,
checked the flag instead of a generic approach.

v3:
per Willem de Bruijn request,
moved to bpf-next, supported for both 6_to_4 and 4_to_6.

diff --git a/net/core/filter.c b/net/core/filter.c
index cae56d0..582ac19 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3235,7 +3235,7 @@ static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
return ret;
}

-static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
+static int bpf_skb_proto_4_to_6(struct sk_buff *skb, u64 flags)
{
const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
u32 off = skb_mac_header_len(skb);
@@ -3264,7 +3264,9 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
}

/* Due to IPv6 header, MSS needs to be downgraded. */
- skb_decrease_gso_size(shinfo, len_diff);
+ if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ skb_decrease_gso_size(shinfo, len_diff);
+
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
@@ -3276,7 +3278,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
return 0;
}

-static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
+static int bpf_skb_proto_6_to_4(struct sk_buff *skb, u64 flags)
{
const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
u32 off = skb_mac_header_len(skb);
@@ -3305,7 +3307,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}

/* Due to IPv4 header, MSS can be upgraded. */
- skb_increase_gso_size(shinfo, len_diff);
+ if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ skb_increase_gso_size(shinfo, len_diff);
+
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
@@ -3317,17 +3321,17 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
return 0;
}

-static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
+static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto, u64 flags)
{
__be16 from_proto = skb->protocol;

if (from_proto == htons(ETH_P_IP) &&
to_proto == htons(ETH_P_IPV6))
- return bpf_skb_proto_4_to_6(skb);
+ return bpf_skb_proto_4_to_6(skb, flags);

if (from_proto == htons(ETH_P_IPV6) &&
to_proto == htons(ETH_P_IP))
- return bpf_skb_proto_6_to_4(skb);
+ return bpf_skb_proto_6_to_4(skb, flags);

return -ENOTSUPP;
}
@@ -3337,7 +3341,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
{
int ret;

- if (unlikely(flags))
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO)))
return -EINVAL;

/* General idea is that this helper does the basic groundwork
@@ -3357,7 +3361,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
* that. For offloads, we mark packet as dodgy, so that headers
* need to be verified first.
*/
- ret = bpf_skb_proto_xlat(skb, proto);
+ ret = bpf_skb_proto_xlat(skb, proto, flags);
bpf_compute_data_pointers(skb);
return ret;
}
--
2.7.4

2021-05-12 14:17:20

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH bpf-next v3] bpf: check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto

On Wed, May 12, 2021 at 3:41 AM Dongseok Yi <[email protected]> wrote:
>
> In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
> coalesced packet payload can be > MSS, but < MSS + 20.
> bpf_skb_proto_6_to_4 will upgrade the MSS and it can be > the payload
> length. After then tcp_gso_segment checks for the payload length if it
> is <= MSS. The condition is causing the packet to be dropped.
>
> tcp_gso_segment():
> [...]
> mss = skb_shinfo(skb)->gso_size;
> if (unlikely(skb->len <= mss))
> goto out;
> [...]
>
> Allow to upgrade/downgrade MSS only when BPF_F_ADJ_ROOM_FIXED_GSO is
> not set.
>
> Signed-off-by: Dongseok Yi <[email protected]>

Acked-by: Willem de Bruijn <[email protected]>

2021-05-19 18:33:37

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH bpf-next v3] bpf: check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto

Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Wed, 12 May 2021 16:27:33 +0900 you wrote:
> In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
> coalesced packet payload can be > MSS, but < MSS + 20.
> bpf_skb_proto_6_to_4 will upgrade the MSS and it can be > the payload
> length. After then tcp_gso_segment checks for the payload length if it
> is <= MSS. The condition is causing the packet to be dropped.
>
> tcp_gso_segment():
> [...]
> mss = skb_shinfo(skb)->gso_size;
> if (unlikely(skb->len <= mss))
> goto out;
> [...]
>
> [...]

Here is the summary with links:
- [bpf-next,v3] bpf: check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto
https://git.kernel.org/bpf/bpf-next/c/fa7b83bf3b15

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html