2020-12-11 11:12:46

by Alexander H Duyck

Subject: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

From: Alexander Duyck <[email protected]>

In the case of a fastopen SYN there are cases where it may trigger either
an ICMP_TOOBIG message in the case of IPv6 or an ICMP_FRAG_NEEDED
fragmentation request in the case of IPv4. This results in the socket
stalling for a second or more as it does not respond to the message by
retransmitting the SYN frame.

Normally a SYN frame should not be able to trigger an ICMP_TOOBIG or
ICMP_FRAG_NEEDED, however in the case of fastopen the frame can make use
of the entire MTU. An additional complication is that the retransmit
queue doesn't contain the original frames. As a result, when
tcp_simple_retransmit is called and walks the list of frames in the
queue it may not mark the frames as lost, because the SYN and the data
packet are each individually smaller than the MSS size after the
adjustment. This results in the socket being stalled until the
retransmit timer kicks in and forces the SYN frame out again without
the data attached.

In order to resolve this we need to mark the SYN frame as lost if it is the
first packet in the queue. Doing this allows the socket to recover much
more quickly without the retransmit timeout stall.

Signed-off-by: Alexander Duyck <[email protected]>
---
include/net/tcp.h | 1 +
net/ipv4/tcp_input.c | 8 ++++++++
net/ipv4/tcp_ipv4.c | 6 ++++++
net/ipv6/tcp_ipv6.c | 4 ++++
4 files changed, 19 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d4ef5bf94168..6181ad98727a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2062,6 +2062,7 @@ void tcp_init(void);

/* tcp_recovery.c */
void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb);
+void tcp_mark_syn_lost(struct sock *sk);
void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb,
u32 reo_wnd);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 389d1b340248..d0c5248bc4e1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1066,6 +1066,14 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
}
}

+void tcp_mark_syn_lost(struct sock *sk)
+{
+ struct sk_buff *skb = tcp_rtx_queue_head(sk);
+
+ if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
+ tcp_mark_skb_lost(sk, skb);
+}
+
/* Updates the delivered and delivered_ce counts */
static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
bool ece_ack)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8391aa29e7a4..ad62fe029646 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -546,6 +546,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
if (sk->sk_state == TCP_LISTEN)
goto out;

+ /* fastopen SYN may have triggered the fragmentation
+ * request. Mark the SYN or SYN/ACK as lost.
+ */
+ if (sk->sk_state == TCP_SYN_SENT)
+ tcp_mark_syn_lost(sk);
+
tp->mtu_info = info;
if (!sock_owned_by_user(sk)) {
tcp_v4_mtu_reduced(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 992cbf3eb9e3..d7b1346863e3 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -443,6 +443,10 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
if (!ip6_sk_accept_pmtu(sk))
goto out;

+ /* fastopen SYN may have triggered TOOBIG, mark it lost. */
+ if (sk->sk_state == TCP_SYN_SENT)
+ tcp_mark_syn_lost(sk);
+
tp->mtu_info = ntohl(info);
if (!sock_owned_by_user(sk))
tcp_v6_mtu_reduced(sk);



2020-12-11 12:03:55

by Eric Dumazet

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 2:55 AM Alexander Duyck
<[email protected]> wrote:
>
> From: Alexander Duyck <[email protected]>
>
> In the case of a fastopen SYN there are cases where it may trigger either
> an ICMP_TOOBIG message in the case of IPv6 or an ICMP_FRAG_NEEDED
> fragmentation request in the case of IPv4. This results in the socket
> stalling for a second or more as it does not respond to the message by
> retransmitting the SYN frame.
>
> Normally a SYN frame should not be able to trigger an ICMP_TOOBIG or
> ICMP_FRAG_NEEDED, however in the case of fastopen the frame can make use
> of the entire MTU. An additional complication is that the retransmit
> queue doesn't contain the original frames. As a result, when
> tcp_simple_retransmit is called and walks the list of frames in the
> queue it may not mark the frames as lost, because the SYN and the data
> packet are each individually smaller than the MSS size after the
> adjustment. This results in the socket being stalled until the
> retransmit timer kicks in and forces the SYN frame out again without
> the data attached.
>
> In order to resolve this we need to mark the SYN frame as lost if it is the
> first packet in the queue. Doing this allows the socket to recover much
> more quickly without the retransmit timeout stall.
>
> Signed-off-by: Alexander Duyck <[email protected]>


I do not think it is a net candidate, but net-next.

Yuchung might correct me, but I think the TCP Fastopen standard was very
conservative about the payload length in the SYN packet, so receiving an
ICMP was never considered.

> ---
> include/net/tcp.h | 1 +
> net/ipv4/tcp_input.c | 8 ++++++++
> net/ipv4/tcp_ipv4.c | 6 ++++++
> net/ipv6/tcp_ipv6.c | 4 ++++
> 4 files changed, 19 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d4ef5bf94168..6181ad98727a 100644
> --- a/include/net/tcp.h


> +++ b/net/ipv4/tcp_ipv4.c
> @@ -546,6 +546,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
> if (sk->sk_state == TCP_LISTEN)
> goto out;
>
> + /* fastopen SYN may have triggered the fragmentation
> + * request. Mark the SYN or SYN/ACK as lost.
> + */
> + if (sk->sk_state == TCP_SYN_SENT)
> + tcp_mark_syn_lost(sk);

This is going to crash in some cases; you do not know if you own the socket.
(Look a few lines below.)

> +
> tp->mtu_info = info;
> if (!sock_owned_by_user(sk)) {
> tcp_v4_mtu_reduced(sk);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 992cbf3eb9e3..d7b1346863e3 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -443,6 +443,10 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
> if (!ip6_sk_accept_pmtu(sk))
> goto out;
>
> + /* fastopen SYN may have triggered TOOBIG, mark it lost. */
> + if (sk->sk_state == TCP_SYN_SENT)
> + tcp_mark_syn_lost(sk);


Same issue here.

> +
> tp->mtu_info = ntohl(info);
> if (!sock_owned_by_user(sk))
> tcp_v6_mtu_reduced(sk);
>
>

2020-12-11 13:42:09

by kernel test robot

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/tcp-Mark-fastopen-SYN-packet-as-lost-when-receiving-ICMP_TOOBIG-ICMP_FRAG_NEEDED/20201211-100032
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git d9838b1d39283c1200c13f9076474c7624b8ec34
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/01abda6be2a196620ae2057c3e654edc82beb144
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexander-Duyck/tcp-Mark-fastopen-SYN-packet-as-lost-when-receiving-ICMP_TOOBIG-ICMP_FRAG_NEEDED/20201211-100032
git checkout 01abda6be2a196620ae2057c3e654edc82beb144
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "tcp_mark_syn_lost" [net/ipv6/ipv6.ko] undefined!
ERROR: modpost: "clk_set_min_rate" [sound/soc/atmel/snd-soc-mchp-spdifrx.ko] undefined!
ERROR: modpost: "__delay" [drivers/net/mdio/mdio-cavium.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (1.66 kB)
.config.gz (52.34 kB)

2020-12-11 18:40:01

by Eric Dumazet

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
<[email protected]> wrote:

> That's fine. I can target this for net-next. I had just selected net
> since I had considered it a fix, but I suppose it could be considered
> a behavioral change.

We are very late in the 5.10 cycle, and we never handled ICMP in this
state, so net-next is definitely better.

Note that RFC 7413 states in 4.1.3 :

The client MUST cache cookies from servers for later Fast Open
connections. For a multihomed client, the cookies are dependent on
the client and server IP addresses. Hence, the client should cache
at most one (most recently received) cookie per client and server IP
address pair.

When caching cookies, we recommend that the client also cache the
Maximum Segment Size (MSS) advertised by the server. The client can
cache the MSS advertised by the server in order to determine the
maximum amount of data that the client can fit in the SYN packet in
subsequent TFO connections. Caching the server MSS is useful
because, with Fast Open, a client sends data in the SYN packet before
the server announces its MSS in the SYN-ACK packet. If the client
sends more data in the SYN packet than the server will accept, this
will likely require the client to retransmit some or all of the data.
Hence, caching the server MSS can enhance performance.

Without a cached server MSS, the amount of data in the SYN packet is
limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
bytes for IPv6 [RFC2460]. Even if the client complies with this
limit when sending the SYN, it is known that an IPv4 receiver
advertising an MSS less than 536 bytes can receive a segment larger
than it is expecting.

If the cached MSS is larger than the typical size (1460 bytes for
IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
may cause problems that offset the performance benefit of Fast Open.
For example, the unusually large SYN may trigger IP fragmentation and
may confuse firewalls or middleboxes, causing SYN retransmission and
other side effects. Therefore, the client MAY limit the cached MSS
to 1460 bytes for IPv4 or 1440 for IPv6.


Relying on ICMP is fragile, since ICMP messages can be filtered in some way.

2020-12-11 18:59:01

by Alexander H Duyck

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> <[email protected]> wrote:
>
> > That's fine. I can target this for net-next. I had just selected net
> > since I had considered it a fix, but I suppose it could be considered
> > a behavioral change.
>
> We are very late in the 5.10 cycle, and we never handled ICMP in this
> state, so net-next is definitely better.
>
> Note that RFC 7413 states in 4.1.3 :
>
> The client MUST cache cookies from servers for later Fast Open
> connections. For a multihomed client, the cookies are dependent on
> the client and server IP addresses. Hence, the client should cache
> at most one (most recently received) cookie per client and server IP
> address pair.
>
> When caching cookies, we recommend that the client also cache the
> Maximum Segment Size (MSS) advertised by the server. The client can
> cache the MSS advertised by the server in order to determine the
> maximum amount of data that the client can fit in the SYN packet in
> subsequent TFO connections. Caching the server MSS is useful
> because, with Fast Open, a client sends data in the SYN packet before
> the server announces its MSS in the SYN-ACK packet. If the client
> sends more data in the SYN packet than the server will accept, this
> will likely require the client to retransmit some or all of the data.
> Hence, caching the server MSS can enhance performance.
>
> Without a cached server MSS, the amount of data in the SYN packet is
> limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> bytes for IPv6 [RFC2460]. Even if the client complies with this
> limit when sending the SYN, it is known that an IPv4 receiver
> advertising an MSS less than 536 bytes can receive a segment larger
> than it is expecting.
>
> If the cached MSS is larger than the typical size (1460 bytes for
> IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> may cause problems that offset the performance benefit of Fast Open.
> For example, the unusually large SYN may trigger IP fragmentation and
> may confuse firewalls or middleboxes, causing SYN retransmission and
> other side effects. Therefore, the client MAY limit the cached MSS
> to 1460 bytes for IPv4 or 1440 for IPv6.
>
>
> Relying on ICMP is fragile, since they can be filtered in some way.

In this case I am not relying on the ICMP, but thought that since I
have it I should make use of it. Without the ICMP we would still just
be waiting on the retransmit timer.

The problem case has a v6-in-v6 tunnel between the client and the
endpoint, so both ends assume an MTU of 1500 and advertise a 1440 MSS,
which works fine until they actually go to send a large packet between
the two. At that point the tunnel triggers an ICMP_TOOBIG and the
endpoint stalls since the MSS is dropped to 1400, but the SYN and
data payload were already smaller than that so no retransmits are
being triggered. This results in TFO being 1s slower than non-TFO
because of the failure to trigger the retransmit for the frame that
violated the PMTU. The patch is meant to get the two back into
comparable times.

2020-12-12 18:57:26

by Alexander H Duyck

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Thu, Dec 10, 2020 at 10:24 PM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 11, 2020 at 2:55 AM Alexander Duyck
> <[email protected]> wrote:
> >
> > From: Alexander Duyck <[email protected]>
> >
> > In the case of a fastopen SYN there are cases where it may trigger either
> > an ICMP_TOOBIG message in the case of IPv6 or an ICMP_FRAG_NEEDED
> > fragmentation request in the case of IPv4. This results in the socket
> > stalling for a second or more as it does not respond to the message by
> > retransmitting the SYN frame.
> >
> > Normally a SYN frame should not be able to trigger an ICMP_TOOBIG or
> > ICMP_FRAG_NEEDED, however in the case of fastopen the frame can make use
> > of the entire MTU. An additional complication is that the retransmit
> > queue doesn't contain the original frames. As a result, when
> > tcp_simple_retransmit is called and walks the list of frames in the
> > queue it may not mark the frames as lost, because the SYN and the data
> > packet are each individually smaller than the MSS size after the
> > adjustment. This results in the socket being stalled until the
> > retransmit timer kicks in and forces the SYN frame out again without
> > the data attached.
> >
> > In order to resolve this we need to mark the SYN frame as lost if it is the
> > first packet in the queue. Doing this allows the socket to recover much
> > more quickly without the retransmit timeout stall.
> >
> > Signed-off-by: Alexander Duyck <[email protected]>
>
>
> I do not think it is a net candidate, but net-next.
>
> Yuchung might correct me, but I think the TCP Fastopen standard was very
> conservative about the payload length in the SYN packet, so receiving an
> ICMP was never considered.

That's fine. I can target this for net-next. I had just selected net
since I had considered it a fix, but I suppose it could be considered
a behavioral change.

> > ---
> > include/net/tcp.h | 1 +
> > net/ipv4/tcp_input.c | 8 ++++++++
> > net/ipv4/tcp_ipv4.c | 6 ++++++
> > net/ipv6/tcp_ipv6.c | 4 ++++
> > 4 files changed, 19 insertions(+)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index d4ef5bf94168..6181ad98727a 100644
> > --- a/include/net/tcp.h
>
>
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -546,6 +546,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
> > if (sk->sk_state == TCP_LISTEN)
> > goto out;
> >
> > + /* fastopen SYN may have triggered the fragmentation
> > + * request. Mark the SYN or SYN/ACK as lost.
> > + */
> > + if (sk->sk_state == TCP_SYN_SENT)
> > + tcp_mark_syn_lost(sk);
>
> This is going to crash in some cases, you do not know if you own the socket.
> (Look a few lines below)

Okay, I will look into moving this down into the block below, since I
assume that if the socket is owned by the user we cannot make these changes.

> > +
> > tp->mtu_info = info;
> > if (!sock_owned_by_user(sk)) {
> > tcp_v4_mtu_reduced(sk);
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index 992cbf3eb9e3..d7b1346863e3 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -443,6 +443,10 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
> > if (!ip6_sk_accept_pmtu(sk))
> > goto out;
> >
> > + /* fastopen SYN may have triggered TOOBIG, mark it lost. */
> > + if (sk->sk_state == TCP_SYN_SENT)
> > + tcp_mark_syn_lost(sk);
>
>
> Same issue here.

I'll move this one too.

> > +
> > tp->mtu_info = ntohl(info);
> > if (!sock_owned_by_user(sk))
> > tcp_v6_mtu_reduced(sk);
> >
> >

2020-12-13 01:13:58

by Eric Dumazet

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
<[email protected]> wrote:
>
> On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> > <[email protected]> wrote:
> >
> > > That's fine. I can target this for net-next. I had just selected net
> > > since I had considered it a fix, but I suppose it could be considered
> > > a behavioral change.
> >
> > We are very late in the 5.10 cycle, and we never handled ICMP in this
> > state, so net-next is definitely better.
> >
> > Note that RFC 7413 states in 4.1.3 :
> >
> > The client MUST cache cookies from servers for later Fast Open
> > connections. For a multihomed client, the cookies are dependent on
> > the client and server IP addresses. Hence, the client should cache
> > at most one (most recently received) cookie per client and server IP
> > address pair.
> >
> > When caching cookies, we recommend that the client also cache the
> > Maximum Segment Size (MSS) advertised by the server. The client can
> > cache the MSS advertised by the server in order to determine the
> > maximum amount of data that the client can fit in the SYN packet in
> > subsequent TFO connections. Caching the server MSS is useful
> > because, with Fast Open, a client sends data in the SYN packet before
> > the server announces its MSS in the SYN-ACK packet. If the client
> > sends more data in the SYN packet than the server will accept, this
> > will likely require the client to retransmit some or all of the data.
> > Hence, caching the server MSS can enhance performance.
> >
> > Without a cached server MSS, the amount of data in the SYN packet is
> > limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> > bytes for IPv6 [RFC2460]. Even if the client complies with this
> > limit when sending the SYN, it is known that an IPv4 receiver
> > advertising an MSS less than 536 bytes can receive a segment larger
> > than it is expecting.
> >
> > If the cached MSS is larger than the typical size (1460 bytes for
> > IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> > may cause problems that offset the performance benefit of Fast Open.
> > For example, the unusually large SYN may trigger IP fragmentation and
> > may confuse firewalls or middleboxes, causing SYN retransmission and
> > other side effects. Therefore, the client MAY limit the cached MSS
> > to 1460 bytes for IPv4 or 1440 for IPv6.
> >
> >
> > Relying on ICMP is fragile, since they can be filtered in some way.
>
> In this case I am not relying on the ICMP, but thought that since I
> have it I should make use of it. Without the ICMP we would still just
> be waiting on the retransmit timer.
>
> The problem case has a v6-in-v6 tunnel between the client and the
> endpoint so both ends assume an MTU 1500 and advertise a 1440 MSS
> which works fine until they actually go to send a large packet between
> the two. At that point the tunnel is triggering an ICMP_TOOBIG and the
> endpoint is stalling since the MSS is dropped to 1400, but the SYN and
> data payload were already smaller than that so no retransmits are
> being triggered. This results in TFO being 1s slower than non-TFO
> because of the failure to trigger the retransmit for the frame that
> violated the PMTU. The patch is meant to get the two back into
> comparable times.

Okay... Have you studied why the tcp_v4_mtu_reduced() (and IPv6 equivalent)
code does not yet handle the retransmit in TCP_SYN_SENT state?

2020-12-13 03:52:08

by Alexander H Duyck

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 11:18 AM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
> <[email protected]> wrote:
> >
> > On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <[email protected]> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> > > <[email protected]> wrote:
> > >
> > > > That's fine. I can target this for net-next. I had just selected net
> > > > since I had considered it a fix, but I suppose it could be considered
> > > > a behavioral change.
> > >
> > > We are very late in the 5.10 cycle, and we never handled ICMP in this
> > > state, so net-next is definitely better.
> > >
> > > Note that RFC 7413 states in 4.1.3 :
> > >
> > > The client MUST cache cookies from servers for later Fast Open
> > > connections. For a multihomed client, the cookies are dependent on
> > > the client and server IP addresses. Hence, the client should cache
> > > at most one (most recently received) cookie per client and server IP
> > > address pair.
> > >
> > > When caching cookies, we recommend that the client also cache the
> > > Maximum Segment Size (MSS) advertised by the server. The client can
> > > cache the MSS advertised by the server in order to determine the
> > > maximum amount of data that the client can fit in the SYN packet in
> > > subsequent TFO connections. Caching the server MSS is useful
> > > because, with Fast Open, a client sends data in the SYN packet before
> > > the server announces its MSS in the SYN-ACK packet. If the client
> > > sends more data in the SYN packet than the server will accept, this
> > > will likely require the client to retransmit some or all of the data.
> > > Hence, caching the server MSS can enhance performance.
> > >
> > > Without a cached server MSS, the amount of data in the SYN packet is
> > > limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> > > bytes for IPv6 [RFC2460]. Even if the client complies with this
> > > limit when sending the SYN, it is known that an IPv4 receiver
> > > advertising an MSS less than 536 bytes can receive a segment larger
> > > than it is expecting.
> > >
> > > If the cached MSS is larger than the typical size (1460 bytes for
> > > IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> > > may cause problems that offset the performance benefit of Fast Open.
> > > For example, the unusually large SYN may trigger IP fragmentation and
> > > may confuse firewalls or middleboxes, causing SYN retransmission and
> > > other side effects. Therefore, the client MAY limit the cached MSS
> > > to 1460 bytes for IPv4 or 1440 for IPv6.
> > >
> > >
> > > Relying on ICMP is fragile, since they can be filtered in some way.
> >
> > In this case I am not relying on the ICMP, but thought that since I
> > have it I should make use of it. Without the ICMP we would still just
> > be waiting on the retransmit timer.
> >
> > The problem case has a v6-in-v6 tunnel between the client and the
> > endpoint so both ends assume an MTU 1500 and advertise a 1440 MSS
> > which works fine until they actually go to send a large packet between
> > the two. At that point the tunnel is triggering an ICMP_TOOBIG and the
> > endpoint is stalling since the MSS is dropped to 1400, but the SYN and
> > data payload were already smaller than that so no retransmits are
> > being triggered. This results in TFO being 1s slower than non-TFO
> > because of the failure to trigger the retransmit for the frame that
> > violated the PMTU. The patch is meant to get the two back into
> > comparable times.
>
> Okay... Have you studied why tcp_v4_mtu_reduced() (and IPv6 equivalent)
> code does not yet handle the retransmit in TCP_SYN_SENT state ?

The problem lies in tcp_simple_retransmit(). Specifically, the loop at
the start of the function checks the retransmit queue to see if
there are any packets larger than the MSS and finds none, since we don't
place the SYN w/ data in there and instead have a separate SYN and
data packet.

I'm debating if I should take an alternative approach and modify the
loop at the start of tcp_simple_retransmit() to add a check for a SYN
packet with tp->syn_data set, and then compare the next frame's
length + MAX_TCP_HEADER_OPTIONS against the MSS.

2020-12-13 05:30:21

by Yuchung Cheng

Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED

On Fri, Dec 11, 2020 at 1:51 PM Alexander Duyck
<[email protected]> wrote:
>
> On Fri, Dec 11, 2020 at 11:18 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
> > <[email protected]> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <[email protected]> wrote:
> > > >
> > > > On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> > > > <[email protected]> wrote:
> > > >
> > > > > That's fine. I can target this for net-next. I had just selected net
> > > > > since I had considered it a fix, but I suppose it could be considered
> > > > > a behavioral change.
> > > >
> > > > We are very late in the 5.10 cycle, and we never handled ICMP in this
> > > > state, so net-next is definitely better.
> > > >
> > > > Note that RFC 7413 states in 4.1.3 :
> > > >
> > > > The client MUST cache cookies from servers for later Fast Open
> > > > connections. For a multihomed client, the cookies are dependent on
> > > > the client and server IP addresses. Hence, the client should cache
> > > > at most one (most recently received) cookie per client and server IP
> > > > address pair.
> > > >
> > > > When caching cookies, we recommend that the client also cache the
> > > > Maximum Segment Size (MSS) advertised by the server. The client can
> > > > cache the MSS advertised by the server in order to determine the
> > > > maximum amount of data that the client can fit in the SYN packet in
> > > > subsequent TFO connections. Caching the server MSS is useful
> > > > because, with Fast Open, a client sends data in the SYN packet before
> > > > the server announces its MSS in the SYN-ACK packet. If the client
> > > > sends more data in the SYN packet than the server will accept, this
> > > > will likely require the client to retransmit some or all of the data.
> > > > Hence, caching the server MSS can enhance performance.
> > > >
> > > > Without a cached server MSS, the amount of data in the SYN packet is
> > > > limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> > > > bytes for IPv6 [RFC2460]. Even if the client complies with this
> > > > limit when sending the SYN, it is known that an IPv4 receiver
> > > > advertising an MSS less than 536 bytes can receive a segment larger
> > > > than it is expecting.
> > > >
> > > > If the cached MSS is larger than the typical size (1460 bytes for
> > > > IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> > > > may cause problems that offset the performance benefit of Fast Open.
> > > > For example, the unusually large SYN may trigger IP fragmentation and
> > > > may confuse firewalls or middleboxes, causing SYN retransmission and
> > > > other side effects. Therefore, the client MAY limit the cached MSS
> > > > to 1460 bytes for IPv4 or 1440 for IPv6.
> > > >
> > > >
> > > > Relying on ICMP is fragile, since they can be filtered in some way.
> > >
> > > In this case I am not relying on the ICMP, but thought that since I
> > > have it I should make use of it. Without the ICMP we would still just
> > > be waiting on the retransmit timer.
> > >
> > > The problem case has a v6-in-v6 tunnel between the client and the
> > > endpoint so both ends assume an MTU 1500 and advertise a 1440 MSS
> > > which works fine until they actually go to send a large packet between
> > > the two. At that point the tunnel is triggering an ICMP_TOOBIG and the
> > > endpoint is stalling since the MSS is dropped to 1400, but the SYN and
> > > data payload were already smaller than that so no retransmits are
> > > being triggered. This results in TFO being 1s slower than non-TFO
> > > because of the failure to trigger the retransmit for the frame that
> > > violated the PMTU. The patch is meant to get the two back into
> > > comparable times.
> >
> > Okay... Have you studied why tcp_v4_mtu_reduced() (and IPv6 equivalent)
> > code does not yet handle the retransmit in TCP_SYN_SENT state ?
>
> The problem lies in tcp_simple_retransmit(). Specifically the loop at
> the start of the function goes to check the retransmit queue to see if
> there are any packets larger than MSS and finds none since we don't
> place the SYN w/ data in there and instead have a separate SYN and
> data packet.
>
> I'm debating if I should take an alternative approach and modify the
> loop at the start of tcp_simple_retransmit to add a check for a SYN
> packet, tp->syn_data being set, and then comparing the next frame
> length + MAX_TCP_HEADER_OPTIONS versus mss.
Thanks for bringing up this tricky issue. The root cause seems to be
the special arrangement of storing SYN-data as one (pure) SYN and one
non-SYN data segment. Given that tcp_simple_retransmit probably is not
called frequently, your alternative approach sounds more appealing to me.

Replacing that strange syn|data arrangement for TFO has been on my
wish list for a long time... Ideally it would be better to just store
the SYN+data and carve out the SYN for retransmit.