Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.
But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between this two number is larger than 2 to the power of 31.
I understand that under this situation, timestamp does not make sense any more,
because it actually comes from difference machines. However, if anyone
ever need to do the same investigation which I have done, this will
save some time for him.
Signed-off-by: Zhang Le <[email protected]>
---
net/ipv4/tcp_ipv4.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
peer->v4daddr == saddr) {
inet_peer_refcheck(peer);
if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
- (s32)(peer->tcp_ts - req->ts_recent) >
- TCP_PAWS_WINDOW) {
+ ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+ peer->tcp_ts > req->ts_recent)) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
goto drop_and_release;
}
--
1.7.3.2
Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
>
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between this two number is larger than 2 to the power of 31.
>
> I understand that under this situation, timestamp does not make sense any more,
> because it actually comes from difference machines. However, if anyone
> ever need to do the same investigation which I have done, this will
> save some time for him.
>
> Signed-off-by: Zhang Le <[email protected]>
> ---
> net/ipv4/tcp_ipv4.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> peer->v4daddr == saddr) {
> inet_peer_refcheck(peer);
> if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> - (s32)(peer->tcp_ts - req->ts_recent) >
> - TCP_PAWS_WINDOW) {
> + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> + peer->tcp_ts > req->ts_recent)) {
> NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> goto drop_and_release;
> }
This seems very wrong to me.
Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
to help. And it might break some working setups, because of wrap around.
Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.
What about following patch instead ?
[PATCH] doc: extend tcp_tw_recycle documentation
tcp_tw_recycle should not be used on a server if there is a chance
clients are behind a same NAT. Document this fact before too many users
discover this too late.
Signed-off-by: Eric Dumazet <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical
- experts.
+ experts. If you set it to 1, make sure you dont miss connections
+ attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+ In particular, this might break if several clients are behind
+ a common NAT device, since their TCP timestamp wont be changed
+ by the NAT. tcp_tw_recycle should be used with care, most
+ probably in private networks.
tcp_tw_reuse - BOOLEAN
Allow to reuse TIME-WAIT sockets for new connections when it is
On 09:52 Sun 14 Nov , Eric Dumazet wrote:
> Le dimanche 14 novembre 2010 ? 15:35 +0800, Zhang Le a ?crit :
> > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> > req->ts_recent. In this case, theoretically the req should not be ignored.
> >
> > But in fact, it could be ignored, if peer->tcp_ts is so small that the
> > difference between this two number is larger than 2 to the power of 31.
> >
> > I understand that under this situation, timestamp does not make sense any more,
> > because it actually comes from difference machines. However, if anyone
> > ever need to do the same investigation which I have done, this will
> > save some time for him.
> >
> > Signed-off-by: Zhang Le <[email protected]>
> > ---
> > net/ipv4/tcp_ipv4.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 8f8527d..1eb4974 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > peer->v4daddr == saddr) {
> > inet_peer_refcheck(peer);
> > if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> > - (s32)(peer->tcp_ts - req->ts_recent) >
> > - TCP_PAWS_WINDOW) {
> > + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> > + peer->tcp_ts > req->ts_recent)) {
> > NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> > goto drop_and_release;
> > }
>
> This seems very wrong to me.
>
> Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
> to help. And it might break some working setups, because of wrap around.
Yeah, you are right. And sorry for overlooking this.
I should have reviewed time_{before,after}'s implementation before posting this.
So it seems we can't do anything to improve this except to add some warning in
documentation. Maybe some comments in the code too.
>
> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesnt usually change TCP timestamps.
>
> What about following patch instead ?
>
> [PATCH] doc: extend tcp_tw_recycle documentation
>
> tcp_tw_recycle should not be used on a server if there is a chance
> clients are behind a same NAT. Document this fact before too many users
> discover this too late.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> Documentation/networking/ip-sysctl.txt | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index c7165f4..406f0d5 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
> tcp_tw_recycle - BOOLEAN
> Enable fast recycling TIME-WAIT sockets. Default value is 0.
> It should not be changed without advice/request of technical
> - experts.
> + experts. If you set it to 1, make sure you dont miss connections
> + attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
> + In particular, this might break if several clients are behind
> + a common NAT device, since their TCP timestamp wont be changed
> + by the NAT. tcp_tw_recycle should be used with care, most
> + probably in private networks.
>
> tcp_tw_reuse - BOOLEAN
> Allow to reuse TIME-WAIT sockets for new connections when it is
>
>
--
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
From: Eric Dumazet <[email protected]>
Date: Sun, 14 Nov 2010 09:52:25 +0100
> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesnt usually change TCP timestamps.
NAT is %100 incompatible with TW recycling, full stop.
There is no maybe, or maybe not.
If you are behind NAT you must not turn this feature on, ever.
On 11:55 Sun 14 Nov , David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Sun, 14 Nov 2010 09:52:25 +0100
>
> > Really, if you have multiple clients behind a common NAT, you cannot use
> > this code at all, since NAT doesnt usually change TCP timestamps.
>
> NAT is %100 incompatible with TW recycling, full stop.
>
> There is no maybe, or maybe not.
>
> If you are behind NAT you must not turn this feature on, ever.
Sorry, this question may be OT on this list, but I am just curious:
Is there any other OS has implemented this feature like Linux?
To be very specific, by this feature, I mean rejecting old duplicates based
on per-host cache of last timestamp received from any connections.
As suggested in RFC1323 Appendix B.2 (b).
Does anyone, by any chance, know the answer? Thanks in advance!
--
ZHANG, Le
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973