From: Jon Maxwell
To: davem@davemloft.net
Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, jmaxwell@redhat.com
Subject: [PATCH net-next] tcp: Add mark for TIMEWAIT sockets
Date: Thu, 10 May 2018 12:07:39 +1000
Message-Id: <20180510020739.8599-1-jmaxwell37@gmail.com>
X-Mailer: git-send-email 2.13.6
X-Mailing-List: linux-kernel@vger.kernel.org

Aidan McGurn from Openwave Mobility systems reported the following bug:

"Marked routing is broken on customer deployment. Its effects are large
increase in Uplink retransmissions caused by the client never receiving
the final ACK to their FINACK - this ACK misses the mark and routes out
of the incorrect route."

Currently marks are added to sk_buffs for replies when the "fwmark_reflect"
sysctl is enabled, but not for TIME_WAIT sockets where the original socket
had sk->sk_mark set via setsockopt(SO_MARK..).

Fix this in IPv4/v6 by adding tw->tw_mark for TIME_WAIT sockets. Copy the
original sk->sk_mark in tcp_time_wait() to the new tw->tw_mark location.
Then copy this into ctl_sk->sk_mark so that the skb gets sent with the
correct mark. Do the same for resets.
Give the "fwmark_reflect" sysctl precedence over sk->sk_mark so that
netfilter rules are still honored.

Signed-off-by: Jon Maxwell
---
 include/net/inet_timewait_sock.h |  1 +
 net/ipv4/ip_output.c             |  3 ++-
 net/ipv4/tcp_ipv4.c              | 18 ++++++++++++++++--
 net/ipv4/tcp_minisocks.c         |  1 +
 net/ipv6/tcp_ipv6.c              |  8 +++++++-
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index c7be1ca8e562..659d8ed5a3bc 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -62,6 +62,7 @@ struct inet_timewait_sock {
 #define tw_dr			__tw_common.skc_tw_dr

 	int			tw_timeout;
+	__u32			tw_mark;
 	volatile unsigned char	tw_substate;
 	unsigned char		tw_rcv_wscale;

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 95adb171f852..cca4412dc4cb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1539,6 +1539,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 	struct sk_buff *nskb;
 	int err;
 	int oif;
+	__u32 mark = IP4_REPLY_MARK(net, skb->mark);

 	if (__ip_options_echo(net, &replyopts.opt.opt, skb, sopt))
 		return;
@@ -1561,7 +1562,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 		oif = skb->skb_iif;

 	flowi4_init_output(&fl4, oif,
-			   IP4_REPLY_MARK(net, skb->mark),
+			   mark ? (mark) : sk->sk_mark,
 			   RT_TOS(arg->tos),
 			   RT_SCOPE_UNIVERSE, ip_hdr(skb)->protocol,
 			   ip_reply_arg_flowi_flags(arg),
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f70586b50838..fbee36579c83 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -621,6 +621,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 	struct sock *sk1 = NULL;
 #endif
 	struct net *net;
+	struct sock *ctl_sk;

 	/* Never send a reset in response to a reset. */
 	if (th->rst)
@@ -723,11 +724,17 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 	arg.tos = ip_hdr(skb)->tos;
 	arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
 	local_bh_disable();
-	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
+	ctl_sk = *this_cpu_ptr(net->ipv4.tcp_sk);
+	if (sk && sk->sk_state == TCP_TIME_WAIT)
+		ctl_sk->sk_mark = inet_twsk(sk)->tw_mark;
+	else if (sk && sk_fullsock(sk))
+		ctl_sk->sk_mark = sk->sk_mark;
+	ip_send_unicast_reply(ctl_sk,
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
 			      ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
 			      &arg, arg.iov[0].iov_len);

+	ctl_sk->sk_mark = 0;
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
 	local_bh_enable();
@@ -759,6 +766,7 @@ static void tcp_v4_send_ack(const struct sock *sk,
 	} rep;
 	struct net *net = sock_net(sk);
 	struct ip_reply_arg arg;
+	struct sock *ctl_sk;

 	memset(&rep.th, 0, sizeof(struct tcphdr));
 	memset(&arg, 0, sizeof(arg));
@@ -809,11 +817,17 @@ static void tcp_v4_send_ack(const struct sock *sk,
 	arg.tos = tos;
 	arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
 	local_bh_disable();
-	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
+	ctl_sk = *this_cpu_ptr(net->ipv4.tcp_sk);
+	if (sk && sk->sk_state == TCP_TIME_WAIT)
+		ctl_sk->sk_mark = inet_twsk(sk)->tw_mark;
+	else if (sk && sk_fullsock(sk))
+		ctl_sk->sk_mark = sk->sk_mark;
+	ip_send_unicast_reply(ctl_sk,
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
 			      ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
 			      &arg, arg.iov[0].iov_len);

+	ctl_sk->sk_mark = 0;
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	local_bh_enable();
 }
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 57b5468b5139..f867658b4b30 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -263,6 +263,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
 		struct inet_sock *inet = inet_sk(sk);

 		tw->tw_transparent	= inet->transparent;
+		tw->tw_mark		= sk->sk_mark;
 		tw->tw_rcv_wscale	= tp->rx_opt.rcv_wscale;
 		tcptw->tw_rcv_nxt	= tp->rcv_nxt;
 		tcptw->tw_snd_nxt	= tp->snd_nxt;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6d664d83cd16..a6f876125091 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -803,6 +803,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	unsigned int tot_len = sizeof(struct tcphdr);
 	struct dst_entry *dst;
 	__be32 *topt;
+	__u32 mark = IP6_REPLY_MARK(net, skb->mark);

 	if (tsecr)
 		tot_len += TCPOLEN_TSTAMP_ALIGNED;
@@ -871,11 +872,16 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 		fl6.flowi6_oif = oif;
 	}

-	fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark);
+	if (sk && sk->sk_state == TCP_TIME_WAIT)
+		ctl_sk->sk_mark = inet_twsk(sk)->tw_mark;
+	else if (sk && sk_fullsock(sk))
+		ctl_sk->sk_mark = sk->sk_mark;
+	fl6.flowi6_mark = mark ? (mark) : ctl_sk->sk_mark;
 	fl6.fl6_dport = t1->dest;
 	fl6.fl6_sport = t1->source;
 	fl6.flowi6_uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
+	ctl_sk->sk_mark = 0;

 	/* Pass a socket to ip6_dst_lookup either it is for RST
 	 * Underlying function will use this to retrieve the network
-- 
2.13.6
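
For readers following the mark-selection logic, here is a minimal userspace
sketch of the precedence the patch aims for. It is not kernel code. The names
model_sock, model_reply_mark(), model_ctl_mark(), model_flow_mark() and the
MODEL_* states are hypothetical stand-ins introduced only for illustration.
The sketch assumes that IP4_REPLY_MARK()/IP6_REPLY_MARK() yield the incoming
skb mark when "fwmark_reflect" is enabled and 0 otherwise, and that a
reflected mark of 0 falls back to the socket mark.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the socket kinds the patch distinguishes. */
enum model_state {
	MODEL_FULL_SOCK,	/* sk_fullsock(sk) is true */
	MODEL_TIME_WAIT,	/* sk->sk_state == TCP_TIME_WAIT */
	MODEL_REQUEST_SOCK	/* neither of the above, no mark available */
};

struct model_sock {
	enum model_state state;
	uint32_t sk_mark;	/* models sk->sk_mark on a full socket */
	uint32_t tw_mark;	/* models tw->tw_mark saved at TIME_WAIT entry */
};

/* Assumed behaviour of IP4_REPLY_MARK()/IP6_REPLY_MARK(). */
static uint32_t model_reply_mark(bool fwmark_reflect, uint32_t skb_mark)
{
	return fwmark_reflect ? skb_mark : 0;
}

/* Mark temporarily placed on the control socket; sk may be NULL. */
static uint32_t model_ctl_mark(const struct model_sock *sk)
{
	if (sk && sk->state == MODEL_TIME_WAIT)
		return sk->tw_mark;
	if (sk && sk->state == MODEL_FULL_SOCK)
		return sk->sk_mark;
	return 0;
}

/* Final flow mark: a nonzero reflected mark wins, else the socket mark. */
static uint32_t model_flow_mark(bool fwmark_reflect, uint32_t skb_mark,
				const struct model_sock *sk)
{
	uint32_t mark = model_reply_mark(fwmark_reflect, skb_mark);

	return mark ? mark : model_ctl_mark(sk);
}
```

In this model, a TIME_WAIT socket whose original socket had mark 42 replies
with mark 42 when "fwmark_reflect" is off, and with the incoming skb mark when
"fwmark_reflect" is on and that mark is nonzero.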