From: Ahmed Abdelsalam
To: davem@davemloft.net, dav.lebrun@gmail.com, kuznet@ms2.inr.ac.ru,
    yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: Ahmed Abdelsalam
Subject: [net-next v2] ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode
Date: Tue, 24 Apr 2018 19:59:55 +0200
Message-Id: <1524592795-1467-1-git-send-email-amsalam20@gmail.com>
X-Mailer: git-send-email 2.1.4
X-Mailing-List: linux-kernel@vger.kernel.org

ECMP (equal-cost multipath) hashes are typically computed on the packets'
5-tuple (src IP, dst IP, src port, dst port, L4 proto). For encapsulated
packets, the L4 data is not readily available and ECMP hashing will often
revert to (src IP, dst IP). This will lead to traffic polarization on a single
ECMP path, causing congestion and waste of network capacity.

In IPv6, the 20-bit flow label field is also used as part of the ECMP hash.
In the absence of L4 data, the hashing will be on (src IP, dst IP, flow label).
Having a non-zero flow label is thus important for proper traffic load
balancing when L4 data is unavailable (i.e., when packets are encapsulated).

Currently, the seg6_do_srh_encap() function extracts the original packet's
flow label and sets it as the outer IPv6 flow label.
There are two issues with this behaviour:

a) There is no guarantee that the inner flow label is set by the source.
b) If the original packet is not IPv6, the flow label will be set to
zero (e.g., IPv4 or L2 encap).

This patch adds a function, named seg6_make_flowlabel(), that computes a
flow label from a given skb. It supports IPv6, IPv4 and L2 payloads, and
uses the per-namespace 'seg6_flowlabel' sysctl value.

The currently supported behaviours are as follows:
-1 set flowlabel to zero.
 0 copy flowlabel from the inner packet if it is IPv6
   (set flowlabel to 0 for IPv4/L2).
 1 compute the flowlabel using seg6_make_flowlabel().

This patch has been tested for IPv6, IPv4, and L2 traffic.

Signed-off-by: Ahmed Abdelsalam
---
 include/net/netns/ipv6.h   |  1 +
 net/ipv6/seg6_iptunnel.c   | 24 ++++++++++++++++++++++--
 net/ipv6/sysctl_net_ipv6.c |  8 ++++++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 97b3a54..c978a31 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -43,6 +43,7 @@ struct netns_sysctl_ipv6 {
 	int max_hbh_opts_cnt;
 	int max_dst_opts_len;
 	int max_hbh_opts_len;
+	int seg6_flowlabel;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5fe1394..3d9cd86 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -91,6 +91,24 @@ static void set_tun_src(struct net *net, struct net_device *dev,
 	rcu_read_unlock();
 }
 
+/* Compute flowlabel for outer IPv6 header */
+__be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
+			   struct ipv6hdr *inner_hdr)
+{
+	int do_flowlabel = net->ipv6.sysctl.seg6_flowlabel;
+	__be32 flowlabel = 0;
+	u32 hash;
+
+	if (do_flowlabel > 0) {
+		hash = skb_get_hash(skb);
+		hash = rol32(hash, 16);
+		flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
+	} else if (!do_flowlabel && skb->protocol == htons(ETH_P_IPV6)) {
+		flowlabel = ip6_flowlabel(inner_hdr);
+	}
+	return flowlabel;
+}
+
 /* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
 int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 {
@@ -99,6 +117,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 	struct ipv6hdr *hdr, *inner_hdr;
 	struct ipv6_sr_hdr *isrh;
 	int hdrlen, tot_len, err;
+	__be32 flowlabel;
 
 	hdrlen = (osrh->hdrlen + 1) << 3;
 	tot_len = hdrlen + sizeof(*hdr);
@@ -119,12 +138,13 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 	 * decapsulation will overwrite inner hlim with outer hlim
 	 */
 
+	flowlabel = seg6_make_flowlabel(net, skb, inner_hdr);
 	if (skb->protocol == htons(ETH_P_IPV6)) {
 		ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
-			     ip6_flowlabel(inner_hdr));
+			     flowlabel);
 		hdr->hop_limit = inner_hdr->hop_limit;
 	} else {
-		ip6_flow_hdr(hdr, 0, 0);
+		ip6_flow_hdr(hdr, 0, flowlabel);
 		hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
 	}
 
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 6fbdef6..e15cd37 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -152,6 +152,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "seg6_flowlabel",
+		.data		= &init_net.ipv6.sysctl.seg6_flowlabel,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
@@ -217,6 +224,7 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	ipv6_table[12].data = &net->ipv6.sysctl.max_dst_opts_len;
 	ipv6_table[13].data = &net->ipv6.sysctl.max_hbh_opts_len;
 	ipv6_table[14].data = &net->ipv6.sysctl.multipath_hash_policy,
+	ipv6_table[15].data = &net->ipv6.sysctl.seg6_flowlabel;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
-- 
2.1.4
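
The three seg6_flowlabel behaviours described in the commit message can be
sketched in user space as follows. This is a minimal illustration under stated
assumptions, not the kernel code: make_flowlabel(), its parameters and
FLOWLABEL_MASK are hypothetical names, the hash argument stands in for the
value skb_get_hash() would return, and the label is handled in host byte
order, whereas the kernel masks a __be32 value with IPV6_FLOWLABEL_MASK.

```c
#include <stdint.h>

/* Hypothetical host-order mask for the 20-bit IPv6 flow label. */
#define FLOWLABEL_MASK 0x000FFFFFu

/* Rotate a 32-bit word left by 'shift' bits (user-space stand-in for rol32). */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/*
 * Pick the outer flow label for a given sysctl mode:
 *   mode  > 0: derive the label from the packet hash
 *   mode == 0: copy the inner label if the inner packet is IPv6, else 0
 *   mode  < 0: always 0
 * 'hash' is assumed to be a precomputed flow hash of the inner packet.
 */
static uint32_t make_flowlabel(int mode, uint32_t hash,
			       int inner_is_ipv6, uint32_t inner_label)
{
	if (mode > 0)
		return rol32(hash, 16) & FLOWLABEL_MASK;
	if (mode == 0 && inner_is_ipv6)
		return inner_label & FLOWLABEL_MASK;
	return 0;
}
```

For example, with mode 1 and a hash of 0x12345678, the rotation gives
0x56781234 and the resulting label is 0x81234. With mode 0 and a non-IPv6
inner packet, the label is 0 regardless of the hash.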