Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752595AbdFQDMh convert rfc822-to-8bit (ORCPT ); Fri, 16 Jun 2017 23:12:37 -0400 Received: from cmccmta1.chinamobile.com ([221.176.66.79]:57418 "EHLO cmccmta1.chinamobile.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752391AbdFQDMg (ORCPT ); Fri, 16 Jun 2017 23:12:36 -0400 X-RM-TRANSID: 2ee159449e1c09e-080b6 X-RM-SPAM-FLAG: 00000000 X-RM-TRANSID: 2ee959449e1ada3-1a578 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH v2] ip6_tunnel: Correct tos value in collect_md mode From: =?utf-8?B?5Lil5rW35Y+M?= In-Reply-To: <5943EECC.3050809@iogearbox.net> Date: Sat, 17 Jun 2017 11:12:25 +0800 Cc: Peter Dawson , "David S. Miller" , Alexey Kuznetsov , James Morris , Patrick McHardy , Linux Kernel Network Developers , LKML Content-Transfer-Encoding: 8BIT Message-Id: <64E4965D-093F-4703-9E01-E05739200980@cmss.chinamobile.com> References: <1497493829-13050-1-git-send-email-yanhaishuang@cmss.chinamobile.com> <20170615135432.78442af5@gmail.com> <5943EECC.3050809@iogearbox.net> To: Daniel Borkmann X-Mailer: Apple Mail (2.3273) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3651 Lines: 84 > On 16 Jun 2017, at 10:44 PM, Daniel Borkmann wrote: > > On 06/15/2017 05:54 AM, Peter Dawson wrote: >> On Thu, 15 Jun 2017 10:30:29 +0800 >> Haishuang Yan wrote: >> >>> Same as ip_gre, geneve and vxlan, use key->tos as tos value. >>> >>> CC: Peter Dawson >>> Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on >>> encapsulated packets”) >>> Suggested-by: Daniel Borkmann >>> Signed-off-by: Haishuang Yan >>> >>> --- >>> Changes since v2: >>> * Add fixes information >>> * mask key->tos with RT_TOS() suggested by Daniel >>> --- >>> net/ipv6/ip6_tunnel.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c >>> index ef99d59..6400726 100644 >>> --- a/net/ipv6/ip6_tunnel.c >>> +++ b/net/ipv6/ip6_tunnel.c >>> @@ -1249,7 +1249,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, >>> fl6.flowi6_proto = IPPROTO_IPIP; >>> fl6.daddr = key->u.ipv6.dst; >>> fl6.flowlabel = key->label; >>> - dsfield = ip6_tclass(key->label); >>> + dsfield = RT_TOS(key->tos); >>> } else { >>> if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) >>> encap_limit = t->parms.encap_limit; >>> @@ -1320,7 +1320,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, >>> fl6.flowi6_proto = IPPROTO_IPV6; >>> fl6.daddr = key->u.ipv6.dst; >>> fl6.flowlabel = key->label; >>> - dsfield = ip6_tclass(key->label); >>> + dsfield = RT_TOS(key->tos); >>> } else { >>> offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb)); >>> /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */ >> >> I don't think it is correct to apply RT_TOS >> >> Here is my understanding based on the RFCs. >> >> IPv4/6 Header:0 |0 1 2 3 |0 1 2 3 |0 1 2 3 |0 1 2 3 | >> RFC2460(IPv6) |Version | Traffic Class | | >> RFC2474(IPv6) |Version | DSCP |ECN| | >> RFC2474(IPv4) |Version | IHL | DSCP |ECN| >> RFC1349(IPv4) |Version | IHL | PREC | TOS |X| >> RFC791 (IPv4) |Version | IHL | TOS | >> >> u8 key->tos stores the full 8bits of Traffic class from an IPv6 header and; >> u8 key->tos stores the full 8bits of TOS(RFC791) from an IPv4 header >> u8 ip6_tclass will return the full 8bits of Traffic Class from an IPv6 flowlabel >> >> RT_TOS will return the RFC1349 4bit TOS field. >> >> Applying RT_TOS to a key->tos will result in lost information and the inclusion of 1 bit of ECN if the original field was a DSCP+ECN. >> >> Based on this understanding of the RFCs (but not years of experience) and since RFC1349 has been obsoleted by RFC2474 I think the use of RT_TOS should be deprecated. >> >> This being said, dsfield = ip6_tclass(key->label) = key->tos isn't fully correct either because the result will contain the ECN bits as well as the DSCP. >> >> I agree that code should be consistent, but not where there is a potential issue. > > Yeah, you're right. Looks like initial dsfield = key->tos diff was > the better choice then, sorry for my confusing comment. > > For example, bpf_skb_set_tunnel_key() helper that populates the collect > metadata as one user of this infra masks the key->label so that it really > only holds the label meaning previous dsfield = ip6_tclass(key->label) > will always be 0 in that case unlike key->tos that actually gets populated > and would propagate it. > Okay, I will change the commit back to initial version, thanks everyone.