Received: by 10.213.65.68 with SMTP id h4csp1611572imn; Thu, 15 Mar 2018 04:58:08 -0700 (PDT) X-Google-Smtp-Source: AG47ELt4CULas2qT6XNAlvauL3uAmliOmja8To8h69nrkuo1bEBvNCaBvJQE9fLHaL1d6VLdClXO X-Received: by 10.101.92.196 with SMTP id b4mr6413588pgt.27.1521115088673; Thu, 15 Mar 2018 04:58:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521115088; cv=none; d=google.com; s=arc-20160816; b=mzTjzjhUu17ohlzT1yHbyFyKmDe38+RkQMdyphSGpxxJMzL170Q3ZkwUBIIxLtDF8P +rjw9xIPBXAFCy9oP6PdEeAaWqGH1/LfPpucJTJA7CJ3mCtfWYLthcZdgHjjVgVRKyjk fzX77qKISsyR9qeQOsSqyJfoNPq0tn+U9VrXzKh7ACduNP+A8TokagxzOlrate6W82gF HrQHaGcaEOkpuzBw/zd+e8pj4pyzz5AkfNOWdCKOSVX2NTPfeWOujilCFfeU540gNk5f EJ2Cphd5UmvajdjIF6z9LX7JH3GqnoMEkBLv61EniaWYp3Yfm/WL8dUfCShnxY+uTL/G qLJQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=SDIV1KKxl5zI31W1QRtKWLm6kZ6mdZ9BX6DZFzI4710=; b=XlmDC81kMXajau9xDbg51XsstZVHCqhhsL4jTN63ILmnG+dg+Y6BKg7GGINDlj6F7b 3yR9MbsUltScIu/fddM1WDJErLv4p1XA5H7qcYlLh39izCFqEzr9f1jfdXpK+DqyFdqz H2zdr9DxyL2p1n/Nsq5g+drToQIq9GuXEfWGcN56q9wiK1dhyJ3FbyVhdW/Rc3Yk0EXl 8NKhsrEQau7uW0ztcPoEUlLGHfmmXqO3CtYcISJx0FsAVMnPpPcovtKk7LlVJFcvS/sV 86EqavvqmRx1a8iRt8Uo+V6SoUcoJLe9OKD5rPpMKbUmcVV4oymTWs7pKbYRDX9+YXmp yilA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f19-v6si3894373plj.292.2018.03.15.04.57.51; Thu, 15 Mar 2018 04:58:08 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751996AbeCOL4W (ORCPT + 99 others); Thu, 15 Mar 2018 07:56:22 -0400 Received: from www62.your-server.de ([213.133.104.62]:42992 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751353AbeCOL4U (ORCPT ); Thu, 15 Mar 2018 07:56:20 -0400 Received: from [62.202.221.10] (helo=linux.home) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-SHA:256) (Exim 4.85_2) (envelope-from ) id 1ewRUU-0004WD-VV; Thu, 15 Mar 2018 12:56:15 +0100 Subject: Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns To: Shmulik Ladkani , Liran Alon Cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, idan.brown@oracle.com, Yuval Shaia References: <1520953642-8145-1-git-send-email-liran.alon@oracle.com> <20180315112150.58586758@halley> From: Daniel Borkmann Message-ID: Date: Thu, 15 Mar 2018 12:56:13 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: <20180315112150.58586758@halley> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.99.3/24395/Thu Mar 15 09:14:06 2018) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/15/2018 10:21 AM, Shmulik Ladkani wrote: > Hi, > > On Tue, 13 Mar 2018 17:07:22 +0200 Liran Alon wrote: >> Before this commit, dev_forward_skb() always cleared packet's >> per-network-namespace info. Even if the packet doesn't cross >> network namespaces. >> >> The comment above dev_forward_skb() describes that this is done >> because the receiving device may be in another network namespace. >> However, this case can easily be tested for and therefore we can >> scrub packet's per-network-namespace info only when receiving device >> is indeed in another network namespace. >> >> Therefore, this commit changes ____dev_forward_skb() to tell >> skb_scrub_packet() that skb has crossed network-namespace only in case >> transmitting device (skb->dev) network namespace is different then >> receiving device (dev) network namespace. > > Assuming the premise of this commit is correct, note it may not act as > intended for xnet situation in ipvlan_process_multicast, snip: > > nskb->dev = ipvlan->dev; > if (tx_pkt) > ret = dev_forward_skb(ipvlan->dev, nskb); > else > ret = netif_rx(nskb); > > as 'dev' gets already assigned to nskb prior dev_forward_skb (hence in > ____dev_forward_skb both dev and skb->dev are the same). > Fortunately every ipvlan_multicast_enqueue call is preceded by a forced > scrub; It would be future-proof to not assign nskb->dev in the > dev_forward_skb case (assign it only in the netif_rx case). > > > Regarding the premise of this commit, this "reduces" the > ipvs/orphan/mark scrubbing in the following *non* xnet situations: > > 1. mac2vlan port xmit to other macvlan ports in Bridge Mode > 2. similarly for ipvlan > 3. veth xmit > 4. l2tp_eth_dev_recv > 5. bpf redirect/clone_redirect ingress actions > > Regarding l2tp recv, this commit seems to align the srubbing behavior > with ip tunnels (full scrub only if crossing netns, see ip_tunnel_rcv). > > Regarding veth xmit, it does makes sense to preserve the fields if not > crossing netns. This is also the case when one uses tc mirred. > > Regarding bpf redirect, well, it depends on the expectations of each bpf > program. > I'd argue that preserving the fields (at least the mark field) in the > *non* xnet makes sense and provides more information and therefore more > capabilities; Alas this might change behavior already being relied on. > > Maybe Daniel can comment on the matter. Overall I think it might be nice to not need scrubbing skb in such cases, although my concern would be that this has potential to break existing setups when they would expect mark being zero on other veth peer in any case since it's the behavior for a long time already. The safer option would be to have some sort of explicit opt-in e.g. on link creation to let the skb->mark pass through unscrubbed. This would definitely be a useful option e.g. when mark is set in the netns facing veth via clsact/egress on xmit and when the container is unprivileged anyway. Thanks, Daniel