From: Aaron Conole
To: netdev@vger.kernel.org
Cc: dev@openvswitch.org, linux-kernel@vger.kernel.org, Pravin B Shelar,
    "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jesse Gross, Eelco Chaudron, Ilya Maximets, Simon Horman,
    Jaime Caamano
Subject: [PATCH v2 net] openvswitch: Set the skbuff pkt_type for proper pmtud support.
Date: Thu, 16 May 2024 16:09:41 -0400
Message-ID: <20240516200941.16152-1-aconole@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Open vSwitch was originally intended to switch at layer 2, only dealing
with Ethernet frames.  With the introduction of L3 tunnel support, it
crossed into the realm of needing to care a bit about some routing
details when making forwarding decisions.  If an oversized packet would
need to be fragmented during this forwarding decision, there is a chance
for pmtu to get involved and generate a routing exception.  This is
gated by the skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another.
In the case that a packet execute is invoked after a flow is newly
installed, this field is not properly initialized.  This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary, and a second attempt needs to be made to make sure
that the routing exception is properly set up.  To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get to
the openvswitch module via a port device or packet command.

Even for bridge ports as users, the pkt_type needs to be reset when
doing the transmit as the packet is truly outgoing and routing needs to
get involved post packet transformations, in the case of
VXLAN/GENEVE/udp-tunnel packets.  In general, the pkt_type on output
gets ignored, since we go straight to the driver, but in the case of
tunnel ports they go through the IP routing layer.

This issue is periodically encountered in complex setups, such as large
OpenShift deployments, where multiple sets of tunnel traversal occur.
A way to recreate this is with the ovn-heater project, which can set up
a networking environment that mimics such large deployments.  We need
larger environments for this because we need to ensure that flow misses
occur.  In these environments, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  ...

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is
not sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  ...

In this case, the first ping request receives the FRAG_NEEDED message
and a local routing exception is created.

Tested-by: Jaime Caamano
Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole
---
v1->v2: Include a comment as requested by Eelco, and add some details
        about bridge port packets.

 net/openvswitch/actions.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 6fcd7e2ca81fe..9642255808247 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -936,6 +936,12 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
 				pskb_trim(skb, ovs_mac_header_len(key));
 		}
 
+		/* Need to set the pkt_type to involve the routing layer.  The
+		 * packet movement through the OVS datapath doesn't generally
+		 * use routing, but this is needed for tunnel cases.
+		 */
+		skb->pkt_type = PACKET_OUTGOING;
+
 		if (likely(!mru ||
 		           (skb->len <= mru + vport->dev->hard_header_len))) {
 			ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
-- 
2.45.0
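
Illustrative note: the user-space C sketch below models how a pkt_type
gate of the kind described in the commit message could suppress the
FRAG_NEEDED reply.  It is not kernel code.  struct model_skb,
model_tunnel_sends_frag_needed() and model_do_output_fixup() are
hypothetical names, and the rule that the reply is skipped when pkt_type
equals PACKET_HOST is an assumption made for this example.  Only the
numeric values of the two constants are taken from <linux/if_packet.h>.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as PACKET_HOST and PACKET_OUTGOING in
 * <linux/if_packet.h>; renamed so the sketch builds without kernel
 * headers.
 */
enum model_pkt_type {
	MODEL_PACKET_HOST = 0,
	MODEL_PACKET_OUTGOING = 4,
};

/* Hypothetical stand-in for the two sk_buff fields the sketch needs. */
struct model_skb {
	size_t len;                    /* inner packet length in bytes */
	enum model_pkt_type pkt_type;  /* a zeroed skb reads as HOST (0) */
};

/* Hypothetical tunnel transmit check.  ASSUMPTION: the FRAG_NEEDED
 * reply is only generated for oversized packets whose pkt_type is not
 * PACKET_HOST.
 */
static bool model_tunnel_sends_frag_needed(const struct model_skb *skb,
					   size_t mtu)
{
	if (skb->len <= mtu)
		return false;	/* packet fits, nothing to report */
	if (skb->pkt_type == MODEL_PACKET_HOST)
		return false;	/* gate closed: reply is skipped */
	return true;
}

/* Models the one-line change made to do_output() by the patch. */
static void model_do_output_fixup(struct model_skb *skb)
{
	skb->pkt_type = MODEL_PACKET_OUTGOING;
}
```

With len = 1328 and mtu = 1142 (figures taken from the ping output
above), the model reports no reply while pkt_type is left at 0, and a
reply once model_do_output_fixup() has run.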