From: Aaron Conole
To: netdev@vger.kernel.org
Cc: Pravin B Shelar, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, dev@openvswitch.org, linux-kernel@vger.kernel.org,
 Numan Siddique
Subject: [PATCH net] openvswitch: Set the skbuff pkt_type for proper pmtud support.
Date: Fri, 22 Mar 2024 15:06:03 -0400
Message-ID: <20240322190603.251831-1-aconole@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Open vSwitch was originally intended to switch at layer 2, only dealing
with Ethernet frames. With the introduction of L3 tunnel support, it
crossed into the realm of needing to care a bit about some routing
details when making forwarding decisions. If an oversized packet would
need to be fragmented during this forwarding decision, there is a chance
for pmtu to get involved and generate a routing exception. This is gated
by the skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module, this field is
set up and transitioned properly as a packet moves from one port to
another. In the case that a packet execute is invoked after a flow is
newly installed, this field is not properly initialized.
This causes the pmtud mechanism to omit sending the required exception
messages across the tunnel boundary, and a second attempt needs to be
made to make sure that the routing exception is properly set up. To fix
this, we set the outgoing packet's pkt_type to PACKET_OUTGOING, since it
can only get to the openvswitch module via a port device or packet
command.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversals occur. A
way to recreate this is with the ovn-heater project, which can set up a
networking environment that mimics such large deployments. In that
environment, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1017ms

Using tcpdump, we can also see that the expected ICMP FRAG_NEEDED message
is not sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1061ms

In this case, the first ping request receives the FRAG_NEEDED message
and a local routing exception is created.
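
For readers checking the numbers in the ping output, the following is a
minimal sketch of the size arithmetic, not part of the patch. It assumes
an IPv4 header without options (20 bytes) and the 8-byte ICMP echo
header; the function names are hypothetical and chosen only for this
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header sizes for an ICMP echo request over IPv4.
 * Assumption: the IPv4 header carries no options. */
enum {
	IPV4_HDR_LEN = 20,
	ICMP_ECHO_HDR_LEN = 8,
};

/* Total IPv4 datagram length produced by `ping -s payload`. */
size_t echo_datagram_len(size_t payload)
{
	return payload + ICMP_ECHO_HDR_LEN + IPV4_HDR_LEN;
}

/* With DF set (`ping -M do`) a datagram larger than the path MTU cannot
 * be fragmented, so the sender should be told "Frag needed". */
bool needs_frag_needed(size_t payload, size_t path_mtu)
{
	return echo_datagram_len(payload) > path_mtu;
}
```

With `-s 1300` this gives the 1328 bytes that ping reports, which is
larger than both MTU values shown in the FRAG_NEEDED messages above, so
every request in either run is expected to fail. What differs between
the two runs is whether the first request already gets the error back.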

Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole
---
NOTE: An alternate approach would be to add a netlink attribute to preserve
      pkt_type across the kernel->user boundary, but that does require some
      userspace cooperation.

 net/openvswitch/actions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 6fcd7e2ca81fe..952c6292100d0 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -936,6 +936,8 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
 			pskb_trim(skb, ovs_mac_header_len(key));
 		}
 
+		skb->pkt_type = PACKET_OUTGOING;
+
 		if (likely(!mru ||
 		           (skb->len <= mru + vport->dev->hard_header_len))) {
 			ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
-- 
2.41.0
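
As a reading aid for the change to do_output(), here is a small
userspace model of why an uninitialized pkt_type matters. It is a sketch
under stated assumptions, not kernel code. The numeric values match the
PACKET_* constants in <linux/if_packet.h>, where PACKET_HOST is 0, so a
zero-filled field reads as PACKET_HOST. The struct, the function names,
and above all the gate condition in model_pmtu_reply_allowed() are
hypothetical: the commit message only says that pmtud is gated by
pkt_type, and the exact kernel check is assumed here.

```c
#include <stdbool.h>

/* Same numeric values as PACKET_* in <linux/if_packet.h>; renamed so the
 * model does not clash with the real header. */
enum model_pkt_type {
	MODEL_PACKET_HOST = 0,      /* addressed to this host */
	MODEL_PACKET_BROADCAST = 1,
	MODEL_PACKET_MULTICAST = 2,
	MODEL_PACKET_OTHERHOST = 3,
	MODEL_PACKET_OUTGOING = 4,  /* leaving this host */
};

/* Hypothetical stand-in for struct sk_buff: only the fields discussed. */
struct model_skb {
	unsigned int pkt_type;
	unsigned int len;
};

/* ASSUMPTION: the tunnel PMTU path only generates an exception message
 * for oversized packets that are not marked PACKET_HOST. */
bool model_pmtu_reply_allowed(const struct model_skb *skb, unsigned int mtu)
{
	if (skb->len <= mtu)
		return false; /* packet fits, nothing to report */
	return skb->pkt_type != MODEL_PACKET_HOST;
}

/* Mirrors the line added by the patch: mark the packet as outgoing
 * before it is handed to the vport. */
void model_do_output(struct model_skb *skb)
{
	skb->pkt_type = MODEL_PACKET_OUTGOING;
}
```

In this model an oversized packet whose pkt_type was left at zero gets
no exception message, while the same packet after model_do_output()
does, which matches the before and after behaviour described in the
commit message.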