Date: Mon, 2 Oct 2023 11:20:10 +0200
From: Florian Westphal
To: Henrik Lindström
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: macvtap performs IP defragmentation, causing MTU problems for virtual machines
Message-ID: <20231002092010.GA30843@breakpoint.cc>

Henrik Lindström wrote:
> I found this old thread describing why macvlan does this:
> https://lore.kernel.org/netdev/4E8C89EE.3090600@candelatech.com/
> Interestingly, the problem described in that thread seems to be more
> general than macvlans, and i can still reproduce it by simply having
> multiple physical interfaces.
> So it looks like macvlans are being special-cased right now, as a
> workaround for a more general defragmentation problem?

Looks like it, maybe Eric remembers details here.

AFAIU however this issue isn't specific to macvlan. It looks like some
people insist that receiving a fragmented multicast packet on n devices
means we should pass n defragmented packets up to the stack (we don't;
ip defrag will discard "duplicates").

There is a vif identifier for l3mdev's sake (that did not exist back
then); we could use that as a discriminator for the mcast case.
Something like this (totally untested):

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -479,11 +479,29 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 	return err;
 }
 
+static int ip_defrag_vif(const struct sk_buff *skb, const struct net_device *dev)
+{
+	int vif = l3mdev_master_ifindex_rcu(dev);
+
+	if (vif)
+		return vif;
+
+	/* some folks insist that receiving a fragmented mcast dgram on n devices shall
+	 * result in n defragmented packets.
+	 */
+	if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
+		if (dev)
+			vif = dev->ifindex;
+	}
+
+	return vif;
+}
+
 /* Process an incoming IP datagram fragment. */
 int ip_defrag(struct net *net, struct sk_buff *skb, u32 user)
 {
 	struct net_device *dev = skb->dev ? : skb_dst(skb)->dev;
-	int vif = l3mdev_master_ifindex_rcu(dev);
+	int vif = ip_defrag_vif(skb, dev);
 	struct ipq *qp;
 
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);

... which should allow removing the macvlan defrag step.
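
To illustrate why the vif works as a discriminator, here is a minimal user-space sketch (not kernel code, equally untested against real traffic). It assumes a reassembly queue is looked up by a key of source address, destination address, IP id, protocol and vif. Every name in it (toy_frag_key, toy_frag_table, toy_find_queue, toy_defrag_vif, toy_queues_for_two_devices) is hypothetical and only loosely modelled on the idea behind ip_defrag_vif(); the addresses and ifindex values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a fragment queue lookup key. */
struct toy_frag_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint8_t protocol;
	int vif;	/* the discriminator discussed above */
};

#define TOY_MAX_QUEUES 8

struct toy_frag_table {
	struct toy_frag_key keys[TOY_MAX_QUEUES];
	int nqueues;
};

static bool toy_key_equal(const struct toy_frag_key *a,
			  const struct toy_frag_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->id == b->id && a->protocol == b->protocol &&
	       a->vif == b->vif;
}

/* Return the index of the queue matching key, creating one if needed. */
static int toy_find_queue(struct toy_frag_table *t,
			  const struct toy_frag_key *key)
{
	for (int i = 0; i < t->nqueues; i++) {
		if (toy_key_equal(&t->keys[i], key))
			return i;
	}

	assert(t->nqueues < TOY_MAX_QUEUES);
	t->keys[t->nqueues] = *key;
	return t->nqueues++;
}

/* Assumed selection rule: l3mdev master first, then the receiving
 * device for multicast/broadcast, otherwise 0 (no discriminator).
 */
static int toy_defrag_vif(int l3mdev_master, bool mcast_or_bcast, int ifindex)
{
	if (l3mdev_master)
		return l3mdev_master;
	if (mcast_or_bcast)
		return ifindex;
	return 0;
}

/* Feed the same multicast fragment in via two devices and report how
 * many reassembly queues exist afterwards.
 */
static int toy_queues_for_two_devices(bool per_device_vif)
{
	struct toy_frag_table t = { .nqueues = 0 };
	struct toy_frag_key k = {
		.saddr = 0x0a000001, .daddr = 0xe0000001,
		.id = 42, .protocol = 17, .vif = 0,
	};
	const int ifindexes[2] = { 2, 3 };

	for (int i = 0; i < 2; i++) {
		k.vif = per_device_vif ?
			toy_defrag_vif(0, true, ifindexes[i]) : 0;
		toy_find_queue(&t, &k);
	}

	return t.nqueues;
}
```

With vif left at 0, both copies of the fragment hit the same queue, so the second copy is a duplicate and only one datagram can be reassembled. With the per-device vif each receiving device gets its own queue, which is what would yield n defragmented packets for n devices.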