From: Maciej Żenczykowski
Date: Thu, 7 May 2020 09:46:03 -0700
Subject: Re: [PATCH v3] net: bpf: permit redirect from ingress L3 to egress L2 devices at near max mtu
To: Daniel Borkmann
Cc: Alexei Starovoitov, Linux Network Development Mailing List, Linux Kernel Mailing List, BPF Mailing List, "David S. Miller", Jakub Kicinski
References: <20200507023606.111650-1-zenczykowski@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

(a) It's not clear why the max is SKB_MAX_ALLOC in the first place (this is PAGE_SIZE << 2, i.e. 16K on x86), while lo mtu is 64k.

(b) Hmm, if we're not redirecting, then exceeding the ingress device's mtu doesn't seem to be a problem. Indeed AFAIK this can already happen: some devices will round the mtu up when they configure the device mru buffers (i.e. you configure L3 mtu 1500, they treat that as L2 1536 or 1532 [-4 fcs], simply because 3 * 512 is a nice amount of buffers, or they'll accept not only 1514 L2, but also 1518 L2 or even 1522 L2 for VLAN and Q-in-Q, even if the packets aren't actually VLAN'ed, so your non-VLAN mru might be 1504 or 1508).

Indeed my corp Dell workstation with some standard 1 gigabit motherboard nic has a default mtu of 1500, and I've seen it receive L3 mtu 1520 packets (apparently due to a misconfiguration in our hardware [cisco? juniper?] ipv4->ipv6 translator, which can take 1500 mtu ipv4 packets and convert them to 1520 mtu ipv6 packets without fragmenting or generating icmp too big errors). While it's obviously wrong, it does just work (the network paths themselves are also obviously 1520 clean).

(c) If we are redirecting, we'll eventually (after the bpf program returns) hit dev_queue_xmit(), and shouldn't that be what returns an error?

btw. is_skb_forwardable() actually tests:
  device is up && (packet is gso || skb->len < dev->mtu + dev->hard_header_len + VLAN_HLEN)
which is also wrong, and in 2 ways: VLAN_HLEN makes no sense on non-ethernet, and the __bpf_skb_max_len function doesn't account for VLAN... (which possibly has implications if you try to redirect to a vlan interface)

---

I think having an mtu check is useful, but I think the mtu should be selectable by the bpf program, because it might not even be the device mtu at all; it might be the path mtu which we should be testing against. It should also be checked for gso frames, since the max post-segmentation size should be enforced.

---

I agree we should expose dev->mtu (and dev->hard_header_len and hatype).

I'll mull this over a bit more, but I'm not convinced this patch isn't ok as is. There just is probably more we should do.
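
To make the two length checks discussed above concrete, here is a rough user-space sketch, not kernel code. Everything prefixed `sketch_` or `SKETCH_` is hypothetical and exists only for illustration. The first function mirrors the test as paraphrased in this mail (the exact comparison in the kernel source may differ); the second is one possible shape for a check where the caller picks the mtu (device or path) and gso frames are checked by their post-segmentation size.

```c
#include <stdbool.h>

#define SKETCH_VLAN_HLEN 4u /* size of one 802.1Q tag */

/* Hypothetical stand-in for the few device fields mentioned in the mail. */
struct sketch_dev {
    bool up;
    unsigned int mtu;             /* L3 mtu */
    unsigned int hard_header_len; /* L2 header length, 14 for plain ethernet */
};

/* Hypothetical stand-in for a packet; all lengths include the L2 header. */
struct sketch_pkt {
    unsigned int l2_len;         /* total frame length */
    bool is_gso;
    unsigned int gso_seg_l2_len; /* largest frame after segmentation (gso only) */
};

/*
 * The test as paraphrased in the mail: VLAN_HLEN of slack is always added,
 * even on non-ethernet devices, and gso packets are never length checked.
 */
static bool sketch_is_forwardable(const struct sketch_dev *dev,
                                  const struct sketch_pkt *pkt)
{
    if (!dev->up)
        return false;
    if (pkt->is_gso)
        return true;
    return pkt->l2_len < dev->mtu + dev->hard_header_len + SKETCH_VLAN_HLEN;
}

/*
 * Assumed alternative: the caller supplies the mtu to test against and the
 * L2 header length to strip. For gso packets the post-segmentation size is
 * what gets compared, so oversized segments are rejected too.
 */
static bool sketch_fits_mtu(const struct sketch_pkt *pkt,
                            unsigned int l2_hdr_len, unsigned int mtu)
{
    unsigned int l2 = pkt->is_gso ? pkt->gso_seg_l2_len : pkt->l2_len;

    if (l2 < l2_hdr_len)
        return false; /* malformed: shorter than its own L2 header */
    return l2 - l2_hdr_len <= mtu;
}
```

With a 1500 mtu ethernet device, a 1517 byte untagged frame passes the first check because of the VLAN slack but fails the second, and a gso packet whose segments would be 1534 bytes passes the first check unconditionally while the second rejects it.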