From: Kal Cutter Conley
Date: Wed, 29 Mar 2023 18:03:35 +0200
Subject: Re: [PATCH bpf-next 1/3] xsk: Support UMEM chunk_size > PAGE_SIZE
To: Magnus Karlsson
Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, Toke Høiland-Jørgensen
References: <20230319195656.326701-1-kal.conley@dectris.com>
 <20230319195656.326701-2-kal.conley@dectris.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> > Add core AF_XDP support for chunk sizes larger than PAGE_SIZE.
> > This enables sending/receiving jumbo ethernet frames up to the
> > theoretical maximum of 64 KiB. For chunk sizes > PAGE_SIZE, the
> > UMEM is required to consist of HugeTLB VMAs (and be hugepage
> > aligned). Initially, only XDP_COPY mode is usuable pending future
> > driver work.
>
> nit: useable

Fixed in v2.

> > For consistency, check for HugeTLB pages during UMEM registration.
> > This implies that hugepages are required for XDP_COPY mode despite
> > DMA not being used. This restriction is desirable since it ensures
> > user software can take advantage of future driver support.
> >
> > Even in HugeTLB mode, continue to do page accounting using order-0
> > (4 KiB) pages. This minimizes the size of this change and reduces
> > the risk of impacting driver code. Taking full advantage of
> > hugepages for accounting should improve XDP performance in the
> > general case.
>
> Thank you Kal for working on this. Interesting stuff.
>
> First some general comments and questions on the patch set:
>
> * Please document this new feature in Documentation/networking/af_xdp.rst

Fixed in v2.

> * Have you verified the SKB path for Rx? Tx was exercised by running
>   l2fwd.

This patchset allows sending/receiving 9000 MTU packets with xdpsock
(slightly modified). The benchmark numbers show the results for
rxdrop (-r).

> * Have you checked that an XDP program can access the full >4K packet?
>   The xdp_buff has no problem with this as the buffer is consecutive,
>   but just wondering if there is some other check or limit in there?
>   Jesper and Toke will likely know, so roping them in.

Yes, the full packet can be accessed from a SEC("xdp") BPF program
(only tested in SKB mode).

> * Would be interesting to know your thoughts about taking this to
>   zero-copy mode too. It would be good if you could support all modes
>   from the get go, instead of partial support for some unknown amount
>   of time and then zero-copy support.
>   Partial support makes using the feature more cumbersome when an
>   app is deployed on various systems. The continuity checking code
>   you have at the end of the patch is a step in the direction of
>   zero-copy support, it seems.

I think this patchset is enough to support zero-copy as long as the
driver allows it. Currently, no drivers will work out of the box AFAIK
since they all validate the chunk_size or the MTU size. I would
absolutely love for drivers to support this. Hopefully this patchset
is enough inspiration? :-)

Do you think it's absolutely necessary to have driver ZC support ready
to land this?

> * What happens if I try to run this in zero-copy mode with a
>   chunk_size > 4K?

AFAIK drivers check for this and throw an error. Maybe there are some
drivers that don't check this properly?

> * There are some compilation errors to fix from the kernel test robot

Fixed in v2.

> > require_hugetlb would be a clearer name

Fixed in v2. Renamed to `need_hugetlb`.

> > next_mapping? n as a name is not very descriptive.

Fixed in v2. Renamed to `stride`.

> > >  	u32 i;
> > >
> > > -	for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
> > > -		if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
> > > +	for (i = 0; i + n < dma_map->dma_pages_cnt; i++) {
> > > +		if (dma_map->dma_pages[i] + page_size == dma_map->dma_pages[i + n])
> > >  			dma_map->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
> > >  		else
> > >  			dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
> > >  	}
> > > +	for (; i < dma_map->dma_pages_cnt; i++)
> > > +		dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;

> Is this not too conservative? If your umem consists of two huge page
> mappings but they are not mapped consecutively in physical memory,
> you are going to mark all the chunks as non-consecutive. Would it not
> be better to just look chunk_size ahead of you instead of page_size
> above?
> The only thing you care about is that the chunk you are in is in
> consecutive physical memory, and that is strictly only true for
> zero-copy mode. So this seems to be in preparation for zero-copy
> mode.

It is slightly too conservative. I have updated the logic a bit in v2.
If the packet doesn't cross a page boundary, then this array is not
read anyway.

Thanks!
Kal
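[Editor's note] The contiguity-marking loop quoted above can be exercised in a small userspace model. The loop body is copied from the diff; the `struct dma_map` layout shown here, the helper name `check_dma_contiguity`, and the sample addresses are illustrative stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

#define XSK_NEXT_PG_CONTIG_MASK 0x1ULL

/* Simplified stand-in for the kernel's xsk_dma_map. */
struct dma_map {
	uint64_t *dma_pages;
	uint32_t dma_pages_cnt;
};

/* Entry i gets the contiguity flag when the entry n slots ahead starts
 * exactly page_size bytes later in DMA address space. The final n
 * entries have no successor to compare against, so their flag is
 * always cleared (the second loop). */
static void check_dma_contiguity(struct dma_map *dma_map, uint64_t page_size, uint32_t n)
{
	uint32_t i;

	for (i = 0; i + n < dma_map->dma_pages_cnt; i++) {
		if (dma_map->dma_pages[i] + page_size == dma_map->dma_pages[i + n])
			dma_map->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
		else
			dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
	}
	for (; i < dma_map->dma_pages_cnt; i++)
		dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
}
```

Feeding it two internally contiguous but non-adjacent mappings marks every entry non-contiguous, which illustrates the "slightly too conservative" point discussed above: a chunk that stays inside a single mapping is still physically contiguous even when the mappings themselves are scattered.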
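[Editor's note] The registration rules described in the quoted commit message (chunk_size up to 64 KiB, hugepage-backed and hugepage-aligned UMEM for chunk_size > PAGE_SIZE) can be sketched as a hypothetical userspace pre-check. The helper name, the fixed 2 MiB hugepage size, and the exact rule set are assumptions for illustration; the kernel's actual validation lives in UMEM registration and may differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define HUGEPAGE_SIZE (2u * 1024 * 1024)  /* assuming 2 MiB hugepages */
#define MIN_CHUNK_SIZE 2048u
#define MAX_CHUNK_SIZE (64u * 1024)       /* 64 KiB theoretical maximum */

/* Hypothetical pre-check for aligned chunk mode: chunk_size must be a
 * power of two in [2 KiB, 64 KiB], and chunk sizes above PAGE_SIZE
 * additionally need a hugepage-aligned, hugepage-multiple UMEM region. */
static bool umem_config_ok(uint64_t addr, uint64_t len, uint32_t chunk_size)
{
	if (chunk_size < MIN_CHUNK_SIZE || chunk_size > MAX_CHUNK_SIZE)
		return false;
	if (chunk_size & (chunk_size - 1))	/* not a power of two */
		return false;
	if (chunk_size > PAGE_SIZE &&
	    (addr % HUGEPAGE_SIZE || len % HUGEPAGE_SIZE))
		return false;
	return true;
}
```

Under these assumed rules, a 9000-byte MTU would need a 16 KiB chunk, which in turn requires the UMEM to sit in hugepage-aligned HugeTLB memory, matching the XDP_COPY restriction discussed above.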