From: Kal Cutter Conley
Date: Tue, 25 Apr 2023 12:37:49 +0200
Subject: Re: [PATCH] xsk: Use pool->dma_pages to check for DMA
To: Maciej Fijalkowski
Cc: John Fastabend, Björn Töpel, Magnus Karlsson, Jonathan Lemon,
    "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
    netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230423075335.92597-1-kal.conley@dectris.com>
    <6446d34f9568_338f220872@john.notmuch>

> Okay, 2-3% but with what settings? rxdrop for unaligned mode? What chunk
> size, etc.? We need this kind of info; "compiles to more efficient code"
> from the original commit message is too generic and speculative to me. A
> 2-3% perf diff against a specific xdpsock setup is a real improvement and
> is a strong argument for getting this patch in as-is, on its own.

I don't have the exact numbers anymore. I measured a performance
difference (up to 2-3%) with different settings, including rxdrop and
unaligned mode. The exact settings are the ones you have in the commit
message of the linked patch. I didn't go into detail in the commit
message because I thought this change would be a slam dunk.

I don't think there is any reason to believe the code is slower with this
patch. If anything, it should generally be faster. At the very least, it
will lead to more efficient code in terms of size, since dma_pages_cnt is
no longer read. I also think the code is more readable with this patch.

> > >
> > > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > > > index d318c769b445..a8d7b8a3688a 100644
> > > > --- a/include/net/xsk_buff_pool.h
> > > > +++ b/include/net/xsk_buff_pool.h
> > > > @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
> > > >         if (likely(!cross_pg))
> > > >                 return false;
> > > >
> > > > -       return pool->dma_pages_cnt &&
> > > > +       return pool->dma_pages &&
> > > >                !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
> > > >  }
> >
> > I would consider the above code part of the "fast path". It may be
> > executed approximately once per frame in unaligned mode.
> >
> > > This seems to be used in the setup/tear-down paths, so you're
> > > optimizing the control side. Is there a fast path with this code? I
> > > walked the ice driver. If it's just setup code we should do whatever
> > > is more readable.
> >
> > It is not only used in setup/tear-down paths (see above). Additionally,
> > I believe the code is also _more_ readable with this patch applied. In
> > particular, this patch reduces cognitive complexity, since people (and
> > compilers) reading the code don't need to additionally think about
> > pool->dma_pages_cnt.
>
> John was referring to xp_dma_unmap() with the comment above, which indeed
> is a teardown path, so this probably doesn't matter from a performance
> perspective and you could drop this chunk from your patch.

The setup/tear-down lines were also changed to keep the code consistent.
It doesn't make sense to sometimes check dma_pages_cnt and other times
dma_pages.
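To make the consistency point concrete, here is a rough userspace sketch of
the pattern (pool_sketch, PG_SHIFT, NEXT_PG_CONTIG_MASK and
dma_unmap_sketch() are made-up stand-ins for the real kernel definitions;
this is not the actual xp_dma_unmap() code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define PG_SHIFT            12       /* stand-in for PAGE_SHIFT */
    #define NEXT_PG_CONTIG_MASK 0x1ULL   /* stand-in for XSK_NEXT_PG_CONTIG_MASK */

    struct pool_sketch {
            uint64_t *dma_pages;         /* NULL while no DMA mapping exists */
            uint32_t dma_pages_cnt;
    };

    /* Fast path: test the pointer that is dereferenced right afterwards anyway. */
    static bool desc_crosses_pg(const struct pool_sketch *pool, uint64_t addr)
    {
            return pool->dma_pages &&
                   !(pool->dma_pages[addr >> PG_SHIFT] & NEXT_PG_CONTIG_MASK);
    }

    /* Teardown path: the same check, so "is DMA mapped?" is asked in exactly
     * one way everywhere.
     */
    static void dma_unmap_sketch(struct pool_sketch *pool)
    {
            if (!pool->dma_pages)
                    return;
            free(pool->dma_pages);
            pool->dma_pages = NULL;
            pool->dma_pages_cnt = 0;
    }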
> > >
> > > Both the _alloc_ cases read the neighboring free_heads_cnt, so you're
> > > saving a load, I guess? This is so deep into micro-optimizing that I'm
> > > curious whether you could measure it?
> >
> > It is saving a load, which also reduces code size. This will affect
> > other decisions such as what to inline. Also, in the linked patchset,
> > dma_pages and dma_pages_cnt do not share a cache line (on x86_64).
>
> Yes, I believe that with your patch, in unaligned mode, touching
> dma_pages warms the relevant cache line for your setup.

dma_pages is touched anyway right after. That is the point of this patch:
since dma_pages already needs to be loaded into a register, just check
that instead of loading an additional field, possibly from a different
cache line.
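For what it's worth, the load-saving argument can be shown with the same
sketch (reusing the made-up definitions from the sketch above; this is an
illustration, not the real struct xsk_buff_pool layout):

    /* Before the patch (sketch): two fields are read, dma_pages_cnt for the
     * check and dma_pages for the indexing right after it.
     */
    static bool desc_crosses_pg_old(const struct pool_sketch *pool, uint64_t addr)
    {
            return pool->dma_pages_cnt &&
                   !(pool->dma_pages[addr >> PG_SHIFT] & NEXT_PG_CONTIG_MASK);
    }

    /* After the patch (sketch): only dma_pages is read; it must be in a
     * register for the array access anyway, so the extra load (and possibly
     * an extra cache line) goes away.
     */
    static bool desc_crosses_pg_new(const struct pool_sketch *pool, uint64_t addr)
    {
            return pool->dma_pages &&
                   !(pool->dma_pages[addr >> PG_SHIFT] & NEXT_PG_CONTIG_MASK);
    }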