From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Magnus Karlsson,
	Maciej Fijalkowski, Daniel Borkmann, Sasha Levin
Subject: [PATCH 5.15 159/530] xsk: Fix backpressure mechanism on Tx
Date: Mon, 24 Oct 2022 13:28:23 +0200
Message-Id: <20221024113052.266687883@linuxfoundation.org>
In-Reply-To: <20221024113044.976326639@linuxfoundation.org>
References: <20221024113044.976326639@linuxfoundation.org>

From: Maciej Fijalkowski

[ Upstream commit c00c4461689e15ac2cc3b9a595a54e4d8afd3d77 ]

Commit d678cbd2f867 ("xsk: Fix handling of invalid descriptors in XSK TX
batching API") fixed the batch API usage against a set of descriptors
containing invalid ones, but introduced a problem when the AF_XDP SW
rings are smaller than the HW ones. A mismatch was observed between the
number of Tx'ed frames reported by the HW generator and by the user
space app. It turned out that the backpressure mechanism became a
bottleneck when the number of descriptors produced to the CQ was lower
than what was grabbed from the XSK Tx ring.
Say that 512 entries have been taken from the XSK Tx ring while the CQ
has only 490 free entries. The callsite (the ZC driver) will then
produce only 490 entries onto the HW Tx ring, but 512 entries will be
released from the XSK Tx ring, and that is what user space will see
(a small stand-alone model of this mismatch follows the diff at the
end).

To fix this case, rework the XSK Tx/CQ ring interactions by moving the
internal functions around and changing the call order:

* pull xskq_prod_nb_free() out of xskq_prod_reserve_addr_batch() up to
  xsk_tx_peek_release_desc_batch();
** move xskq_cons_release_n() into xskq_cons_read_desc_batch()

After doing so, the algorithm can be described as follows:

1. look up the available Tx entries
2. use the value from 1. to reserve space in the CQ (*)
3. read as many descriptors from the Tx ring as the value from 2.
3a. release the descriptors from the XSK Tx ring (**)
4. finally produce the addresses to the CQ

The resulting Tx path is sketched right after the diffstat below.

Fixes: d678cbd2f867 ("xsk: Fix handling of invalid descriptors in XSK TX batching API")
Signed-off-by: Magnus Karlsson
Signed-off-by: Maciej Fijalkowski
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20220830121705.8618-1-maciej.fijalkowski@intel.com
Signed-off-by: Sasha Levin
---
 net/xdp/xsk.c       | 22 +++++++++++-----------
 net/xdp/xsk_queue.h | 22 ++++++++++------------
 2 files changed, 21 insertions(+), 23 deletions(-)
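For reference, the patched Tx path, with the changelog's steps marked,
can be sketched as follows. This is assembled from the hunks below; the
body of the !xs branch and the epilogue after out: are not visible in
the diff context and are assumed here.

u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 nb_pkts)
{
	struct xdp_sock *xs;

	rcu_read_lock();
	if (!list_is_singular(&pool->xsk_tx_list)) {
		/* Fallback to the non-batched version */
		rcu_read_unlock();
		return xsk_tx_peek_release_fallback(pool, nb_pkts);
	}

	xs = list_first_or_null_rcu(&pool->xsk_tx_list, struct xdp_sock, tx_list);
	if (!xs) {
		nb_pkts = 0;		/* assumed; not shown in the diff */
		goto out;
	}

	/* 1. look up the available Tx entries */
	nb_pkts = xskq_cons_nb_entries(xs->tx, nb_pkts);

	/* 2. backpressure: clamp the batch to the CQ free space (*) */
	nb_pkts = xskq_prod_nb_free(pool->cq, nb_pkts);
	if (!nb_pkts)
		goto out;

	/* 3./3a. read that many descriptors; the helper now also releases
	 * the consumed entries from the XSK Tx ring internally (**)
	 */
	nb_pkts = xskq_cons_read_desc_batch(xs->tx, pool, nb_pkts);
	if (!nb_pkts) {
		xs->tx->queue_empty_descs++;
		goto out;
	}

	__xskq_cons_release(xs->tx);

	/* 4. finally produce the addresses to the CQ */
	xskq_prod_write_addr_batch(pool->cq, pool->tx_descs, nb_pkts);
	xs->sk.sk_write_space(&xs->sk);

out:
	rcu_read_unlock();		/* assumed epilogue */
	return nb_pkts;
}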
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -370,16 +370,15 @@ static u32 xsk_tx_peek_release_fallback(
 	return nb_pkts;
 }
 
-u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max_entries)
+u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 nb_pkts)
 {
 	struct xdp_sock *xs;
-	u32 nb_pkts;
 
 	rcu_read_lock();
 	if (!list_is_singular(&pool->xsk_tx_list)) {
 		/* Fallback to the non-batched version */
 		rcu_read_unlock();
-		return xsk_tx_peek_release_fallback(pool, max_entries);
+		return xsk_tx_peek_release_fallback(pool, nb_pkts);
 	}
 
 	xs = list_first_or_null_rcu(&pool->xsk_tx_list, struct xdp_sock, tx_list);
@@ -388,12 +387,7 @@ u32 xsk_tx_peek_release_desc_batch(struc
 		goto out;
 	}
 
-	max_entries = xskq_cons_nb_entries(xs->tx, max_entries);
-	nb_pkts = xskq_cons_read_desc_batch(xs->tx, pool, max_entries);
-	if (!nb_pkts) {
-		xs->tx->queue_empty_descs++;
-		goto out;
-	}
+	nb_pkts = xskq_cons_nb_entries(xs->tx, nb_pkts);
 
 	/* This is the backpressure mechanism for the Tx path. Try to
 	 * reserve space in the completion queue for all packets, but
@@ -401,12 +395,18 @@ u32 xsk_tx_peek_release_desc_batch(struc
 	 * packets. This avoids having to implement any buffering in
 	 * the Tx path.
 	 */
-	nb_pkts = xskq_prod_reserve_addr_batch(pool->cq, pool->tx_descs, nb_pkts);
+	nb_pkts = xskq_prod_nb_free(pool->cq, nb_pkts);
 	if (!nb_pkts)
 		goto out;
 
-	xskq_cons_release_n(xs->tx, max_entries);
+	nb_pkts = xskq_cons_read_desc_batch(xs->tx, pool, nb_pkts);
+	if (!nb_pkts) {
+		xs->tx->queue_empty_descs++;
+		goto out;
+	}
+
+	__xskq_cons_release(xs->tx);
+	xskq_prod_write_addr_batch(pool->cq, pool->tx_descs, nb_pkts);
 	xs->sk.sk_write_space(&xs->sk);
 
 out:
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -201,6 +201,11 @@ static inline bool xskq_cons_read_desc(s
 	return false;
 }
 
+static inline void xskq_cons_release_n(struct xsk_queue *q, u32 cnt)
+{
+	q->cached_cons += cnt;
+}
+
 static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool,
 					    u32 max)
 {
@@ -222,6 +227,8 @@ static inline u32 xskq_cons_read_desc_ba
 		cached_cons++;
 	}
 
+	/* Release valid plus any invalid entries */
+	xskq_cons_release_n(q, cached_cons - q->cached_cons);
 	return nb_entries;
 }
 
@@ -287,11 +294,6 @@ static inline void xskq_cons_release(str
 	q->cached_cons++;
 }
 
-static inline void xskq_cons_release_n(struct xsk_queue *q, u32 cnt)
-{
-	q->cached_cons += cnt;
-}
-
 static inline bool xskq_cons_is_full(struct xsk_queue *q)
 {
 	/* No barriers needed since data is not accessed */
@@ -353,21 +355,17 @@ static inline int xskq_prod_reserve_addr
 	return 0;
 }
 
-static inline u32 xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_desc *descs,
-					       u32 max)
+static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_desc *descs,
+					      u32 nb_entries)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-	u32 nb_entries, i, cached_prod;
-
-	nb_entries = xskq_prod_nb_free(q, max);
+	u32 i, cached_prod;
 
 	/* A, matches D */
 	cached_prod = q->cached_prod;
 	for (i = 0; i < nb_entries; i++)
 		ring->desc[cached_prod++ & q->ring_mask] = descs[i].addr;
 	q->cached_prod = cached_prod;
-
-	return nb_entries;
 }
 
 static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
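As a sanity check on the changelog's numbers, here is a minimal
stand-alone model of the 490/512 scenario. It is plain user-space C;
the counters and the min_u32() helper are illustrative only, not
kernel API.

#include <stdio.h>

static unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int tx_grabbed = 512;	/* entries taken from the XSK Tx ring */
	unsigned int cq_free = 490;	/* free entries in the completion queue */

	/* Before the fix: descriptors were read (and later all released)
	 * before the CQ free space was consulted, so the Tx ring released
	 * 512 entries while only 490 reached the HW Tx ring.
	 */
	unsigned int released_before = tx_grabbed;
	unsigned int sent_before = min_u32(tx_grabbed, cq_free);

	/* After the fix: the batch is clamped to the CQ free space first,
	 * so only that many descriptors are read from and released on the
	 * Tx ring, and both counts agree.
	 */
	unsigned int batch = min_u32(tx_grabbed, cq_free);
	unsigned int released_after = batch;
	unsigned int sent_after = batch;

	printf("before: released=%u sent=%u (mismatch %u)\n",
	       released_before, sent_before, released_before - sent_before);
	printf("after:  released=%u sent=%u\n", released_after, sent_after);
	return 0;
}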