From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson, Michal Kubiak, Larysa Zaremba, Jesper Dangaard Brouer, Ilias Apalodimas, Christoph Hellwig, netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next 06/11] net: page_pool: avoid calling no-op externals when possible
Date: Tue, 16 May 2023 18:18:36 +0200
Message-Id: <20230516161841.37138-7-aleksander.lobakin@intel.com>
In-Reply-To: <20230516161841.37138-1-aleksander.lobakin@intel.com>
References: <20230516161841.37138-1-aleksander.lobakin@intel.com>

Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles even on DMA-coherent platforms (like x86) with no active IOMMU or swiotlb, just for the call ladder.
Indeed, it's:

page_pool_put_page()
  page_pool_put_defragged_page()          <- external
    __page_pool_put_page()
      page_pool_dma_sync_for_device()     <- non-inline
        dma_sync_single_range_for_device()
          dma_sync_single_for_device()    <- external
            dma_direct_sync_single_for_device()
              dev_is_dma_coherent()       <- exit

For the inline functions, there are no guarantees the compiler won't uninline them (they're clearly not one-liners, and sometimes compilers uninline even 2 + 2). The first external call is necessary, but the remaining 2+ are done for nothing each time, plus a bunch of checks here and there.

Since Page Pool mappings are long-term and, for one "device + addr" pair, dma_need_sync() will always return the same value (basically, whether the address belongs to an swiotlb pool), addresses can be tested once, right after they're obtained, and the result can be reused until the page is unmapped.

Define a new PP flag meaning "do DMA syncs for device, but only when needed" and turn it on by default when the driver asks to sync pages. When a page is mapped, check whether it needs syncs and, if so, replace that "sync when needed" with "always do syncs" globally for the whole pool (better safe than sorry).

As long as a pool has no pages requiring DMA syncs, this cuts off a good piece of calls and checks. On my x86_64, this gives from 2% to 5% performance benefit, with no negative impact for cases when the IOMMU is on and the shortcut can't be used.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  3 +++
 net/core/page_pool.c    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..8435013de06e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -46,6 +46,9 @@
 				 * device driver responsibility
 				 */
 #define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
+#define PP_FLAG_DMA_MAYBE_SYNC	BIT(3) /* Internal, should not be used in the
+					* drivers
+					*/
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..57f323dee6c4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -175,6 +175,10 @@ static int page_pool_init(struct page_pool *pool,
 		/* pool->p.offset has to be set according to the address
 		 * offset used by the DMA engine to start copying rx data
 		 */
+
+		/* Try to avoid calling no-op syncs */
+		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
+		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
 	}
 
 	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
@@ -323,6 +327,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 
 	page_pool_set_dma_addr(page, dma);
 
+	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
+	    dma_need_sync(pool->p.dev, dma)) {
+		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
+		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
-- 
2.40.1