From: Alexander Lobakin <aleksander.lobakin@intel.com>
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , Maciej Fijalkowski , Larysa Zaremba , Yunsheng Lin , Alexander Duyck , Jesper Dangaard Brouer , Ilias Apalodimas , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Date: Thu, 29 Jun 2023 17:23:03 +0200 Message-ID: <20230629152305.905962-3-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230629152305.905962-1-aleksander.lobakin@intel.com> References: <20230629152305.905962-1-aleksander.lobakin@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles even when on DMA-coherent platforms (like x86) with no active IOMMU or swiotlb, just for the call ladder. Indeed, it's page_pool_put_page() page_pool_put_defragged_page() <- external __page_pool_put_page() page_pool_dma_sync_for_device() <- non-inline dma_sync_single_range_for_device() dma_sync_single_for_device() <- external dma_direct_sync_single_for_device() dev_is_dma_coherent() <- exit For the inline functions, no guarantees the compiler won't uninline them (they're clearly not one-liners and sometimes compilers uninline even 2 + 2). The first external call is necessary, but the rest 2+ are done for nothing each time, plus a bunch of checks here and there. Since Page Pool mappings are long-term and for one "device + addr" pair dma_need_sync() will always return the same value (basically, whether it belongs to an swiotlb pool), addresses can be tested once right after they're obtained and the result can be reused until the page is unmapped. Define new PP flag, which will mean "do DMA syncs for device, but only when needed" and turn it on by default when the driver asks to sync pages. When a page is mapped, check whether it needs syncs and if so, replace that "sync when needed" back to "always do syncs" globally for the whole pool (better safe than sorry). As long as a pool has no pages requiring DMA syncs, this cuts off a good piece of calls and checks. On my x86_64, this gives from 2% to 5% performance benefit with no negative impact for cases when IOMMU is on and the shortcut can't be used. 
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  3 +++
 net/core/page_pool.c    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 829dc1f8ba6b..ff3772fab707 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -23,6 +23,9 @@
 					 * Please note DMA-sync-for-CPU is still
 					 * device driver responsibility
 					 */
+#define PP_FLAG_DMA_MAYBE_SYNC	BIT(2)	/* Internal, should not be used in
+					 * drivers
+					 */
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index dff0b4fa2316..498e058140b3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -197,6 +197,10 @@ static int page_pool_init(struct page_pool *pool,
 		/* pool->p.offset has to be set according to the address
 		 * offset used by the DMA engine to start copying rx data
 		 */
+
+		/* Try to avoid calling no-op syncs */
+		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
+		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
 	}
 
 #ifdef CONFIG_PAGE_POOL_STATS
@@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 
 	page_pool_set_dma_addr(page, dma);
 
+	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
+	    dma_need_sync(pool->p.dev, dma)) {
+		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
+		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
-- 
2.41.0
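As the comment kept in include/net/page_pool.h notes, syncing for the
CPU remains the driver's responsibility; only the pool-internal sync
*for device* gets the dma_need_sync() shortcut. A minimal sketch of
that driver-side counterpart, which this patch leaves untouched
(example_rx() and its offset/len parameters are illustrative
assumptions, not part of the patch):

#include <linux/dma-mapping.h>
#include <net/page_pool.h>

/* Hypothetical Rx handler: sync the payload for the CPU before
 * reading it. This call is made unconditionally by the driver and is
 * not affected by PP_FLAG_DMA_MAYBE_SYNC.
 */
static void example_rx(struct page_pool *pool, struct page *page,
		       u32 offset, u32 len)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	dma_sync_single_range_for_cpu(pool->p.dev, dma, offset, len,
				      page_pool_get_dma_dir(pool));

	/* ... hand the buffer to the stack (skb/XDP) here ... */
}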