Subject: Re: [PATCH net-next 6/9] page_pool: avoid calling no-op externals when possible
From: Yunsheng Lin
To: Alexander Lobakin
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maciej Fijalkowski, Larysa Zaremba, Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas, Simon Horman
Date: Wed, 2 Aug 2023 19:37:35 +0800
Message-ID: <00695c43-b376-169d-a62d-c1a373cde90c@huawei.com>
In-Reply-To: <1644b9d0-27a5-0c2b-c530-bcaa347f73c2@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/8/1 21:42, Alexander Lobakin wrote:

...

>>>>
>>>> It seems other subsystem may have the similar problem as page_pool,
>>>> is it possible to implement this kind of trick in the dma subsystem
>>>> instead of every subsystem inventing their own trick?
>>>
>>> In the ladder I described above most of overhead comes from jumping
>>> between Page Pool functions, not the generic DMA ones. Let's say I do
>>> this shortcut in dma_sync_single_range_for_device(), that is too late
>>> already to count on some good CPU saves.
>>
>> We can force inline the page_pool_dma_sync_for_device() function if it
>> is 'the good CPU saves' you mentioned above.
>>
>>> Plus, DMA sync API operates with dma_addr_t, not struct page. IOW it's
>>> not clear to me where to store this "we can shortcut" bit in that case.
>>
>> It seems we only need one bit in 'struct device' to do the 'shortcut',
>> and there seems to have available bit at the end of 'struct device'?
>
> dma_need_sync() can return different results for two different DMA
> addresses within the same device.

Yes, that's why we need per-device state to do a trick similar to the one
in this patch.

>
>>
>> Is it possible that we do something like this patch does in
>> dma_sync_single_range_for_device()?
>>
>> One thing to note is that there may be multi concurrent callers to
>> dma_sync_single_range_for_device(), which seems to be different from
>> atomic context for page_pool_dma_map(), so it may need some atomic
>> operation for the state changing if we want to implement it in a 'generic'
>> way.
>>
>>>
>>> From "other subsystem" I remember only XDP sockets. There, they also
>>> avoid calling their own non-inline functions in the first place, not the
>>> generic DMA ones. So I'd say both cases (PP and XSk) can't be solved via
>>> some "generic" solution.
>>
>> If PP and XSk both have a similar trick, isn't it a more clear sight
>> that it may be solved via some "generic" solution?
>
> Both shortcut their own functions in the first place, so I don't know
> what generic solution could be to optimize non-generic functions.

If we are able to shortcut the generic functions, then for the page_pool
and XSk cases it seems the non-generic functions just need to be inlined,
if I understand your concern correctly.

And that way, we may also be able to shortcut
dma_sync_single_range_for_device() when it is called directly from drivers?

>
>>
>> Is there any reason there is no a similar trick for sync for cpu in
>> XSk as below code indicates?
>> https://elixir.free-electrons.com/linux/v6.4-rc6/source/include/net/xsk_buff_pool.h#L152