Message-ID: <611ef809-248f-424e-993c-f5727ed2e18c@arm.com>
Date: Tue, 24 Oct 2023 13:38:11 +0100
Subject: Re: [PATCH 1/2] iommu: Introduce a rb_tree for looking up device
From: Robin Murphy
To: Huang Jiaqing, joro@8bytes.org, will@kernel.org, dwmw2@infradead.org,
 baolu.lu@linux.intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev
Cc: jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, yi.y.sun@intel.com,
 kvm@vger.kernel.org
In-Reply-To: <20231024084124.11155-1-jiaqing.huang@intel.com>

On 24/10/2023 9:41 am, Huang Jiaqing wrote:
> The existing IO page fault handler locates the PCI device by calling
> pci_get_domain_bus_and_slot(), which searches the list of all PCI
> devices until the desired device is found. This is inefficient: the
> search is O(n), which is not performance friendly given that IO page
> fault handling sits in a performance-critical path. Indeed, a
> parallel, heavy dsa_test run can cause CPU stalls due to the low
> efficiency of, and lock contention on, the current path.
>
> To improve the performance of the IO page fault handler, replace
> pci_get_domain_bus_and_slot() with a local red-black tree. A red-black
> tree is a self-balancing binary search tree, so the average time
> complexity of a lookup is O(log(n)) rather than O(n), which can
> significantly improve the performance of the IO page fault handler.
>
> In addition, only the affected devices (those that have IO page fault
> enabled) are inserted into the red-black tree, which further improves
> the performance of the IO page fault handler.

Have we had the rbtree vs. xarray debate for this one yet? :)

However, how many devices do we actually expect to be sharing the same
queue? If it's a small number, then it's quite possible that tracking
separate per-queue sets of devices is the real win here, and a simple
list, without the additional overheads of more complex structures, could
still be sufficient.

Thanks,
Robin.

> This series depends on the "deliver page faults to user space" patch-set:
> https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
>
> Signed-off-by: Huang Jiaqing
> ---
>  drivers/iommu/io-pgfault.c | 104 ++++++++++++++++++++++++++++++++++++-
>  include/linux/iommu.h      |  16 ++++++
>  2 files changed, 118 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 1dbacc4fdf72..68e85dc6b1b6 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -7,6 +7,7 @@
>
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -392,6 +393,55 @@ int iopf_queue_discard_partial(struct iopf_queue *queue)
>  }
>  EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
>
> +static int iopf_queue_pci_rbtree_insert(struct iopf_queue *queue, struct pci_dev *pdev)
> +{
> +	int ret;
> +	struct rb_node **new, *parent = NULL;
> +	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(&pdev->dev);
> +
> +	if (!iopf_param)
> +		return -ENODEV;
> +
> +	down_write(&queue->pci_dev_sem);
> +	new = &(queue->pci_dev_rbtree.rb_node);
> +	while (*new) {
> +		struct iommu_fault_param *this = container_of(*new, struct iommu_fault_param, node);
> +		struct pci_dev *this_pdev = to_pci_dev(this->dev);
> +		s16 result = RB_NODE_CMP(pdev->bus->number, pdev->devfn, this_pdev->bus->number, this_pdev->devfn);
> +
> +		parent = *new;
> +		if (result < 0)
> +			new = &((*new)->rb_left);
> +		else if (result > 0)
> +			new = &((*new)->rb_right);
> +		else {
> +			ret = -EEXIST;
> +			goto err_unlock;
> +		}
> +	}
> +
> +	rb_link_node(&iopf_param->node, parent, new);
> +	rb_insert_color(&iopf_param->node, &queue->pci_dev_rbtree);
> +
> +	up_write(&queue->pci_dev_sem);
> +	return 0;
> +err_unlock:
> +	up_write(&queue->pci_dev_sem);
> +	iopf_put_dev_fault_param(iopf_param);
> +	return ret;
> +}
> +
> +/* Caller must have inserted iopf_param by calling iopf_queue_pci_rbtree_insert() */
> +static void iopf_queue_pci_rbtree_remove(struct iopf_queue *queue, struct iommu_fault_param *iopf_param)
> +{
> +	down_write(&queue->pci_dev_sem);
> +	rb_erase(&iopf_param->node, &queue->pci_dev_rbtree);
> +	up_write(&queue->pci_dev_sem);
> +
> +	/* paired with iopf_queue_pci_rbtree_insert() */
> +	iopf_put_dev_fault_param(iopf_param);
> +}
> +
>  /**
>   * iopf_queue_add_device - Add producer to the fault queue
>   * @queue: IOPF queue
> @@ -434,7 +484,13 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
>  	mutex_unlock(&param->lock);
>  	mutex_unlock(&queue->lock);
>
> -	return ret;
> +	if (ret)
> +		return ret;
> +
> +	if (dev_is_pci(dev))
> +		return iopf_queue_pci_rbtree_insert(queue, to_pci_dev(dev));
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iopf_queue_add_device);
>
> @@ -486,7 +542,13 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
>  	mutex_unlock(&param->lock);
>  	mutex_unlock(&queue->lock);
>
> -	return ret;
> +	if (ret)
> +		return ret;
> +
> +	if (dev_is_pci(dev))
> +		iopf_queue_pci_rbtree_remove(queue, fault_param);
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
>
> @@ -519,6 +581,9 @@ struct iopf_queue *iopf_queue_alloc(const char *name)
>  	INIT_LIST_HEAD(&queue->devices);
>  	mutex_init(&queue->lock);
>
> +	queue->pci_dev_rbtree = RB_ROOT;
> +	init_rwsem(&queue->pci_dev_sem);
> +
>  	return queue;
>  }
>  EXPORT_SYMBOL_GPL(iopf_queue_alloc);
> @@ -544,3 +609,38 @@ void iopf_queue_free(struct iopf_queue *queue)
>  	kfree(queue);
>  }
>  EXPORT_SYMBOL_GPL(iopf_queue_free);
> +
> +/**
> + * iopf_queue_find_pdev - Lookup pci device in iopf_queue rbtree
> + * @queue: IOPF queue
> + * @bus: bus number of pci device to lookup
> + * @devfn: devfn of pci device to lookup
> + *
> + * Return: the pci device on success and NULL on not found.
> + */
> +struct pci_dev *iopf_queue_find_pdev(struct iopf_queue *queue, u8 bus, u8 devfn)
> +{
> +	struct iommu_fault_param *data = NULL;
> +	struct pci_dev *pdev = NULL;
> +	struct rb_node *node;
> +
> +	down_read(&queue->pci_dev_sem);
> +
> +	node = queue->pci_dev_rbtree.rb_node;
> +	while (node) {
> +		data = container_of(node, struct iommu_fault_param, node);
> +		pdev = to_pci_dev(data->dev);
> +		s16 result = RB_NODE_CMP(bus, devfn, pdev->bus->number, pdev->devfn);
> +
> +		if (result < 0)
> +			node = node->rb_left;
> +		else if (result > 0)
> +			node = node->rb_right;
> +		else
> +			break;
> +	}
> +	up_read(&queue->pci_dev_sem);
> +
> +	return node ? pdev : NULL;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_find_pdev);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index bcec7e91dfc4..b29bbb0d1843 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -136,11 +136,15 @@ struct iopf_group {
>   * @wq: the fault workqueue
>   * @devices: devices attached to this queue
>   * @lock: protects the device list
> + * @pci_dev_rbtree: pci devices for looking up
> + * @pci_dev_sem: protects the rb_tree
>   */
>  struct iopf_queue {
>  	struct workqueue_struct *wq;
>  	struct list_head devices;
>  	struct mutex lock;
> +	struct rb_root pci_dev_rbtree;
> +	struct rw_semaphore pci_dev_sem;
>  };
>
>  /* iommu fault flags */
> @@ -483,6 +487,8 @@ struct iommu_device {
>  	u32 max_pasids;
>  };
>
> +#define RB_NODE_CMP(bus1, devfn1, bus2, devfn2) ((s16)(PCI_DEVID(bus1, devfn1) - PCI_DEVID(bus2, devfn2)))
> +
>  /**
>   * struct iommu_fault_param - per-device IOMMU fault data
>   * @lock: protect pending faults list
> @@ -494,6 +500,7 @@ struct iommu_device {
>   * @partial: faults that are part of a Page Request Group for which the last
>   * request hasn't been submitted yet.
>   * @faults: holds the pending faults which needs response
> + * @node: pci device tracking node (lookup by (bus, devfn))
>   */
>  struct iommu_fault_param {
>  	struct mutex lock;
> @@ -505,6 +512,7 @@ struct iommu_fault_param {
>
>  	struct list_head partial;
>  	struct list_head faults;
> +	struct rb_node node;
>  };
>
>  /**
> @@ -1286,6 +1294,8 @@ int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid);
>  struct iopf_queue *iopf_queue_alloc(const char *name);
>  void iopf_queue_free(struct iopf_queue *queue);
>  int iopf_queue_discard_partial(struct iopf_queue *queue);
> +struct pci_dev *iopf_queue_find_pdev(struct iopf_queue *queue,
> +				     u8 bus, u8 devfn);
>  void iopf_free_group(struct iopf_group *group);
>  int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
>  int iommu_page_response(struct device *dev, struct iommu_page_response *msg);
> @@ -1321,6 +1331,12 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
>  	return -ENODEV;
>  }
>
> +static inline struct pci_dev *iopf_queue_find_pdev(struct iopf_queue *queue,
> +						   u8 bus, u8 devfn)
> +{
> +	return NULL;
> +}
> +
>  static inline void iopf_free_group(struct iopf_group *group)
>  {
>  }
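For comparison, the xarray alternative raised in the review might look roughly like the following. This is a hypothetical, untested sketch only: it assumes struct iopf_queue gained a "struct xarray pci_devs;" member (initialised with xa_init() in iopf_queue_alloc()) in place of the rbtree and rwsem, and the function names are made up for illustration. The xarray handles its own internal locking, with the same PCI_DEVID(bus, devfn) value serving as the index:

```c
/* Hypothetical sketch: assumes struct iopf_queue has a
 * "struct xarray pci_devs;" member. Untested. */
static int iopf_queue_pci_xa_insert(struct iopf_queue *queue,
				    struct pci_dev *pdev)
{
	/* xa_insert() fails with -EBUSY if the index is already
	 * occupied, mirroring the -EEXIST case of the rbtree insert. */
	return xa_insert(&queue->pci_devs,
			 PCI_DEVID(pdev->bus->number, pdev->devfn),
			 pdev, GFP_KERNEL);
}

static struct pci_dev *iopf_queue_pci_xa_find(struct iopf_queue *queue,
					      u8 bus, u8 devfn)
{
	/* Lockless (RCU-protected) lookup; no extra rwsem needed. */
	return xa_load(&queue->pci_devs, PCI_DEVID(bus, devfn));
}

static void iopf_queue_pci_xa_remove(struct iopf_queue *queue,
				     struct pci_dev *pdev)
{
	xa_erase(&queue->pci_devs,
		 PCI_DEVID(pdev->bus->number, pdev->devfn));
}
```

Whether this, the rbtree, or a plain per-queue list wins in practice comes back to the question above of how many devices actually share one queue.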
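As an aside for readers following along: the lookup key used throughout the patch is the 16-bit PCI requester ID produced by PCI_DEVID(), with the bus number in the high byte and devfn in the low byte. Below is a minimal user-space sketch of that keying and of an O(log n) lookup over it; it is illustrative only, not the kernel code above (a sorted array with binary search stands in for the red-black tree, and the compare uses explicit comparisons rather than the patch's subtraction truncated to s16, which sidesteps any wraparound concerns):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as the kernel's PCI_DEVID(): bus number in bits 15:8,
 * devfn (slot and function) in bits 7:0. */
static uint16_t pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/* Three-way compare on the packed ID: negative, zero or positive.
 * Explicit comparisons guarantee a consistent total order for every
 * possible pair of 16-bit IDs. */
static int devid_cmp(uint16_t a, uint16_t b)
{
	return (a > b) - (a < b);
}

/* Binary search over IDs kept in sorted order: the same O(log n)
 * lookup cost the patch obtains from its red-black tree. Returns the
 * index of the matching ID, or -1 if it is absent. */
static ptrdiff_t devid_lookup(const uint16_t *ids, size_t n, uint16_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = devid_cmp(key, ids[mid]);

		if (c == 0)
			return (ptrdiff_t)mid;
		if (c < 0)
			hi = mid;	/* key sorts before ids[mid] */
		else
			lo = mid + 1;	/* key sorts after ids[mid] */
	}
	return -1;
}
```

For example, with the sorted IDs {0x0010, 0x0308, 0x1a00}, looking up pci_devid(0x03, 0x08) lands on index 1, while an ID not in the set returns -1.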