From: Auger Eric
Subject: Re: [PATCH v2 3/9] iommu/ioasid: Introduce ioasid_set APIs
To: Jacob Pan, iommu@lists.linux-foundation.org, LKML,
    Jean-Philippe Brucker, Lu Baolu, Joerg Roedel, David Woodhouse
Cc: Yi Liu, "Tian, Kevin", Raj Ashok, Wu Hao
References: <1598070918-21321-1-git-send-email-jacob.jun.pan@linux.intel.com>
 <1598070918-21321-4-git-send-email-jacob.jun.pan@linux.intel.com>
In-Reply-To: <1598070918-21321-4-git-send-email-jacob.jun.pan@linux.intel.com>
Message-ID: <54ac8f01-15f3-ba49-410e-380fa20ded3b@redhat.com>
Date: Tue, 1 Sep 2020 13:51:26 +0200

Hi Jacob,

On 8/22/20 6:35 AM, Jacob Pan wrote:
> ioasid_set was introduced as an arbitrary token that are shared by a
that is
> group of IOASIDs. For example, if IOASID #1 and #2 are allocated via the
> same ioasid_set*, they are viewed as to belong to the same set.
two IOASIDs allocated via the same ioasid_set pointer belong to the same
set?
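Just to make sure I read this right, with a hypothetical caller (the
"vm_set"/"vm_mm" names below are made up for illustration):

	/* one set per VM, created with the VM's token and a quota */
	struct ioasid_set *vm_set = ioasid_alloc_set(vm_mm, 1000, IOASID_SET_TYPE_MM);

	/* #1 and #2 are allocated through the same ioasid_set pointer ... */
	ioasid_t id1 = ioasid_alloc(vm_set, 1, 1000, NULL);
	ioasid_t id2 = ioasid_alloc(vm_set, 1, 1000, NULL);

	/* ... so both are accounted to, and freed from, that same set */
	ioasid_free(vm_set, id1);
	ioasid_free(vm_set, id2);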
>
> For guest SVA usages, system-wide IOASID resources need to be
> partitioned such that VMs can have its own quota and being managed
their own
> separately. ioasid_set is the perfect candidate for meeting such
> requirements. This patch redefines and extends ioasid_set with the
> following new fields:
> - Quota
> - Reference count
> - Storage of its namespace
> - The token is stored in the new ioasid_set but with optional types
>
> ioasid_set level APIs are introduced that wires up these new data.
that wire
> Existing users of IOASID APIs are converted where a host IOASID set is
> allocated for bare-metal usage.
>
> Signed-off-by: Liu Yi L
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel/iommu.c |  27 ++-
>  drivers/iommu/intel/pasid.h |   1 +
>  drivers/iommu/intel/svm.c   |   8 +-
>  drivers/iommu/ioasid.c      | 390 +++++++++++++++++++++++++++++++++++++++++---
>  include/linux/ioasid.h      |  82 ++++++++--
>  5 files changed, 465 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index a3a0b5c8921d..5813eeaa5edb 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -42,6 +42,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -103,6 +104,9 @@
>   */
>  #define INTEL_IOMMU_PGSIZES	(~0xFFFUL)
>
> +/* PASIDs used by host SVM */
> +struct ioasid_set *host_pasid_set;
> +
>  static inline int agaw_to_level(int agaw)
>  {
>  	return agaw + 2;
> @@ -3103,8 +3107,8 @@ static void intel_vcmd_ioasid_free(ioasid_t ioasid, void *data)
>  	 * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
>  	 * We can only free the PASID when all the devices are unbound.
>  	 */
> -	if (ioasid_find(NULL, ioasid, NULL)) {
> -		pr_alert("Cannot free active IOASID %d\n", ioasid);
> +	if (IS_ERR(ioasid_find(host_pasid_set, ioasid, NULL))) {
> +		pr_err("Cannot free IOASID %d, not in system set\n", ioasid);
not sure the change in the trace is worth it. Also you may be more explicit,
e.g. "IOASID %d to be freed cannot be found in the system ioasid set".
Shouldn't it be rate-limited, as it originates from user space?
>  		return;
>  	}
>  	vcmd_free_pasid(iommu, ioasid);
> @@ -3288,6 +3292,19 @@ static int __init init_dmars(void)
>  	if (ret)
>  		goto free_iommu;
>
> +	/* PASID is needed for scalable mode irrespective to SVM */
> +	if (intel_iommu_sm) {
> +		ioasid_install_capacity(intel_pasid_max_id);
> +		/* We should not run out of IOASIDs at boot */
> +		host_pasid_set = ioasid_alloc_set(NULL, PID_MAX_DEFAULT,
s/PID_MAX_DEFAULT/intel_pasid_max_id?
> +						  IOASID_SET_TYPE_NULL);
as suggested by Jean-Philippe: ioasid_set_alloc
> +		if (IS_ERR_OR_NULL(host_pasid_set)) {
> +			pr_err("Failed to enable host PASID allocator %lu\n",
> +				PTR_ERR(host_pasid_set));
This does not sound like the correct error message. "Failed to allocate the
system ioasid_set"?
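Putting those comments together, maybe something like this (untested sketch,
using the ioasid_set_alloc name suggested by Jean-Philippe):

	/* PASID is needed for scalable mode irrespective to SVM */
	if (intel_iommu_sm) {
		ioasid_install_capacity(intel_pasid_max_id);
		host_pasid_set = ioasid_set_alloc(NULL, intel_pasid_max_id,
						  IOASID_SET_TYPE_NULL);
		if (IS_ERR_OR_NULL(host_pasid_set)) {
			pr_err("Failed to allocate the system ioasid_set (%ld)\n",
			       PTR_ERR(host_pasid_set));
			intel_iommu_sm = 0;
		}
	}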
> +			intel_iommu_sm = 0;
> +		}
> +	}
> +
>  	/*
>  	 * for each drhd
>  	 *   enable fault log
> @@ -5149,7 +5166,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
>  	domain->auxd_refcnt--;
>
>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		ioasid_free(domain->default_pasid);
> +		ioasid_free(host_pasid_set, domain->default_pasid);
>  }
>
>  static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5167,7 +5184,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>  	int pasid;
>
>  	/* No private data needed for the default pasid */
> -	pasid = ioasid_alloc(NULL, PASID_MIN,
> +	pasid = ioasid_alloc(host_pasid_set, PASID_MIN,
>  			     pci_max_pasids(to_pci_dev(dev)) - 1,
>  			     NULL);
don't you want to ioasid_set_put() the ioasid_set in
intel_iommu_free_dmars()?
>  	if (pasid == INVALID_IOASID) {
> @@ -5210,7 +5227,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>  	spin_unlock(&iommu->lock);
>  	spin_unlock_irqrestore(&device_domain_lock, flags);
>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		ioasid_free(domain->default_pasid);
> +		ioasid_free(host_pasid_set, domain->default_pasid);
>
>  	return ret;
>  }
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index c9850766c3a9..ccdc23446015 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -99,6 +99,7 @@ static inline bool pasid_pte_is_present(struct pasid_entry *pte)
>  }
>
>  extern u32 intel_pasid_max_id;
> +extern struct ioasid_set *host_pasid_set;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
>  void intel_pasid_free_id(int pasid);
>  void *intel_pasid_lookup_id(int pasid);
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 37a9beabc0ca..634e191ca2c3 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -551,7 +551,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
>  		pasid_max = intel_pasid_max_id;
>
>  	/* Do not use PASID 0, reserved for RID to PASID */
> -	svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> +	svm->pasid = ioasid_alloc(host_pasid_set, PASID_MIN,
>  				  pasid_max - 1, svm);
>  	if (svm->pasid == INVALID_IOASID) {
>  		kfree(svm);
> @@ -568,7 +568,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
>  	if (mm) {
>  		ret = mmu_notifier_register(&svm->notifier, mm);
>  		if (ret) {
> -			ioasid_free(svm->pasid);
> +			ioasid_free(host_pasid_set, svm->pasid);
>  			kfree(svm);
>  			kfree(sdev);
>  			goto out;
> @@ -586,7 +586,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
>  	if (ret) {
>  		if (mm)
>  			mmu_notifier_unregister(&svm->notifier, mm);
> -		ioasid_free(svm->pasid);
> +		ioasid_free(host_pasid_set, svm->pasid);
>  		kfree(svm);
>  		kfree(sdev);
>  		goto out;
> @@ -655,7 +655,7 @@ static int intel_svm_unbind_mm(struct device *dev, int pasid)
>  		kfree_rcu(sdev, rcu);
>
>  		if (list_empty(&svm->devs)) {
> -			ioasid_free(svm->pasid);
> +			ioasid_free(host_pasid_set, svm->pasid);
>  			if (svm->mm)
>  				mmu_notifier_unregister(&svm->notifier, svm->mm);
>  			list_del(&svm->list);
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 5f63af07acd5..f73b3dbfc37a 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -1,22 +1,58 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * I/O Address Space ID allocator. There is one global IOASID space, split into
> - * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
I would try to avoid using new terms: s/subset_ioset_set
> - * free IOASIDs with ioasid_alloc and ioasid_free.
> + * subsets. Users create a subset with ioasid_alloc_set, then allocate/free IDs
here also, ioasid_set_alloc
> + * with ioasid_alloc and ioasid_free.
>   */
> -#include
>  #include
>  #include
>  #include
> +#include
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> +enum ioasid_state {
> +	IOASID_STATE_INACTIVE,
> +	IOASID_STATE_ACTIVE,
> +	IOASID_STATE_FREE_PENDING,
> +};
>
> +/**
> + * struct ioasid_data - Meta data about ioasid
> + *
> + * @id:		Unique ID
> + * @users	Number of active users
> + * @state	Track state of the IOASID
> + * @set		Meta data of the set this IOASID belongs to
s/Meta data of the set this IOASID belongs to/ioasid_set the asid belongs to
> + * @private	Private data associated with the IOASID
I would have expected to find the private asid somewhere
> + * @rcu		For free after RCU grace period
> + */
>  struct ioasid_data {
>  	ioasid_t id;
>  	struct ioasid_set *set;
> +	refcount_t users;
> +	enum ioasid_state state;
>  	void *private;
>  	struct rcu_head rcu;
>  };
>
> +/* Default to PCIe standard 20 bit PASID */
> +#define PCI_PASID_MAX 0x100000
> +static ioasid_t ioasid_capacity = PCI_PASID_MAX;
> +static ioasid_t ioasid_capacity_avail = PCI_PASID_MAX;
> +
> +void ioasid_install_capacity(ioasid_t total)
> +{
> +	ioasid_capacity = ioasid_capacity_avail = total;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> +
> +ioasid_t ioasid_get_capacity()
> +{
> +	return ioasid_capacity;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_get_capacity);
> +
>  /*
>   * struct ioasid_allocator_data - Internal data structure to hold information
>   * about an allocator. There are two types of allocators:
> @@ -306,11 +342,23 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>  {
>  	struct ioasid_data *data;
>  	void *adata;
> -	ioasid_t id;
> +	ioasid_t id = INVALID_IOASID;
> +
> +	spin_lock(&ioasid_allocator_lock);
> +	/* Check if the IOASID set has been allocated and initialized */
> +	if (WARN_ON(xa_load(&ioasid_sets, set->sid) != set)) {
> +		pr_warn("Invalid set\n");
> +		goto done_unlock;
> +	}
> +
> +	if (set->quota <= set->nr_ioasids) {
> +		pr_err("IOASID set %d out of quota %d\n", set->sid, set->quota);
> +		goto done_unlock;
> +	}
>
>  	data = kzalloc(sizeof(*data), GFP_ATOMIC);
>  	if (!data)
> -		return INVALID_IOASID;
> +		goto done_unlock;
>
>  	data->set = set;
>  	data->private = private;
> @@ -319,7 +367,6 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>  	 * Custom allocator needs allocator data to perform platform specific
>  	 * operations.
>  	 */
> -	spin_lock(&ioasid_allocator_lock);
>  	adata = active_allocator->flags & IOASID_ALLOCATOR_CUSTOM ?
>  			active_allocator->ops->pdata : data;
>  	id = active_allocator->ops->alloc(min, max, adata);
>  	if (id == INVALID_IOASID) {
> @@ -335,42 +382,339 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>  		goto exit_free;
>  	}
>  	data->id = id;
> +	data->state = IOASID_STATE_ACTIVE;
> +	refcount_set(&data->users, 1);
> +
> +	/* Store IOASID in the per set data */
> +	if (xa_err(xa_store(&set->xa, id, data, GFP_ATOMIC))) {
> +		pr_err("Failed to ioasid %d in set %d\n", id, set->sid);
> +		goto exit_free;
> +	}
> +	set->nr_ioasids++;
> +	goto done_unlock;
>
> -	spin_unlock(&ioasid_allocator_lock);
> -	return id;
>  exit_free:
> -	spin_unlock(&ioasid_allocator_lock);
>  	kfree(data);
> -	return INVALID_IOASID;
> +done_unlock:
> +	spin_unlock(&ioasid_allocator_lock);
> +	return id;
>  }
>  EXPORT_SYMBOL_GPL(ioasid_alloc);
>
> +static void ioasid_do_free(struct ioasid_data *data)
do_free_locked?
> +{
> +	struct ioasid_data *ioasid_data;
> +	struct ioasid_set *sdata;
> +
> +	active_allocator->ops->free(data->id, active_allocator->ops->pdata);
> +	/* Custom allocator needs additional steps to free the xa element */
> +	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> +		ioasid_data = xa_erase(&active_allocator->xa, data->id);
> +		kfree_rcu(ioasid_data, rcu);
> +	}
> +
> +	sdata = xa_load(&ioasid_sets, data->set->sid);
> +	if (!sdata) {
> +		pr_err("No set %d for IOASID %d\n", data->set->sid,
> +		       data->id);
> +		return;
> +	}
> +	xa_erase(&sdata->xa, data->id);
> +	sdata->nr_ioasids--;
> +}
> +
> +static void ioasid_free_locked(struct ioasid_set *set, ioasid_t ioasid)
> +{
> +	struct ioasid_data *data;
> +
> +	data = xa_load(&active_allocator->xa, ioasid);
> +	if (!data) {
> +		pr_err("Trying to free unknown IOASID %u\n", ioasid);
> +		return;
> +	}
> +
> +	if (data->set != set) {
> +		pr_warn("Cannot free IOASID %u due to set ownership\n", ioasid);
> +		return;
> +	}
> +	data->state = IOASID_STATE_FREE_PENDING;
> +
> +	if (!refcount_dec_and_test(&data->users))
> +		return;
> +
> +	ioasid_do_free(data);
> +}
> +
>  /**
> - * ioasid_free - Free an IOASID
> - * @ioasid: the ID to remove
> + * ioasid_free - Drop reference on an IOASID. Free if refcount drops to 0,
> + *               including free from its set and system-wide list.
> + * @set:	The ioasid_set to check permission with. If not NULL, IOASID
> + *		free will fail if the set does not match.
> + * @ioasid:	The IOASID to remove
>   */
> -void ioasid_free(ioasid_t ioasid)
> +void ioasid_free(struct ioasid_set *set, ioasid_t ioasid)
>  {
> -	struct ioasid_data *ioasid_data;
> +	spin_lock(&ioasid_allocator_lock);
> +	ioasid_free_locked(set, ioasid);
> +	spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
>
> +/**
> + * ioasid_alloc_set - Allocate a new IOASID set for a given token
> + *
> + * @token:	Unique token of the IOASID set, cannot be NULL
> + * @quota:	Quota allowed in this set. Only for new set creation
> + * @flags:	Special requirements
> + *
> + * IOASID can be limited system-wide resource that requires quota management.
> + * If caller does not wish to enforce quota, use IOASID_SET_NO_QUOTA flag.
> + *
> + * Token will be stored in the ioasid_set returned. A reference will be taken
> + * upon finding a matching set or newly created set.
> + * IOASID allocation within the set and other per set operations will use
> + * the retured ioasid_set *.
> + *
> + */
> +struct ioasid_set *ioasid_alloc_set(void *token, ioasid_t quota, int type)
> +{
> +	struct ioasid_set *sdata;
> +	unsigned long index;
> +	ioasid_t id;
> +
> +	if (type >= IOASID_SET_TYPE_NR)
> +		return ERR_PTR(-EINVAL);
> +
> +	/*
> +	 * Need to check space available if we share system-wide quota.
> +	 * TODO: we may need to support quota free sets in the future.
> +	 */
>  	spin_lock(&ioasid_allocator_lock);
> -	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> -	if (!ioasid_data) {
> -		pr_err("Trying to free unknown IOASID %u\n", ioasid);
> +	if (quota > ioasid_capacity_avail) {
> +		pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> +			quota, ioasid_capacity_avail);
> +		sdata = ERR_PTR(-ENOSPC);
>  		goto exit_unlock;
>  	}
>
> -	active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
> -	/* Custom allocator needs additional steps to free the xa element */
> -	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> -		ioasid_data = xa_erase(&active_allocator->xa, ioasid);
> -		kfree_rcu(ioasid_data, rcu);
> +	/*
> +	 * Token is only unique within its types but right now we have only
> +	 * mm type. If we have more token types, we have to match type as well.
> +	 */
> +	switch (type) {
> +	case IOASID_SET_TYPE_MM:
> +		/* Search existing set tokens, reject duplicates */
> +		xa_for_each(&ioasid_sets, index, sdata) {
> +			if (sdata->token == token &&
> +			    sdata->type == IOASID_SET_TYPE_MM) {
> +				sdata = ERR_PTR(-EEXIST);
> +				goto exit_unlock;
> +			}
> +		}
> +		break;
> +	case IOASID_SET_TYPE_NULL:
> +		if (!token)
> +			break;
> +		fallthrough;
> +	default:
> +		pr_err("Invalid token and IOASID type\n");
> +		sdata = ERR_PTR(-EINVAL);
> +		goto exit_unlock;
>  	}
>
> +	/* REVISIT: may support set w/o quota, use system available */
> +	if (!quota) {
> +		sdata = ERR_PTR(-EINVAL);
> +		goto exit_unlock;
> +	}
> +
> +	sdata = kzalloc(sizeof(*sdata), GFP_ATOMIC);
> +	if (!sdata) {
> +		sdata = ERR_PTR(-ENOMEM);
> +		goto exit_unlock;
> +	}
> +
> +	if (xa_alloc(&ioasid_sets, &id, sdata,
> +		     XA_LIMIT(0, ioasid_capacity_avail - quota),
> +		     GFP_ATOMIC)) {
> +		kfree(sdata);
> +		sdata = ERR_PTR(-ENOSPC);
> +		goto exit_unlock;
> +	}
> +
> +	sdata->token = token;
> +	sdata->type = type;
> +	sdata->quota = quota;
> +	sdata->sid = id;
> +	refcount_set(&sdata->ref, 1);
> +
> +	/*
> +	 * Per set XA is used to store private IDs within the set, get ready
> +	 * for ioasid_set private ID and system-wide IOASID allocation
> +	 * results.
> +	 */
> +	xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> +	ioasid_capacity_avail -= quota;
> +
>  exit_unlock:
>  	spin_unlock(&ioasid_allocator_lock);
> +
> +	return sdata;
>  }
> -EXPORT_SYMBOL_GPL(ioasid_free);
> +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> +
> +void ioasid_set_get_locked(struct ioasid_set *set)
> +{
> +	if (WARN_ON(xa_load(&ioasid_sets, set->sid) != set)) {
> +		pr_warn("Invalid set data\n");
> +		return;
> +	}
> +
> +	refcount_inc(&set->ref);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_get_locked);
> +
> +void ioasid_set_get(struct ioasid_set *set)
> +{
> +	spin_lock(&ioasid_allocator_lock);
> +	ioasid_set_get_locked(set);
> +	spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_get);
> +
> +void ioasid_set_put_locked(struct ioasid_set *set)
> +{
> +	struct ioasid_data *entry;
> +	unsigned long index;
> +
> +	if (WARN_ON(xa_load(&ioasid_sets, set->sid) != set)) {
> +		pr_warn("Invalid set data\n");
> +		return;
> +	}
> +
> +	if (!refcount_dec_and_test(&set->ref)) {
> +		pr_debug("%s: IOASID set %d has %d users\n",
> +			 __func__, set->sid, refcount_read(&set->ref));
> +		return;
> +	}
> +
> +	/* The set is already empty, we just destroy the set. */
> +	if (xa_empty(&set->xa))
> +		goto done_destroy;
> +
> +	/*
> +	 * Free all PASIDs from system-wide IOASID pool, all subscribers gets
> +	 * notified and do cleanup of their own.
> +	 * Note that some references of the IOASIDs within the set can still
> +	 * be held after the free call. This is OK in that the IOASIDs will be
> +	 * marked inactive, the only operations can be done is ioasid_put.
> +	 * No need to track IOASID set states since there is no reclaim phase.
> +	 */
> +	xa_for_each(&set->xa, index, entry) {
> +		ioasid_free_locked(set, index);
> +		/* Free from per set private pool */
> +		xa_erase(&set->xa, index);
> +	}
> +
> +done_destroy:
> +	/* Return the quota back to system pool */
> +	ioasid_capacity_avail += set->quota;
> +	kfree_rcu(set, rcu);
> +
> +	/*
> +	 * Token got released right away after the ioasid_set is freed.
> +	 * If a new set is created immediately with the newly released token,
> +	 * it will not allocate the same IOASIDs unless they are reclaimed.
> +	 */
> +	xa_erase(&ioasid_sets, set->sid);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_put_locked);
> +
> +/**
> + * ioasid_set_put - Drop a reference to the IOASID set. Free all IOASIDs within
> + *                  the set if there are no more users.
> + *
> + * @set:	The IOASID set ID to be freed
> + *
> + * If refcount drops to zero, all IOASIDs allocated within the set will be
> + * freed.
> + */
> +void ioasid_set_put(struct ioasid_set *set)
> +{
> +	spin_lock(&ioasid_allocator_lock);
> +	ioasid_set_put_locked(set);
> +	spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_put);
> +
> +/**
> + * ioasid_adjust_set - Adjust the quota of an IOASID set
> + * @set:	IOASID set to be assigned
> + * @quota:	Quota allowed in this set
> + *
> + * Return 0 on success. If the new quota is smaller than the number of
> + * IOASIDs already allocated, -EINVAL will be returned. No change will be
> + * made to the existing quota.
> + */
> +int ioasid_adjust_set(struct ioasid_set *set, int quota)
> +{
> +	int ret = 0;
> +
> +	spin_lock(&ioasid_allocator_lock);
> +	if (set->nr_ioasids > quota) {
> +		pr_err("New quota %d is smaller than outstanding IOASIDs %d\n",
> +			quota, set->nr_ioasids);
> +		ret = -EINVAL;
> +		goto done_unlock;
> +	}
> +
> +	if (quota >= ioasid_capacity_avail) {
> +		ret = -ENOSPC;
> +		goto done_unlock;
> +	}
> +
> +	/* Return the delta back to system pool */
> +	ioasid_capacity_avail += set->quota - quota;
> +
> +	/*
> +	 * May have a policy to prevent giving all available IOASIDs
> +	 * to one set. But we don't enforce here, it should be in the
> +	 * upper layers.
> +	 */
> +	set->quota = quota;
> +
> +done_unlock:
> +	spin_unlock(&ioasid_allocator_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_adjust_set);
> +
> +/**
> + * ioasid_set_for_each_ioasid - Iterate over all the IOASIDs within the set
> + *
> + * Caller must hold a reference of the set and handles its own locking.
> + */
> +int ioasid_set_for_each_ioasid(struct ioasid_set *set,
> +			       void (*fn)(ioasid_t id, void *data),
> +			       void *data)
> +{
> +	struct ioasid_data *entry;
> +	unsigned long index;
> +	int ret = 0;
> +
> +	if (xa_empty(&set->xa)) {
> +		pr_warn("No IOASIDs in the set %d\n", set->sid);
> +		return -ENOENT;
> +	}
> +
> +	xa_for_each(&set->xa, index, entry) {
> +		fn(index, data);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_for_each_ioasid);
>
>  /**
>   * ioasid_find - Find IOASID data
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 9c44947a68c8..412d025d440e 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -10,8 +10,35 @@ typedef unsigned int ioasid_t;
>  typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
>  typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
>
> +/* IOASID set types */
> +enum ioasid_set_type {
> +	IOASID_SET_TYPE_NULL = 1,	/* Set token is NULL */
> +	IOASID_SET_TYPE_MM,		/* Set token is a mm_struct,
s/mm_struct/mm_struct pointer
> +					 * i.e. associated with a process
> +					 */
> +	IOASID_SET_TYPE_NR,
> +};
> +
> +/**
> + * struct ioasid_set - Meta data about ioasid_set
> + * @type:	Token types and other features
token type. Why "and other features"?
> + * @token:	Unique to identify an IOASID set
> + * @xa:		XArray to store ioasid_set private IDs, can be used for
> + *		guest-host IOASID mapping, or just a private IOASID namespace.
> + * @quota:	Max number of IOASIDs can be allocated within the set
> + * @nr_ioasids	Number of IOASIDs currently allocated in the set
> + * @sid:	ID of the set
> + * @ref:	Reference count of the users
> + */
>  struct ioasid_set {
> -	int dummy;
> +	void *token;
> +	struct xarray xa;
> +	int type;
> +	int quota;
> +	int nr_ioasids;
> +	int sid;
nit: id? sid has a special meaning on ARM.
> +	refcount_t ref;
> +	struct rcu_head rcu;
>  };
>
>  /**
> @@ -29,31 +56,64 @@ struct ioasid_allocator_ops {
>  	void *pdata;
>  };
>
> -#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> -
>  #if IS_ENABLED(CONFIG_IOASID)
> +void ioasid_install_capacity(ioasid_t total);
> +ioasid_t ioasid_get_capacity(void);
> +struct ioasid_set *ioasid_alloc_set(void *token, ioasid_t quota, int type);
> +int ioasid_adjust_set(struct ioasid_set *set, int quota);
ioasid_set_adjust_quota?
> +void ioasid_set_get_locked(struct ioasid_set *set);
as mentioned during the Plumbers uConf, the set_get naming is unfortunate.
Globally I wonder if we shouldn't rename "set" into "pool" or something
similar.
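To illustrate the kind of renaming I have in mind (just a naming sketch, no
functional change implied):

	struct ioasid_pool *ioasid_pool_alloc(void *token, ioasid_t quota, int type);
	void ioasid_pool_get(struct ioasid_pool *pool);
	void ioasid_pool_put(struct ioasid_pool *pool);
	int ioasid_pool_adjust_quota(struct ioasid_pool *pool, int quota);
	ioasid_t ioasid_alloc(struct ioasid_pool *pool, ioasid_t min, ioasid_t max,
			      void *private);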
> +void ioasid_set_put_locked(struct ioasid_set *set);
> +void ioasid_set_put(struct ioasid_set *set);
> +
>  ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>  		      void *private);
> -void ioasid_free(ioasid_t ioasid);
> -void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> -		  bool (*getter)(void *));
> +void ioasid_free(struct ioasid_set *set, ioasid_t ioasid);
> +
> +bool ioasid_is_active(ioasid_t ioasid);
> +
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, bool (*getter)(void *));
> +int ioasid_attach_data(ioasid_t ioasid, void *data);
>  int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
>  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> -int ioasid_attach_data(ioasid_t ioasid, void *data);
> -
> +void ioasid_is_in_set(struct ioasid_set *set, ioasid_t ioasid);
> +int ioasid_set_for_each_ioasid(struct ioasid_set *sdata,
> +			       void (*fn)(ioasid_t id, void *data),
> +			       void *data);
>  #else /* !CONFIG_IOASID */
> +static inline void ioasid_install_capacity(ioasid_t total)
> +{
> +}
> +
> +static inline ioasid_t ioasid_get_capacity(void)
> +{
> +	return 0;
> +}
> +
>  static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
>  				    ioasid_t max, void *private)
>  {
>  	return INVALID_IOASID;
>  }
>
> -static inline void ioasid_free(ioasid_t ioasid)
> +static inline void ioasid_free(struct ioasid_set *set, ioasid_t ioasid)
> +{
> +}
> +
> +static inline bool ioasid_is_active(ioasid_t ioasid)
> +{
> +	return false;
> +}
> +
> +static inline struct ioasid_set *ioasid_alloc_set(void *token, ioasid_t quota, int type)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +
> +static inline void ioasid_set_put(struct ioasid_set *set)
>  {
>  }
>
> -static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> -				bool (*getter)(void *))
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, bool (*getter)(void *))
>  {
>  	return NULL;
>  }
>
I found it very difficult to review this patch. Could you split it into
several patches? Maybe introduce a dummy host_pasid_set first and update the
call sites accordingly. You introduce ownership checking, quota checking,
ioasid state, ref counting and ioasid type handling (whereas the existing
type is NULL), so I have the feeling that a more incremental approach would
ease the review process.

Thanks

Eric