2020-03-25 17:51:21

by Jacob Pan

Subject: [PATCH 00/10] IOASID extensions for guest SVA

IOASID was introduced in v5.5 as a generic kernel allocator service for
both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub Stream
ID. In addition to basic ID allocation, ioasid_set was introduced as a
token that is shared by a group of IOASIDs. This set token can be used
for permission checking, but it lacks some features needed by guest
Shared Virtual Address (SVA). In addition, IOASID life cycle management
support is needed to coordinate multiple users.

This patchset introduces two extensions to the IOASID code,
1. IOASID set operations
2. Notifications for IOASID state synchronization

Part #1:
The IOASIDs used by each VM fit naturally into an ioasid_set. Per-set
management requires the following features:

- Quota enforcement - This is to prevent one VM from abusing the
allocator and taking all the system IOASIDs. Though the VFIO layer
could also enforce a quota, it cannot cover systems that use both
guest and host SVA at the same time.

- Stores the guest IOASID-host IOASID mapping within the set. To
support live migration, the IOASID namespace should be owned by the
guest. This requires a per-set lookup between guest and host
IOASIDs. This patchset does not introduce non-identity guest-host
IOASID lookup; it merely introduces the infrastructure in the per-set
data.

- Set-level operations. For example, when a guest terminates, it will
likely free the entire set. Having a single place that stores and
manages the set's IOASIDs makes iteration much easier.


New APIs are:
- void ioasid_install_capacity(ioasid_t total);
Set the system capacity prior to any allocations. On x86, the VT-d
driver calls this function to set the max number of PASIDs, typically
1 million (20 bits).

- int ioasid_alloc_system_set(int quota);
The host system has a default ioasid_set. During boot, this default
set is expected to be allocated with a reasonable quota, e.g. PID_MAX.
This default/system set is used for bare-metal SVA.

- int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
*sid);
Allocate a new set with a token; the returned sid (set ID) will be
used to allocate IOASIDs within the set. Allocations cannot exceed
the quota.

- void ioasid_free_set(int sid, bool destroy_set);
Free the entire set and notify all users, with an option to destroy
the set. If not destroyed, the set ID can be used for allocation again.

- int ioasid_find_sid(ioasid_t ioasid);
Look up the set ID from an IOASID. No reference is held; the set is
assumed to have a single owner.

- int ioasid_adjust_set(int sid, int quota);
Change the quota of the set. The new quota cannot be less than the
number of IOASIDs already allocated within the set. This is useful
when IOASID resources need to be rebalanced among VMs.

Part #2:
Notification service. IOASIDs are used by many consumers that follow a
publisher-subscriber pattern, so notifications are a natural choice
for keeping state synchronized. For example, on an x86 system, guest
PASID allocation and bind calls result in VFIO IOCTLs that can add and
change guest-host PASID state. At the same time, the IOMMU driver and
KVM need to maintain their own PASID contexts. In this case, VFIO is
the publisher within the kernel; the IOMMU driver and KVM are the
subscribers.

This patchset introduces a global blocking notifier chain and APIs to
operate on it. Not all events nor all IOASIDs are of interest to all
subscribers; e.g. KVM is only interested in the IOASIDs within its
set, and the IOMMU driver is not ioasid_set aware. A further
optimization could be to have both global and per-set notifiers. But
considering the infrequent nature of bind/unbind and the relatively
long process life cycle, this optimization may not be needed at this
time.

To register/unregister notification blocks, use these two APIs:
- int ioasid_add_notifier(struct notifier_block *nb);
- void ioasid_remove_notifier(struct notifier_block *nb);

To send a notification for an IOASID with one of the commands (FREE,
BIND/UNBIND, etc.), use:
- int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);

This work is a result of collaboration with many people:
Liu, Yi L <[email protected]>
Wu Hao <[email protected]>
Ashok Raj <[email protected]>
Kevin Tian <[email protected]>

Thanks,

Jacob

Jacob Pan (10):
iommu/ioasid: Introduce system-wide capacity
iommu/vt-d: Set IOASID capacity when SVM is enabled
iommu/ioasid: Introduce per set allocation APIs
iommu/ioasid: Rename ioasid_set_data to avoid confusion with
ioasid_set
iommu/ioasid: Create an IOASID set for host SVA use
iommu/ioasid: Convert to set aware allocations
iommu/ioasid: Use mutex instead of spinlock
iommu/ioasid: Introduce notifier APIs
iommu/ioasid: Support ioasid_set quota adjustment
iommu/vt-d: Register PASID notifier for status change

drivers/iommu/intel-iommu.c | 20 ++-
drivers/iommu/intel-svm.c | 89 ++++++++--
drivers/iommu/ioasid.c | 387 +++++++++++++++++++++++++++++++++++++++-----
include/linux/intel-iommu.h | 1 +
include/linux/ioasid.h | 86 +++++++++-
5 files changed, 522 insertions(+), 61 deletions(-)

--
2.7.4


2020-03-25 17:51:46

by Jacob Pan

Subject: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

Bare-metal SVA allocates IOASIDs for native process addresses. These
should be separated from VM-allocated IOASIDs, and thus placed under
their own set.

This patch creates a system IOASID set with its quota set to PID_MAX.
This is a reasonable default in that SVM-capable devices can only bind
to a limited number of user processes.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel-iommu.c | 8 +++++++-
drivers/iommu/ioasid.c | 9 +++++++++
include/linux/ioasid.h | 9 +++++++++
3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ec3fc121744a..af7a1ef7b31e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
goto free_iommu;

/* PASID is needed for scalable mode irrespective to SVM */
- if (intel_iommu_sm)
+ if (intel_iommu_sm) {
ioasid_install_capacity(intel_pasid_max_id);
+ /* We should not run out of IOASIDs at boot */
+ if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
+ pr_err("Failed to enable host PASID allocator\n");
+ intel_iommu_sm = 0;
+ }
+ }

/*
* for each drhd
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 6265d2dbbced..9135af171a7c 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -39,6 +39,9 @@ struct ioasid_data {
static ioasid_t ioasid_capacity;
static ioasid_t ioasid_capacity_avail;

+int system_ioasid_sid;
+static DECLARE_IOASID_SET(system_ioasid);
+
/* System capacity can only be set once */
void ioasid_install_capacity(ioasid_t total)
{
@@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
}
EXPORT_SYMBOL_GPL(ioasid_install_capacity);

+int ioasid_alloc_system_set(int quota)
+{
+ return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
+
/*
* struct ioasid_allocator_data - Internal data structure to hold information
* about an allocator. There are two types of allocators:
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 8c82d2625671..097b1cc043a3 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
void *pdata;
};

+/* Shared IOASID set for reserved for host system use */
+extern int system_ioasid_sid;
+
#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }

#if IS_ENABLED(CONFIG_IOASID)
@@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
int ioasid_attach_data(ioasid_t ioasid, void *data);
void ioasid_install_capacity(ioasid_t total);
+int ioasid_alloc_system_set(int quota);
int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
void ioasid_free_set(int sid, bool destroy_set);
int ioasid_find_sid(ioasid_t ioasid);
@@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
{
}

+static inline int ioasid_alloc_system_set(int quota)
+{
+ return -ENOTSUPP;
+}
+
#endif /* CONFIG_IOASID */
#endif /* __LINUX_IOASID_H */
--
2.7.4

2020-03-25 17:52:05

by Jacob Pan

Subject: [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity

IOASID is a limited system-wide resource that can be allocated at
runtime. The limit can be enumerated during boot. For example, on x86
platforms, PCIe Process Address Space ID (PASID) allocation uses the
IOASID service. The number of supported PASID bits is enumerated by
the extended capability register as defined in the VT-d spec.

This patch adds a helper to set the system capacity; it is expected to
be set during boot, prior to any allocation requests.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/ioasid.c | 15 +++++++++++++++
include/linux/ioasid.h | 5 ++++-
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 0f8dd377aada..4026e52855b9 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -17,6 +17,21 @@ struct ioasid_data {
struct rcu_head rcu;
};

+static ioasid_t ioasid_capacity;
+static ioasid_t ioasid_capacity_avail;
+
+/* System capacity can only be set once */
+void ioasid_install_capacity(ioasid_t total)
+{
+ if (ioasid_capacity) {
+ pr_warn("IOASID capacity already set at %d\n", ioasid_capacity);
+ return;
+ }
+
+ ioasid_capacity = ioasid_capacity_avail = total;
+}
+EXPORT_SYMBOL_GPL(ioasid_install_capacity);
+
/*
* struct ioasid_allocator_data - Internal data structure to hold information
* about an allocator. There are two types of allocators:
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 6f000d7a0ddc..9711fa0dc357 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
int ioasid_set_data(ioasid_t ioasid, void *data);
-
+void ioasid_install_capacity(ioasid_t total);
#else /* !CONFIG_IOASID */
static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
ioasid_t max, void *private)
@@ -72,5 +72,8 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
return -ENOTSUPP;
}

+static inline void ioasid_install_capacity(ioasid_t total)
+{
+}
#endif /* CONFIG_IOASID */
#endif /* __LINUX_IOASID_H */
--
2.7.4

2020-03-25 17:52:11

by Jacob Pan

Subject: [PATCH 02/10] iommu/vt-d: Set IOASID capacity when SVM is enabled

Assign the system-wide PASID capacity from the enumerated max value.
Currently, all Intel SVM-capable devices should support the full 20
bits of PASID values.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel-iommu.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a699a765c983..ec3fc121744a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3510,6 +3510,10 @@ static int __init init_dmars(void)
if (ret)
goto free_iommu;

+ /* PASID is needed for scalable mode irrespective to SVM */
+ if (intel_iommu_sm)
+ ioasid_install_capacity(intel_pasid_max_id);
+
/*
* for each drhd
* enable fault log
--
2.7.4

2020-03-25 17:52:16

by Jacob Pan

Subject: [PATCH 04/10] iommu/ioasid: Rename ioasid_set_data to avoid confusion with ioasid_set

An IOASID set refers to a group of IOASIDs that share the same token.
The ioasid_set_data() function is used to attach private data to an
IOASID; rename it to ioasid_attach_data() to avoid confusion with the
group/set concept.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel-svm.c | 11 ++++++-----
drivers/iommu/ioasid.c | 6 +++---
include/linux/ioasid.h | 4 ++--
3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index b6405df6cfb5..1991587fd3fd 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -319,14 +319,15 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
svm->gpasid = data->gpasid;
svm->flags |= SVM_FLAG_GUEST_PASID;
}
- ioasid_set_data(data->hpasid, svm);
+
+ ioasid_attach_data(data->hpasid, svm);
INIT_LIST_HEAD_RCU(&svm->devs);
mmput(svm->mm);
}
sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
if (!sdev) {
if (list_empty(&svm->devs)) {
- ioasid_set_data(data->hpasid, NULL);
+ ioasid_attach_data(data->hpasid, NULL);
kfree(svm);
}
ret = -ENOMEM;
@@ -346,7 +347,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
* was allocated in this function.
*/
if (list_empty(&svm->devs)) {
- ioasid_set_data(data->hpasid, NULL);
+ ioasid_attach_data(data->hpasid, NULL);
kfree(svm);
}
goto out;
@@ -375,7 +376,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
*/
kfree(sdev);
if (list_empty(&svm->devs)) {
- ioasid_set_data(data->hpasid, NULL);
+ ioasid_attach_data(data->hpasid, NULL);
kfree(svm);
}
goto out;
@@ -438,7 +439,7 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
* that PASID allocated by one guest cannot be
* used by another.
*/
- ioasid_set_data(pasid, NULL);
+ ioasid_attach_data(pasid, NULL);
kfree(svm);
}
}
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 27ee57f7079b..6265d2dbbced 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -292,14 +292,14 @@ void ioasid_unregister_allocator(struct ioasid_allocator_ops *ops)
EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);

/**
- * ioasid_set_data - Set private data for an allocated ioasid
+ * ioasid_attach_data - Set private data for an allocated ioasid
* @ioasid: the ID to set data
* @data: the private data
*
* For IOASID that is already allocated, private data can be set
* via this API. Future lookup can be done via ioasid_find.
*/
-int ioasid_set_data(ioasid_t ioasid, void *data)
+int ioasid_attach_data(ioasid_t ioasid, void *data)
{
struct ioasid_data *ioasid_data;
int ret = 0;
@@ -321,7 +321,7 @@ int ioasid_set_data(ioasid_t ioasid, void *data)

return ret;
}
-EXPORT_SYMBOL_GPL(ioasid_set_data);
+EXPORT_SYMBOL_GPL(ioasid_attach_data);

/**
* ioasid_alloc - Allocate an IOASID
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index be158e03c034..8c82d2625671 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -39,7 +39,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
bool (*getter)(void *));
int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
-int ioasid_set_data(ioasid_t ioasid, void *data);
+int ioasid_attach_data(ioasid_t ioasid, void *data);
void ioasid_install_capacity(ioasid_t total);
int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
void ioasid_free_set(int sid, bool destroy_set);
@@ -79,7 +79,7 @@ static inline void ioasid_unregister_allocator(struct ioasid_allocator_ops *allo
{
}

-static inline int ioasid_set_data(ioasid_t ioasid, void *data)
+static inline int ioasid_attach_data(ioasid_t ioasid, void *data)
{
return -ENOTSUPP;
}
--
2.7.4

2020-03-25 17:52:22

by Jacob Pan

Subject: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

An IOASID set defines a group of IDs that share the same token. The
ioasid_set concept helps with permission checking among users, as in
the current code.

With guest SVA usage, each VM has its own IOASID set. More
functionality is needed:
1. Enforce a quota: each guest may be assigned a limited quota so that
one guest cannot abuse all the system resources.
2. Store the IOASID mapping between guest and host IOASIDs.
3. Per-set operations, e.g. freeing the entire set.

For each ioasid_set token, a unique set ID is assigned. This makes
referencing the set and looking up data much easier to implement.

Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/ioasid.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/ioasid.h | 13 +++++
2 files changed, 160 insertions(+)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 4026e52855b9..27ee57f7079b 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -10,6 +10,25 @@
#include <linux/spinlock.h>
#include <linux/xarray.h>

+static DEFINE_XARRAY_ALLOC(ioasid_sets);
+/**
+ * struct ioasid_set_data - Meta data about ioasid_set
+ *
+ * @token: Unique to identify an IOASID set
+ * @xa: XArray to store subset ID and IOASID mapping
+ * @size: Max number of IOASIDs can be allocated within the set
+ * @nr_ioasids Number of IOASIDs allocated in the set
+ * @sid ID of the set
+ */
+struct ioasid_set_data {
+ struct ioasid_set *token;
+ struct xarray xa;
+ int size;
+ int nr_ioasids;
+ int sid;
+ struct rcu_head rcu;
+};
+
struct ioasid_data {
ioasid_t id;
struct ioasid_set *set;
@@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
EXPORT_SYMBOL_GPL(ioasid_free);

/**
+ * ioasid_alloc_set - Allocate a set of IOASIDs
+ * @token: Unique token of the IOASID set
+ * @quota: Quota allowed in this set
+ * @sid: IOASID set ID to be assigned
+ *
+ * Return 0 upon success. Token will be stored internally for lookup,
+ * IOASID allocation within the set and other per set operations will use
+ * the @sid assigned.
+ *
+ */
+int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
+{
+ struct ioasid_set_data *sdata;
+ ioasid_t id;
+ int ret = 0;
+
+ if (quota > ioasid_capacity_avail) {
+ pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
+ quota, ioasid_capacity_avail);
+ return -ENOSPC;
+ }
+
+ sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
+ if (!sdata)
+ return -ENOMEM;
+
+ spin_lock(&ioasid_allocator_lock);
+
+ ret = xa_alloc(&ioasid_sets, &id, sdata,
+ XA_LIMIT(0, ioasid_capacity_avail - quota),
+ GFP_KERNEL);
+ if (ret) {
+ kfree(sdata);
+ goto error;
+ }
+
+ sdata->token = token;
+ sdata->size = quota;
+ sdata->sid = id;
+
+ /*
+ * Set Xarray is used to store IDs within the set, get ready for
+ * sub-set ID and system-wide IOASID allocation results.
+ */
+ xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
+
+ ioasid_capacity_avail -= quota;
+ *sid = id;
+
+error:
+ spin_unlock(&ioasid_allocator_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc_set);
+
+/**
+ * ioasid_free_set - Free all IOASIDs within the set
+ *
+ * @sid: The IOASID set ID to be freed
+ * @destroy_set: Whether to keep the set for further allocation.
+ * If true, the set will be destroyed.
+ *
+ * All IOASIDs allocated within the set will be freed upon return.
+ */
+void ioasid_free_set(int sid, bool destroy_set)
+{
+ struct ioasid_set_data *sdata;
+ struct ioasid_data *entry;
+ unsigned long index;
+
+ spin_lock(&ioasid_allocator_lock);
+ sdata = xa_load(&ioasid_sets, sid);
+ if (!sdata) {
+ pr_err("No IOASID set found to free %d\n", sid);
+ goto done_unlock;
+ }
+
+ if (xa_empty(&sdata->xa)) {
+ pr_warn("No IOASIDs in the set %d\n", sdata->sid);
+ goto done_destroy;
+ }
+
+ /* Just a place holder for now */
+ xa_for_each(&sdata->xa, index, entry) {
+ /* Free from per sub-set pool */
+ xa_erase(&sdata->xa, index);
+ }
+
+done_destroy:
+ if (destroy_set) {
+ xa_erase(&ioasid_sets, sid);
+
+ /* Return the quota back to system pool */
+ ioasid_capacity_avail += sdata->size;
+ kfree_rcu(sdata, rcu);
+ }
+
+done_unlock:
+ spin_unlock(&ioasid_allocator_lock);
+}
+EXPORT_SYMBOL_GPL(ioasid_free_set);
+
+
+/**
* ioasid_find - Find IOASID data
* @set: the IOASID set
* @ioasid: the IOASID to find
@@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
}
EXPORT_SYMBOL_GPL(ioasid_find);

+/**
+ * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
+ * Caller must hold a reference to the set.
+ *
+ * @ioasid: IOASID associated with the set
+ *
+ * Return IOASID set ID or error
+ */
+int ioasid_find_sid(ioasid_t ioasid)
+{
+ struct ioasid_data *ioasid_data;
+ int ret = 0;
+
+ spin_lock(&ioasid_allocator_lock);
+ ioasid_data = xa_load(&active_allocator->xa, ioasid);
+ ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
+
+ spin_unlock(&ioasid_allocator_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_find_sid);
+
MODULE_AUTHOR("Jean-Philippe Brucker <[email protected]>");
MODULE_AUTHOR("Jacob Pan <[email protected]>");
MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 9711fa0dc357..be158e03c034 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -41,6 +41,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
int ioasid_set_data(ioasid_t ioasid, void *data);
void ioasid_install_capacity(ioasid_t total);
+int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
+void ioasid_free_set(int sid, bool destroy_set);
+int ioasid_find_sid(ioasid_t ioasid);
#else /* !CONFIG_IOASID */
static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
ioasid_t max, void *private)
@@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
{
}

+static inline int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
+{
+ return -ENOTSUPP;
+}
+
+static inline void ioasid_free_set(int sid, bool destroy_set)
+{
+}
+
static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
bool (*getter)(void *))
{
@@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
static inline void ioasid_install_capacity(ioasid_t total)
{
}
+
#endif /* CONFIG_IOASID */
#endif /* __LINUX_IOASID_H */
--
2.7.4

2020-03-26 02:13:50

by Lu Baolu

Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

On 2020/3/26 1:55, Jacob Pan wrote:
> IOASID set defines a group of IDs that share the same token. The
> ioasid_set concept helps to do permission checking among users as in the
> current code.
>
> With guest SVA usage, each VM has its own IOASID set. More
> functionalities are needed:
> 1. Enforce quota, each guest may be assigned limited quota such that one
> guest cannot abuse all the system resource.
> 2. Stores IOASID mapping between guest and host IOASIDs
> 3. Per set operations, e.g. free the entire set
>
> For each ioasid_set token, a unique set ID is assigned. This makes
> reference of the set and data lookup much easier to implement.
>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/ioasid.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ioasid.h | 13 +++++
> 2 files changed, 160 insertions(+)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 4026e52855b9..27ee57f7079b 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -10,6 +10,25 @@
> #include <linux/spinlock.h>
> #include <linux/xarray.h>
>
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> +/**
> + * struct ioasid_set_data - Meta data about ioasid_set
> + *
> + * @token: Unique to identify an IOASID set
> + * @xa: XArray to store subset ID and IOASID mapping
> + * @size: Max number of IOASIDs can be allocated within the set
> + * @nr_ioasids Number of IOASIDs allocated in the set
> + * @sid ID of the set
> + */
> +struct ioasid_set_data {
> + struct ioasid_set *token;
> + struct xarray xa;
> + int size;
> + int nr_ioasids;
> + int sid;
> + struct rcu_head rcu;
> +};
> +
> struct ioasid_data {
> ioasid_t id;
> struct ioasid_set *set;
> @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> EXPORT_SYMBOL_GPL(ioasid_free);
>
> /**
> + * ioasid_alloc_set - Allocate a set of IOASIDs
> + * @token: Unique token of the IOASID set
> + * @quota: Quota allowed in this set
> + * @sid: IOASID set ID to be assigned
> + *
> + * Return 0 upon success. Token will be stored internally for lookup,
> + * IOASID allocation within the set and other per set operations will use
> + * the @sid assigned.
> + *
> + */
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> +{
> + struct ioasid_set_data *sdata;
> + ioasid_t id;
> + int ret = 0;
> +
> + if (quota > ioasid_capacity_avail) {
> + pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> + quota, ioasid_capacity_avail);
> + return -ENOSPC;
> + }
> +
> + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> + if (!sdata)
> + return -ENOMEM;
> +
> + spin_lock(&ioasid_allocator_lock);
> +
> + ret = xa_alloc(&ioasid_sets, &id, sdata,
> + XA_LIMIT(0, ioasid_capacity_avail - quota),
> + GFP_KERNEL);
> + if (ret) {
> + kfree(sdata);
> + goto error;
> + }
> +
> + sdata->token = token;
> + sdata->size = quota;
> + sdata->sid = id;
> +
> + /*
> + * Set Xarray is used to store IDs within the set, get ready for
> + * sub-set ID and system-wide IOASID allocation results.
> + */
> + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> +
> + ioasid_capacity_avail -= quota;
> + *sid = id;
> +
> +error:
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> +
> +/**
> + * ioasid_free_set - Free all IOASIDs within the set
> + *
> + * @sid: The IOASID set ID to be freed
> + * @destroy_set: Whether to keep the set for further allocation.
> + * If true, the set will be destroyed.
> + *
> + * All IOASIDs allocated within the set will be freed upon return.
> + */
> +void ioasid_free_set(int sid, bool destroy_set)
> +{
> + struct ioasid_set_data *sdata;
> + struct ioasid_data *entry;
> + unsigned long index;
> +
> + spin_lock(&ioasid_allocator_lock);
> + sdata = xa_load(&ioasid_sets, sid);
> + if (!sdata) {
> + pr_err("No IOASID set found to free %d\n", sid);
> + goto done_unlock;
> + }
> +
> + if (xa_empty(&sdata->xa)) {
> + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> + goto done_destroy;
> + }
> +
> + /* Just a place holder for now */
> + xa_for_each(&sdata->xa, index, entry) {
> + /* Free from per sub-set pool */
> + xa_erase(&sdata->xa, index);
> + }
> +
> +done_destroy:
> + if (destroy_set) {
> + xa_erase(&ioasid_sets, sid);
> +
> + /* Return the quota back to system pool */
> + ioasid_capacity_avail += sdata->size;
> + kfree_rcu(sdata, rcu);
> + }
> +
> +done_unlock:
> + spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free_set);
> +
> +
> +/**
> * ioasid_find - Find IOASID data
> * @set: the IOASID set
> * @ioasid: the IOASID to find
> @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> }
> EXPORT_SYMBOL_GPL(ioasid_find);
>
> +/**
> + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> + * Caller must hold a reference to the set.
> + *
> + * @ioasid: IOASID associated with the set
> + *
> + * Return IOASID set ID or error
> + */
> +int ioasid_find_sid(ioasid_t ioasid)
> +{
> + struct ioasid_data *ioasid_data;
> + int ret = 0;
> +
> + spin_lock(&ioasid_allocator_lock);
> + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> +
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> +
> MODULE_AUTHOR("Jean-Philippe Brucker <[email protected]>");
> MODULE_AUTHOR("Jacob Pan <[email protected]>");
> MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 9711fa0dc357..be158e03c034 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_set_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> +void ioasid_free_set(int sid, bool destroy_set);
> +int ioasid_find_sid(ioasid_t ioasid);
> #else /* !CONFIG_IOASID */
> static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> ioasid_t max, void *private)
> @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> {
> }
>
> +static inline int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> +{
> + return -ENOTSUPP;
> +}
> +
> +static inline void ioasid_free_set(int sid, bool destroy_set)
> +{
> +}
> +

static inline int ioasid_find_sid(ioasid_t ioasid)
{
return INVALID_IOASID;
}

Best regards,
baolu

> static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *))
> {
> @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> static inline void ioasid_install_capacity(ioasid_t total)
> {
> }
> +
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
>

2020-03-26 21:25:32

by Jacob Pan

Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

Hi Baolu,

On Thu, 26 Mar 2020 10:12:36 +0800
Lu Baolu <[email protected]> wrote:

> On 2020/3/26 1:55, Jacob Pan wrote:
> > IOASID set defines a group of IDs that share the same token. The
> > ioasid_set concept helps to do permission checking among users as
> > in the current code.
> >
> > With guest SVA usage, each VM has its own IOASID set. More
> > functionalities are needed:
> > 1. Enforce quota, each guest may be assigned limited quota such
> > that one guest cannot abuse all the system resource.
> > 2. Stores IOASID mapping between guest and host IOASIDs
> > 3. Per set operations, e.g. free the entire set
> >
> > For each ioasid_set token, a unique set ID is assigned. This makes
> > reference of the set and data lookup much easier to implement.
> >
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/ioasid.c | 147
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/ioasid.h | 13 +++++ 2 files changed, 160
> > insertions(+)
> >
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 4026e52855b9..27ee57f7079b 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -10,6 +10,25 @@
> > #include <linux/spinlock.h>
> > #include <linux/xarray.h>
> >
> > +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > +/**
> > + * struct ioasid_set_data - Meta data about ioasid_set
> > + *
> > + * @token: Unique to identify an IOASID set
> > + * @xa: XArray to store subset ID and IOASID mapping
> > + * @size: Max number of IOASIDs can be allocated within the
> > set
> > + * @nr_ioasids Number of IOASIDs allocated in the set
> > + * @sid ID of the set
> > + */
> > +struct ioasid_set_data {
> > + struct ioasid_set *token;
> > + struct xarray xa;
> > + int size;
> > + int nr_ioasids;
> > + int sid;
> > + struct rcu_head rcu;
> > +};
> > +
> > struct ioasid_data {
> > ioasid_t id;
> > struct ioasid_set *set;
> > @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> > EXPORT_SYMBOL_GPL(ioasid_free);
> >
> > /**
> > + * ioasid_alloc_set - Allocate a set of IOASIDs
> > + * @token: Unique token of the IOASID set
> > + * @quota: Quota allowed in this set
> > + * @sid: IOASID set ID to be assigned
> > + *
> > + * Return 0 upon success. Token will be stored internally for
> > lookup,
> > + * IOASID allocation within the set and other per set operations
> > will use
> > + * the @sid assigned.
> > + *
> > + */
> > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid) +{
> > + struct ioasid_set_data *sdata;
> > + ioasid_t id;
> > + int ret = 0;
> > +
> > + if (quota > ioasid_capacity_avail) {
> > + pr_warn("Out of IOASID capacity! ask %d, avail
> > %d\n",
> > + quota, ioasid_capacity_avail);
> > + return -ENOSPC;
> > + }
> > +
> > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > + if (!sdata)
> > + return -ENOMEM;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > +
> > + ret = xa_alloc(&ioasid_sets, &id, sdata,
> > + XA_LIMIT(0, ioasid_capacity_avail - quota),
> > + GFP_KERNEL);
> > + if (ret) {
> > + kfree(sdata);
> > + goto error;
> > + }
> > +
> > + sdata->token = token;
> > + sdata->size = quota;
> > + sdata->sid = id;
> > +
> > + /*
> > + * Set Xarray is used to store IDs within the set, get
> > ready for
> > + * sub-set ID and system-wide IOASID allocation results.
> > + */
> > + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> > +
> > + ioasid_capacity_avail -= quota;
> > + *sid = id;
> > +
> > +error:
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> > +
> > +/**
> > + * ioasid_free_set - Free all IOASIDs within the set
> > + *
> > + * @sid: The IOASID set ID to be freed
> > + * @destroy_set: Whether to keep the set for further
> > allocation.
> > + * If true, the set will be destroyed.
> > + *
> > + * All IOASIDs allocated within the set will be freed upon return.
> > + */
> > +void ioasid_free_set(int sid, bool destroy_set)
> > +{
> > + struct ioasid_set_data *sdata;
> > + struct ioasid_data *entry;
> > + unsigned long index;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + sdata = xa_load(&ioasid_sets, sid);
> > + if (!sdata) {
> > + pr_err("No IOASID set found to free %d\n", sid);
> > + goto done_unlock;
> > + }
> > +
> > + if (xa_empty(&sdata->xa)) {
> > + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> > + goto done_destroy;
> > + }
> > +
> > + /* Just a place holder for now */
> > + xa_for_each(&sdata->xa, index, entry) {
> > + /* Free from per sub-set pool */
> > + xa_erase(&sdata->xa, index);
> > + }
> > +
> > +done_destroy:
> > + if (destroy_set) {
> > + xa_erase(&ioasid_sets, sid);
> > +
> > + /* Return the quota back to system pool */
> > + ioasid_capacity_avail += sdata->size;
> > + kfree_rcu(sdata, rcu);
> > + }
> > +
> > +done_unlock:
> > + spin_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free_set);
> > +
> > +
> > +/**
> > * ioasid_find - Find IOASID data
> > * @set: the IOASID set
> > * @ioasid: the IOASID to find
> > @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set,
> > ioasid_t ioasid, }
> > EXPORT_SYMBOL_GPL(ioasid_find);
> >
> > +/**
> > + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> > + * Caller must hold a reference to the set.
> > + *
> > + * @ioasid: IOASID associated with the set
> > + *
> > + * Return IOASID set ID or error
> > + */
> > +int ioasid_find_sid(ioasid_t ioasid)
> > +{
> > + struct ioasid_data *ioasid_data;
> > + int ret = 0;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> > +
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> > +
> > MODULE_AUTHOR("Jean-Philippe Brucker
> > <[email protected]>"); MODULE_AUTHOR("Jacob Pan
> > <[email protected]>"); MODULE_DESCRIPTION("IO Address
> > Space ID (IOASID) allocator"); diff --git a/include/linux/ioasid.h
> > b/include/linux/ioasid.h index 9711fa0dc357..be158e03c034 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator); void
> > ioasid_unregister_allocator(struct ioasid_allocator_ops
> > *allocator); int ioasid_set_data(ioasid_t ioasid, void *data); void
> > ioasid_install_capacity(ioasid_t total); +int
> > ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid); +void ioasid_free_set(int sid, bool destroy_set); +int
> > ioasid_find_sid(ioasid_t ioasid); #else /* !CONFIG_IOASID */
> > static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min, ioasid_t max, void *private)
> > @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> > {
> > }
> >
> > +static inline int ioasid_alloc_set(struct ioasid_set *token,
> > ioasid_t quota, int *sid) +{
> > + return -ENOTSUPP;
> > +}
> > +
> > +static inline void ioasid_free_set(int sid, bool destroy_set)
> > +{
> > +}
> > +
>
> static inline int ioasid_find_sid(ioasid_t ioasid)
> {
> return INVALID_IOASID;
> }
>
Good catch, will add to the next version.

> Best regards,
> baolu
>
> > static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid, bool (*getter)(void *))
> > {
> > @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t
> > ioasid, void *data) static inline void
> > ioasid_install_capacity(ioasid_t total) {
> > }
> > +
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> >

[Jacob Pan]

2020-03-27 08:08:18

by Tian, Kevin

Subject: RE: [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity

> From: Jacob Pan <[email protected]>
> Sent: Thursday, March 26, 2020 1:55 AM
>
> IOASID is a limited system-wide resource that can be allocated at
> runtime. This limitation can be enumerated during boot. For example, on
> x86 platforms, PCI Process Address Space ID (PASID) allocation uses
> IOASID service. The number of supported PASID bits are enumerated by
> extended capability register as defined in the VT-d spec.
>
> This patch adds a helper to set the system capacity, it expected to be
> set during boot prior to any allocation request.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/ioasid.c | 15 +++++++++++++++
> include/linux/ioasid.h | 5 ++++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 0f8dd377aada..4026e52855b9 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -17,6 +17,21 @@ struct ioasid_data {
> struct rcu_head rcu;
> };
>
> +static ioasid_t ioasid_capacity;
> +static ioasid_t ioasid_capacity_avail;
> +
> +/* System capacity can only be set once */
> +void ioasid_install_capacity(ioasid_t total)
> +{
> + if (ioasid_capacity) {
> + pr_warn("IOASID capacity already set at %d\n",
> ioasid_capacity);
> + return;
> + }
> +
> + ioasid_capacity = ioasid_capacity_avail = total;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> +

From all the texts in this patch, looks ioasid_set_capacity might be
a better name?

> /*
> * struct ioasid_allocator_data - Internal data structure to hold information
> * about an allocator. There are two types of allocators:
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 6f000d7a0ddc..9711fa0dc357 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_set_data(ioasid_t ioasid, void *data);
> -
> +void ioasid_install_capacity(ioasid_t total);
> #else /* !CONFIG_IOASID */
> static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> ioasid_t max, void *private)
> @@ -72,5 +72,8 @@ static inline int ioasid_set_data(ioasid_t ioasid, void
> *data)
> return -ENOTSUPP;
> }
>
> +static inline void ioasid_install_capacity(ioasid_t total)
> +{
> +}
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4

2020-03-27 08:09:22

by Tian, Kevin

Subject: RE: [PATCH 02/10] iommu/vt-d: Set IOASID capacity when SVM is enabled

> From: Jacob Pan <[email protected]>
> Sent: Thursday, March 26, 2020 1:55 AM
>
> Assign system-wide PASID capacity with enumerated max value.
> Currently, all Intel SVM capable devices should support full 20 bits of
> PASID value.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a699a765c983..ec3fc121744a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3510,6 +3510,10 @@ static int __init init_dmars(void)
> if (ret)
> goto free_iommu;
>
> + /* PASID is needed for scalable mode irrespective to SVM */
> + if (intel_iommu_sm)
> + ioasid_install_capacity(intel_pasid_max_id);
> +
> /*
> * for each drhd
> * enable fault log
> --
> 2.7.4

Reviewed-by: Kevin Tian <[email protected]>

2020-03-27 08:40:48

by Tian, Kevin

Subject: RE: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

> From: Jacob Pan <[email protected]>
> Sent: Thursday, March 26, 2020 1:55 AM
>
> IOASID set defines a group of IDs that share the same token. The
> ioasid_set concept helps to do permission checking among users as in the
> current code.
>
> With guest SVA usage, each VM has its own IOASID set. More
> functionalities are needed:
> 1. Enforce quota, each guest may be assigned limited quota such that one
> guest cannot abuse all the system resource.
> 2. Stores IOASID mapping between guest and host IOASIDs
> 3. Per set operations, e.g. free the entire set
>
> For each ioasid_set token, a unique set ID is assigned. This makes
> reference of the set and data lookup much easier to implement.
>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/ioasid.c | 147
> +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ioasid.h | 13 +++++
> 2 files changed, 160 insertions(+)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 4026e52855b9..27ee57f7079b 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -10,6 +10,25 @@
> #include <linux/spinlock.h>
> #include <linux/xarray.h>
>
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> +/**
> + * struct ioasid_set_data - Meta data about ioasid_set
> + *
> + * @token: Unique to identify an IOASID set
> + * @xa: XArray to store subset ID and IOASID mapping

what is a subset? is it a different thing from set?

> + * @size: Max number of IOASIDs can be allocated within the set

'size' reads more like 'current size' instead of 'max size'. maybe call
it 'max_ioasids' to align with 'nr_ioasids'? or simplify both as
'max' and 'nr'?

> + * @nr_ioasids Number of IOASIDs allocated in the set
> + * @sid ID of the set
> + */
> +struct ioasid_set_data {
> + struct ioasid_set *token;
> + struct xarray xa;
> + int size;
> + int nr_ioasids;
> + int sid;
> + struct rcu_head rcu;
> +};
> +
> struct ioasid_data {
> ioasid_t id;
> struct ioasid_set *set;
> @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> EXPORT_SYMBOL_GPL(ioasid_free);
>
> /**
> + * ioasid_alloc_set - Allocate a set of IOASIDs

'a set of IOASIDS' sounds like 'many IOASIDs'. Just saying 'allocate
an IOASID set' is more clear. :)

> + * @token: Unique token of the IOASID set
> + * @quota: Quota allowed in this set
> + * @sid: IOASID set ID to be assigned
> + *
> + * Return 0 upon success. Token will be stored internally for lookup,
> + * IOASID allocation within the set and other per set operations will use
> + * the @sid assigned.
> + *
> + */
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> +{
> + struct ioasid_set_data *sdata;
> + ioasid_t id;
> + int ret = 0;
> +
> + if (quota > ioasid_capacity_avail) {
> + pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> + quota, ioasid_capacity_avail);
> + return -ENOSPC;
> + }
> +
> + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> + if (!sdata)
> + return -ENOMEM;
> +
> + spin_lock(&ioasid_allocator_lock);
> +
> + ret = xa_alloc(&ioasid_sets, &id, sdata,
> + XA_LIMIT(0, ioasid_capacity_avail - quota),
> + GFP_KERNEL);

Interestingly, I didn't find the definition of ioasid_sets, and it is
not in an existing file.

I'm not sure how many sets can be created, but anyway the set
namespace is different from the IOASID namespace. Then why do we
use the IOASID capacity as the limit for allocating a set ID here?

> + if (ret) {
> + kfree(sdata);
> + goto error;
> + }
> +
> + sdata->token = token;

given the token must be unique, is a check for any conflict required here?

> + sdata->size = quota;
> + sdata->sid = id;
> +
> + /*
> + * Set Xarray is used to store IDs within the set, get ready for
> + * sub-set ID and system-wide IOASID allocation results.

looks 'subset' is the same thing as 'set'. let's make it consistent.

> + */
> + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> +
> + ioasid_capacity_avail -= quota;
> + *sid = id;
> +
> +error:
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> +
> +/**
> + * ioasid_free_set - Free all IOASIDs within the set
> + *
> + * @sid: The IOASID set ID to be freed
> + * @destroy_set: Whether to keep the set for further allocation.
> + * If true, the set will be destroyed.
> + *
> + * All IOASIDs allocated within the set will be freed upon return.
> + */
> +void ioasid_free_set(int sid, bool destroy_set)
> +{

what is the actual use case for just freeing the IOASIDs while keeping
the set itself?

> + struct ioasid_set_data *sdata;
> + struct ioasid_data *entry;
> + unsigned long index;
> +
> + spin_lock(&ioasid_allocator_lock);
> + sdata = xa_load(&ioasid_sets, sid);
> + if (!sdata) {
> + pr_err("No IOASID set found to free %d\n", sid);
> + goto done_unlock;
> + }
> +
> + if (xa_empty(&sdata->xa)) {
> + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> + goto done_destroy;
> + }

Why is this a warning condition? It is possible that a user has called
ioasid_free for all allocated IOASIDs and then calls this function, which
is actually the expected normal situation.

> +
> + /* Just a place holder for now */
> + xa_for_each(&sdata->xa, index, entry) {
> + /* Free from per sub-set pool */
> + xa_erase(&sdata->xa, index);
> + }

But the placeholder would lead to undesired behavior, which is not good
for bisecting. If there is no support now, this should return an error if
any in-use IOASID is not freed.

> +
> +done_destroy:
> + if (destroy_set) {
> + xa_erase(&ioasid_sets, sid);
> +
> + /* Return the quota back to system pool */
> + ioasid_capacity_avail += sdata->size;
> + kfree_rcu(sdata, rcu);
> + }
> +
> +done_unlock:
> + spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free_set);
> +
> +
> +/**
> * ioasid_find - Find IOASID data
> * @set: the IOASID set
> * @ioasid: the IOASID to find
> @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t
> ioasid,
> }
> EXPORT_SYMBOL_GPL(ioasid_find);
>
> +/**
> + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> + * Caller must hold a reference to the set.

please unify capitalization around IOASID or ioasid.

Thanks
Kevin

> + *
> + * @ioasid: IOASID associated with the set
> + *
> + * Return IOASID set ID or error
> + */
> +int ioasid_find_sid(ioasid_t ioasid)
> +{
> + struct ioasid_data *ioasid_data;
> + int ret = 0;
> +
> + spin_lock(&ioasid_allocator_lock);
> + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> +
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> +
> MODULE_AUTHOR("Jean-Philippe Brucker <jean-
> [email protected]>");
> MODULE_AUTHOR("Jacob Pan <[email protected]>");
> MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 9711fa0dc357..be158e03c034 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops
> *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_set_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> +void ioasid_free_set(int sid, bool destroy_set);
> +int ioasid_find_sid(ioasid_t ioasid);
> #else /* !CONFIG_IOASID */
> static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> ioasid_t max, void *private)
> @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> {
> }
>
> +static inline int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> *sid)
> +{
> + return -ENOTSUPP;
> +}
> +
> +static inline void ioasid_free_set(int sid, bool destroy_set)
> +{
> +}
> +
> static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *))
> {
> @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t ioasid, void
> *data)
> static inline void ioasid_install_capacity(ioasid_t total)
> {
> }
> +
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4

2020-03-27 09:37:34

by Tian, Kevin

Subject: RE: [PATCH 04/10] iommu/ioasid: Rename ioasid_set_data to avoid confusion with ioasid_set

> From: Jacob Pan <[email protected]>
> Sent: Thursday, March 26, 2020 1:55 AM
>
> IOASID set refers to a group of IOASIDs that shares the same token.
> ioasid_set_data() function is used to attach a private data to an IOASID,
> rename it to ioasid_attach_data() avoid being confused with the group/set
> concept.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-svm.c | 11 ++++++-----
> drivers/iommu/ioasid.c | 6 +++---
> include/linux/ioasid.h | 4 ++--
> 3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index b6405df6cfb5..1991587fd3fd 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -319,14 +319,15 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain,
> svm->gpasid = data->gpasid;
> svm->flags |= SVM_FLAG_GUEST_PASID;
> }
> - ioasid_set_data(data->hpasid, svm);
> +
> + ioasid_attach_data(data->hpasid, svm);
> INIT_LIST_HEAD_RCU(&svm->devs);
> mmput(svm->mm);
> }
> sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> if (!sdev) {
> if (list_empty(&svm->devs)) {
> - ioasid_set_data(data->hpasid, NULL);
> + ioasid_attach_data(data->hpasid, NULL);
> kfree(svm);
> }
> ret = -ENOMEM;
> @@ -346,7 +347,7 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain,
> * was allocated in this function.
> */
> if (list_empty(&svm->devs)) {
> - ioasid_set_data(data->hpasid, NULL);
> + ioasid_attach_data(data->hpasid, NULL);
> kfree(svm);
> }
> goto out;
> @@ -375,7 +376,7 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain,
> */
> kfree(sdev);
> if (list_empty(&svm->devs)) {
> - ioasid_set_data(data->hpasid, NULL);
> + ioasid_attach_data(data->hpasid, NULL);
> kfree(svm);
> }
> goto out;
> @@ -438,7 +439,7 @@ int intel_svm_unbind_gpasid(struct device *dev, int
> pasid)
> * that PASID allocated by one guest cannot
> be
> * used by another.
> */
> - ioasid_set_data(pasid, NULL);
> + ioasid_attach_data(pasid, NULL);
> kfree(svm);
> }
> }
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 27ee57f7079b..6265d2dbbced 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -292,14 +292,14 @@ void ioasid_unregister_allocator(struct
> ioasid_allocator_ops *ops)
> EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
>
> /**
> - * ioasid_set_data - Set private data for an allocated ioasid
> + * ioasid_attach_data - Set private data for an allocated ioasid
> * @ioasid: the ID to set data
> * @data: the private data
> *
> * For IOASID that is already allocated, private data can be set
> * via this API. Future lookup can be done via ioasid_find.
> */
> -int ioasid_set_data(ioasid_t ioasid, void *data)
> +int ioasid_attach_data(ioasid_t ioasid, void *data)
> {
> struct ioasid_data *ioasid_data;
> int ret = 0;
> @@ -321,7 +321,7 @@ int ioasid_set_data(ioasid_t ioasid, void *data)
>
> return ret;
> }
> -EXPORT_SYMBOL_GPL(ioasid_set_data);
> +EXPORT_SYMBOL_GPL(ioasid_attach_data);
>
> /**
> * ioasid_alloc - Allocate an IOASID
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index be158e03c034..8c82d2625671 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -39,7 +39,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *));
> int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> -int ioasid_set_data(ioasid_t ioasid, void *data);
> +int ioasid_attach_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> void ioasid_free_set(int sid, bool destroy_set);
> @@ -79,7 +79,7 @@ static inline void ioasid_unregister_allocator(struct
> ioasid_allocator_ops *allo
> {
> }
>
> -static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +static inline int ioasid_attach_data(ioasid_t ioasid, void *data)
> {
> return -ENOTSUPP;
> }
> --
> 2.7.4

Reviewed-by: Kevin Tian <[email protected]>

2020-03-27 09:42:36

by Tian, Kevin

Subject: RE: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

> From: Jacob Pan <[email protected]>
> Sent: Thursday, March 26, 2020 1:55 AM
>
> Bare metal SVA allocates IOASIDs for native process addresses. This
> should be separated from VM allocated IOASIDs thus under its own set.

A curious question. Now bare metal SVA uses the system set and guest
SVA uses dynamically-created set. Then do we still allow ioasid_alloc
with a NULL set pointer?

>
> This patch creates a system IOASID set with its quota set to PID_MAX.
> This is a reasonable default in that SVM capable devices can only bind
> to limited user processes.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 8 +++++++-
> drivers/iommu/ioasid.c | 9 +++++++++
> include/linux/ioasid.h | 9 +++++++++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ec3fc121744a..af7a1ef7b31e 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> goto free_iommu;
>
> /* PASID is needed for scalable mode irrespective to SVM */
> - if (intel_iommu_sm)
> + if (intel_iommu_sm) {
> ioasid_install_capacity(intel_pasid_max_id);
> + /* We should not run out of IOASIDs at boot */
> + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> + pr_err("Failed to enable host PASID allocator\n");
> + intel_iommu_sm = 0;
> + }
> + }
>
> /*
> * for each drhd
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 6265d2dbbced..9135af171a7c 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -39,6 +39,9 @@ struct ioasid_data {
> static ioasid_t ioasid_capacity;
> static ioasid_t ioasid_capacity_avail;
>
> +int system_ioasid_sid;
> +static DECLARE_IOASID_SET(system_ioasid);
> +
> /* System capacity can only be set once */
> void ioasid_install_capacity(ioasid_t total)
> {
> @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> }
> EXPORT_SYMBOL_GPL(ioasid_install_capacity);
>
> +int ioasid_alloc_system_set(int quota)
> +{
> + return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> +
> /*
> * struct ioasid_allocator_data - Internal data structure to hold information
> * about an allocator. There are two types of allocators:
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 8c82d2625671..097b1cc043a3 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> void *pdata;
> };
>
> +/* Shared IOASID set for reserved for host system use */
> +extern int system_ioasid_sid;
> +
> #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
>
> #if IS_ENABLED(CONFIG_IOASID)
> @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops
> *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_attach_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> +int ioasid_alloc_system_set(int quota);
> int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> void ioasid_free_set(int sid, bool destroy_set);
> int ioasid_find_sid(ioasid_t ioasid);
> @@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
> {
> }
>
> +static inline int ioasid_alloc_system_set(int quota)
> +{
> + return -ENOTSUPP;
> +}
> +
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4

2020-03-27 16:04:39

by Jacob Pan

Subject: Re: [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity

On Fri, 27 Mar 2020 08:07:20 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Thursday, March 26, 2020 1:55 AM
> >
> > IOASID is a limited system-wide resource that can be allocated at
> > runtime. This limitation can be enumerated during boot. For
> > example, on x86 platforms, PCI Process Address Space ID (PASID)
> > allocation uses IOASID service. The number of supported PASID bits
> > are enumerated by extended capability register as defined in the
> > VT-d spec.
> >
> > This patch adds a helper to set the system capacity, it expected to
> > be set during boot prior to any allocation request.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/ioasid.c | 15 +++++++++++++++
> > include/linux/ioasid.h | 5 ++++-
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 0f8dd377aada..4026e52855b9 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -17,6 +17,21 @@ struct ioasid_data {
> > struct rcu_head rcu;
> > };
> >
> > +static ioasid_t ioasid_capacity;
> > +static ioasid_t ioasid_capacity_avail;
> > +
> > +/* System capacity can only be set once */
> > +void ioasid_install_capacity(ioasid_t total)
> > +{
> > + if (ioasid_capacity) {
> > + pr_warn("IOASID capacity already set at %d\n",
> > ioasid_capacity);
> > + return;
> > + }
> > +
> > + ioasid_capacity = ioasid_capacity_avail = total;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > +
>
> From all the texts in this patch, looks ioasid_set_capacity might be
> a better name?
>
That was my initial choice as well :), but it may be confused with
ioasid_set.

I will rephrase the text to avoid 'set'.
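
As a side note, the intended set-once semantics can be sketched as a small
userspace toy (names like install_capacity here are illustrative, not the
kernel API):

```c
#include <stdio.h>

typedef unsigned int ioasid_t;

static ioasid_t toy_capacity;	/* 0 means "not yet installed" */

/* Toy mirror of ioasid_install_capacity(): only the first call wins. */
static void install_capacity(ioasid_t total)
{
	if (toy_capacity) {
		fprintf(stderr, "capacity already set at %u\n", toy_capacity);
		return;
	}
	toy_capacity = total;
}

static ioasid_t current_capacity(void)
{
	return toy_capacity;
}
```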

> > /*
> > * struct ioasid_allocator_data - Internal data structure to hold
> > information
> > * about an allocator. There are two types of allocators:
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 6f000d7a0ddc..9711fa0dc357 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set,
> > ioasid_t ioasid, int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator); void
> > ioasid_unregister_allocator(struct ioasid_allocator_ops
> > *allocator); int ioasid_set_data(ioasid_t ioasid, void *data); -
> > +void ioasid_install_capacity(ioasid_t total);
> > #else /* !CONFIG_IOASID */
> > static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min, ioasid_t max, void *private)
> > @@ -72,5 +72,8 @@ static inline int ioasid_set_data(ioasid_t
> > ioasid, void *data)
> > return -ENOTSUPP;
> > }
> >
> > +static inline void ioasid_install_capacity(ioasid_t total)
> > +{
> > +}
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
>

[Jacob Pan]

2020-03-27 16:55:25

by Jacob Pan

Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

On Fri, 27 Mar 2020 08:38:44 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Thursday, March 26, 2020 1:55 AM
> >
> > IOASID set defines a group of IDs that share the same token. The
> > ioasid_set concept helps to do permission checking among users as
> > in the current code.
> >
> > With guest SVA usage, each VM has its own IOASID set. More
> > functionalities are needed:
> > 1. Enforce quota, each guest may be assigned limited quota such
> > that one guest cannot abuse all the system resource.
> > 2. Stores IOASID mapping between guest and host IOASIDs
> > 3. Per set operations, e.g. free the entire set
> >
> > For each ioasid_set token, a unique set ID is assigned. This makes
> > reference of the set and data lookup much easier to implement.
> >
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/ioasid.c | 147
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/ioasid.h | 13 +++++
> > 2 files changed, 160 insertions(+)
> >
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 4026e52855b9..27ee57f7079b 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -10,6 +10,25 @@
> > #include <linux/spinlock.h>
> > #include <linux/xarray.h>
> >
> > +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > +/**
> > + * struct ioasid_set_data - Meta data about ioasid_set
> > + *
> > + * @token: Unique to identify an IOASID set
> > + * @xa: XArray to store subset ID and IOASID
> > mapping
>
> what is a subset? is it a different thing from set?
>
A subset is a set, but a subset ID is an ID that is only valid within the
set. With a non-identity guest-host PASID mapping, the subset ID is the
guest PASID, in more general terms. Or call it a "Set Private ID".

This can be confusing; perhaps I should rephrase it as:
"XArray to store ioasid_set private ID to system-wide IOASID mapping"

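To make the "set private ID" idea concrete, here is a minimal userspace
sketch of a per-set table mapping a set-private ID (e.g. a guest PASID) to
the system-wide IOASID (host PASID). The toy_* names are illustrative, and
the kernel code uses an XArray rather than a fixed array:

```c
#include <stdlib.h>

typedef unsigned int ioasid_t;
#define INVALID_IOASID ((ioasid_t)-1)
#define TOY_MAX_PRIVATE 64	/* toy bound; the kernel uses an XArray */

/* Per-set table: index is the set-private ID (guest PASID),
 * value is the system-wide IOASID (host PASID). */
struct toy_set {
	ioasid_t map[TOY_MAX_PRIVATE];
};

static struct toy_set *toy_set_new(void)
{
	struct toy_set *s = malloc(sizeof(*s));

	if (s)
		for (int i = 0; i < TOY_MAX_PRIVATE; i++)
			s->map[i] = INVALID_IOASID;
	return s;
}

/* Record the private->system mapping, as allocation within a set would. */
static int toy_set_store(struct toy_set *s, ioasid_t priv, ioasid_t sys)
{
	if (priv >= TOY_MAX_PRIVATE)
		return -1;
	s->map[priv] = sys;
	return 0;
}

/* Translate a guest-visible ID to the host one, or INVALID_IOASID. */
static ioasid_t toy_set_lookup(struct toy_set *s, ioasid_t priv)
{
	return priv < TOY_MAX_PRIVATE ? s->map[priv] : INVALID_IOASID;
}
```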

> > + * @size: Max number of IOASIDs can be allocated within the
> > set
>
> 'size' reads more like 'current size' instead of 'max size'. maybe
> call it 'max_ioasids' to align with 'nr_ioasids'? or simplify both as
> 'max' and 'nr'?
>
Right, how about max_id and nr_id?

> > + * @nr_ioasids Number of IOASIDs allocated in the set
> > + * @sid ID of the set
> > + */
> > +struct ioasid_set_data {
> > + struct ioasid_set *token;
> > + struct xarray xa;
> > + int size;
> > + int nr_ioasids;
> > + int sid;
> > + struct rcu_head rcu;
> > +};
> > +
> > struct ioasid_data {
> > ioasid_t id;
> > struct ioasid_set *set;
> > @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> > EXPORT_SYMBOL_GPL(ioasid_free);
> >
> > /**
> > + * ioasid_alloc_set - Allocate a set of IOASIDs
>
> 'a set of IOASIDS' sounds like 'many IOASIDs'. Just saying 'allocate
> an IOASID set' is more clear. :)
>
Make sense

> > + * @token: Unique token of the IOASID set
> > + * @quota: Quota allowed in this set
> > + * @sid: IOASID set ID to be assigned
> > + *
> > + * Return 0 upon success. Token will be stored internally for
> > lookup,
> > + * IOASID allocation within the set and other per set operations
> > will use
> > + * the @sid assigned.
> > + *
> > + */
> > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid) +{
> > + struct ioasid_set_data *sdata;
> > + ioasid_t id;
> > + int ret = 0;
> > +
> > + if (quota > ioasid_capacity_avail) {
> > + pr_warn("Out of IOASID capacity! ask %d, avail
> > %d\n",
> > + quota, ioasid_capacity_avail);
> > + return -ENOSPC;
> > + }
> > +
> > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > + if (!sdata)
> > + return -ENOMEM;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > +
> > + ret = xa_alloc(&ioasid_sets, &id, sdata,
> > + XA_LIMIT(0, ioasid_capacity_avail - quota),
> > + GFP_KERNEL);
>
> Interestingly, I didn't find the definition of ioasid_sets, and it is
> not in an existing file.
>
It is at the beginning of this file
+static DEFINE_XARRAY_ALLOC(ioasid_sets);

> I'm not sure how many sets can be created, but anyway the set
> namespace is different from the IOASID namespace. Then why do we
> use the IOASID capacity as the limit for allocating a set ID here?
>
I am assuming the worst-case scenario, which is one IOASID per set; that
is why the number of sets is limited by the number of system IOASIDs.

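The worst-case reasoning above boils down to simple pool accounting, which
can be sketched in userspace as follows (the toy_* names are illustrative,
and -1 stands in for -ENOSPC):

```c
typedef unsigned int ioasid_t;

/* Toy system pool; assume the full 20-bit PASID space is installed. */
static ioasid_t toy_avail = 1u << 20;

/* Carve a set's quota out of the pool; -1 stands in for -ENOSPC. */
static int toy_alloc_set(ioasid_t quota)
{
	if (quota > toy_avail)
		return -1;
	toy_avail -= quota;
	return 0;
}

/* Destroying a set returns its whole quota to the pool. */
static void toy_destroy_set(ioasid_t quota)
{
	toy_avail += quota;
}
```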
> > + if (ret) {
> > + kfree(sdata);
> > + goto error;
> > + }
> > +
> > + sdata->token = token;
>
> given token must be unique, a check on any conflict is required here?
>
Right, I will add a check to reject duplicated tokens.

/* Search existing set tokens, reject duplicates */
xa_for_each(&ioasid_sets, index, sdata) {
	if (sdata->token == token) {
		pr_warn("Token already exists in the set %lu\n", index);
		ret = -EEXIST;
		goto error;
	}
}

> > + sdata->size = quota;
> > + sdata->sid = id;
> > +
> > + /*
> > + * Set Xarray is used to store IDs within the set, get
> > ready for
> > + * sub-set ID and system-wide IOASID allocation results.
>
> looks 'subset' is the same thing as 'set'. let's make it consistent.
>
Sounds good, will also rename subset ID to set private ID.

> > + */
> > + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> > +
> > + ioasid_capacity_avail -= quota;
> > + *sid = id;
> > +
> > +error:
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> > +
> > +/**
> > + * ioasid_free_set - Free all IOASIDs within the set
> > + *
> > + * @sid: The IOASID set ID to be freed
> > + * @destroy_set: Whether to keep the set for further
> > allocation.
> > + * If true, the set will be destroyed.
> > + *
> > + * All IOASIDs allocated within the set will be freed upon return.
> > + */
> > +void ioasid_free_set(int sid, bool destroy_set)
> > +{
>
> what is the actual usage of just freeing ioasid while keeping the
> set itself?
>
I was thinking users that use the mm as the token can retain the
ioasid_set until the mm is destroyed. This is to support some kind of
lazy free.
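
The split between freeing a set's IOASIDs and destroying the set itself can be modeled in a few lines of userspace C. The struct and function names below are hypothetical, illustrating only the destroy_set flag semantics:

```c
#include <assert.h>

/* Toy stand-in for an ioasid_set with its allocation state. */
struct demo_set {
	int nr_ids;	/* IOASIDs currently allocated in the set */
	int alive;	/* set object still usable for allocation */
};

/*
 * Model of ioasid_free_set(sid, destroy_set): the IOASIDs are always
 * freed, but the set survives unless destruction is requested, so an
 * mm-token set can be emptied lazily and reused until the mm goes away.
 */
static void demo_free_set(struct demo_set *s, int destroy)
{
	s->nr_ids = 0;		/* all IOASIDs in the set are freed */
	if (destroy)
		s->alive = 0;	/* set token can no longer be used */
}
```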

> > + struct ioasid_set_data *sdata;
> > + struct ioasid_data *entry;
> > + unsigned long index;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + sdata = xa_load(&ioasid_sets, sid);
> > + if (!sdata) {
> > + pr_err("No IOASID set found to free %d\n", sid);
> > + goto done_unlock;
> > + }
> > +
> > + if (xa_empty(&sdata->xa)) {
> > + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> > + goto done_destroy;
> > + }
>
> why is it a warning condition? It is possible that a user has done
> ioasid_free for all allocated ioasids and then calls this function,
> which is actually the expected normal situation.
>
You are right, there is no need to warn. I will put the following
comment in place.
/* The set is already empty, we just destroy the set if requested */
if (xa_empty(&sdata->xa))
	goto done_destroy;

> > +
> > + /* Just a place holder for now */
> > + xa_for_each(&sdata->xa, index, entry) {
> > + /* Free from per sub-set pool */
> > + xa_erase(&sdata->xa, index);
> > + }
>
> but the placeholder would lead to undesired behavior, not good for
> bisect. If no support now, then should return an error if any in-use
> ioasid is not freed.
>
Good point, I will return -ENOTSUPP in the placeholder and remove it
during the API conversion.

> > +
> > +done_destroy:
> > + if (destroy_set) {
> > + xa_erase(&ioasid_sets, sid);
> > +
> > + /* Return the quota back to system pool */
> > + ioasid_capacity_avail += sdata->size;
> > + kfree_rcu(sdata, rcu);
> > + }
> > +
> > +done_unlock:
> > + spin_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free_set);
> > +
> > +
> > +/**
> > * ioasid_find - Find IOASID data
> > * @set: the IOASID set
> > * @ioasid: the IOASID to find
> > @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set,
> > ioasid_t ioasid,
> > }
> > EXPORT_SYMBOL_GPL(ioasid_find);
> >
> > +/**
> > + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> > + * Caller must hold a reference to the set.
>
> please unify capitalization around IOASID or ioasid.
>
Will do.

> Thanks
> Kevin
>
> > + *
> > + * @ioasid: IOASID associated with the set
> > + *
> > + * Return IOASID set ID or error
> > + */
> > +int ioasid_find_sid(ioasid_t ioasid)
> > +{
> > + struct ioasid_data *ioasid_data;
> > + int ret = 0;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> > +
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> > +
> > MODULE_AUTHOR("Jean-Philippe Brucker <jean-
> > [email protected]>");
> > MODULE_AUTHOR("Jacob Pan <[email protected]>");
> > MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 9711fa0dc357..be158e03c034 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator);
> > void ioasid_unregister_allocator(struct ioasid_allocator_ops
> > *allocator); int ioasid_set_data(ioasid_t ioasid, void *data);
> > void ioasid_install_capacity(ioasid_t total);
> > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid); +void ioasid_free_set(int sid, bool destroy_set);
> > +int ioasid_find_sid(ioasid_t ioasid);
> > #else /* !CONFIG_IOASID */
> > static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min, ioasid_t max, void *private)
> > @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> > {
> > }
> >
> > +static inline int ioasid_alloc_set(struct ioasid_set *token,
> > ioasid_t quota, int *sid)
> > +{
> > + return -ENOTSUPP;
> > +}
> > +
> > +static inline void ioasid_free_set(int sid, bool destroy_set)
> > +{
> > +}
> > +
> > static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid, bool (*getter)(void *))
> > {
> > @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t
> > ioasid, void *data)
> > static inline void ioasid_install_capacity(ioasid_t total)
> > {
> > }
> > +
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
>

[Jacob Pan]

2020-03-27 17:23:46

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

On Fri, 27 Mar 2020 09:41:55 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Thursday, March 26, 2020 1:55 AM
> >
> > Bare metal SVA allocates IOASIDs for native process addresses. This
> > should be separated from VM allocated IOASIDs thus under its own
> > set.
>
> A curious question. Now bare metal SVA uses the system set and guest
> SVA uses dynamically-created set. Then do we still allow ioasid_alloc
> with a NULL set pointer?
>
Good point! There shouldn't be a NULL set. That was one of the sticky
points in the previous allocation API. I will add a check in
ioasid_alloc_set().

However, there is still a need for a global search, e.g. for PRS.
https://lore.kernel.org/linux-arm-kernel/[email protected]/#t

In that case, use INVALID_IOASID_SET to indicate the search is global.
e.g.
ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
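
These lookup semantics (filter by set, or search globally with a sentinel set ID) can be sketched with a toy table in userspace C. All names below are hypothetical; the real ioasid_find() walks an XArray under RCU:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INVALID_SET -1	/* sentinel meaning "search globally" */

struct demo_entry {
	int id;		/* the IOASID */
	int sid;	/* the set it belongs to */
};

static struct demo_entry table[] = { { 1, 0 }, { 2, 7 } };

/*
 * Find @id, restricted to set @sid unless @sid is DEMO_INVALID_SET,
 * in which case the match is accepted regardless of the owning set.
 */
static struct demo_entry *demo_find(int sid, int id)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].id != id)
			continue;
		if (sid == DEMO_INVALID_SET || table[i].sid == sid)
			return &table[i];
	}
	return NULL;
}
```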

> >
> > This patch creates a system IOASID set with its quota set to
> > PID_MAX. This is a reasonable default in that SVM capable devices
> > can only bind to limited user processes.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 8 +++++++-
> > drivers/iommu/ioasid.c | 9 +++++++++
> > include/linux/ioasid.h | 9 +++++++++
> > 3 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index ec3fc121744a..af7a1ef7b31e
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > goto free_iommu;
> >
> > /* PASID is needed for scalable mode irrespective to SVM */
> > - if (intel_iommu_sm)
> > + if (intel_iommu_sm) {
> > ioasid_install_capacity(intel_pasid_max_id);
> > + /* We should not run out of IOASIDs at boot */
> > + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > + pr_err("Failed to enable host PASID
> > allocator\n");
> > + intel_iommu_sm = 0;
> > + }
> > + }
> >
> > /*
> > * for each drhd
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 6265d2dbbced..9135af171a7c 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -39,6 +39,9 @@ struct ioasid_data {
> > static ioasid_t ioasid_capacity;
> > static ioasid_t ioasid_capacity_avail;
> >
> > +int system_ioasid_sid;
> > +static DECLARE_IOASID_SET(system_ioasid);
> > +
> > /* System capacity can only be set once */
> > void ioasid_install_capacity(ioasid_t total)
> > {
> > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > }
> > EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> >
> > +int ioasid_alloc_system_set(int quota)
> > +{
> > + return ioasid_alloc_set(&system_ioasid, quota,
> > &system_ioasid_sid); +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> > +
> > /*
> > * struct ioasid_allocator_data - Internal data structure to hold
> > information
> > * about an allocator. There are two types of allocators:
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 8c82d2625671..097b1cc043a3 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > void *pdata;
> > };
> >
> > +/* Shared IOASID set reserved for host system use */
> > +extern int system_ioasid_sid;
> > +
> > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> >
> > #if IS_ENABLED(CONFIG_IOASID)
> > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator);
> > void ioasid_unregister_allocator(struct ioasid_allocator_ops
> > *allocator); int ioasid_attach_data(ioasid_t ioasid, void *data);
> > void ioasid_install_capacity(ioasid_t total);
> > +int ioasid_alloc_system_set(int quota);
> > int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid); void ioasid_free_set(int sid, bool destroy_set);
> > int ioasid_find_sid(ioasid_t ioasid);
> > @@ -88,5 +92,10 @@ static inline void
> > ioasid_install_capacity(ioasid_t total) {
> > }
> >
> > +static inline int ioasid_alloc_system_set(int quota)
> > +{
> > + return -ENOTSUPP;
> > +}
> > +
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
>

[Jacob Pan]

2020-03-28 06:34:50

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

> From: Jacob Pan <[email protected]>
> Sent: Saturday, March 28, 2020 1:29 AM
>
> On Fri, 27 Mar 2020 09:41:55 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Jacob Pan <[email protected]>
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > Bare metal SVA allocates IOASIDs for native process addresses. This
> > > should be separated from VM allocated IOASIDs thus under its own
> > > set.
> >
> > A curious question. Now bare metal SVA uses the system set and guest
> > SVA uses dynamically-created set. Then do we still allow ioasid_alloc
> > with a NULL set pointer?
> >
> Good point! There shouldn't be a NULL set. That was one of the sticky
> points in the previous allocation API. I will add a check in
> ioasid_alloc_set().
>
> However, there is still a need for a global search, e.g. for PRS.
> https://lore.kernel.org/linux-arm-kernel/1d62b2e1-fe8c-066d-34e0-
> [email protected]/#t
>
> In that case, use INVALID_IOASID_SET to indicate the search is global.
> e.g.
> ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);

ok, it makes sense.

>
> > >
> > > This patch creates a system IOASID set with its quota set to
> > > PID_MAX. This is a reasonable default in that SVM capable devices
> > > can only bind to limited user processes.
> > >
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > ---
> > > drivers/iommu/intel-iommu.c | 8 +++++++-
> > > drivers/iommu/ioasid.c | 9 +++++++++
> > > include/linux/ioasid.h | 9 +++++++++
> > > 3 files changed, 25 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index ec3fc121744a..af7a1ef7b31e
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > > goto free_iommu;
> > >
> > > /* PASID is needed for scalable mode irrespective to SVM */
> > > - if (intel_iommu_sm)
> > > + if (intel_iommu_sm) {
> > > ioasid_install_capacity(intel_pasid_max_id);
> > > + /* We should not run out of IOASIDs at boot */
> > > + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > > + pr_err("Failed to enable host PASID
> > > allocator\n");
> > > + intel_iommu_sm = 0;
> > > + }
> > > + }
> > >
> > > /*
> > > * for each drhd
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 6265d2dbbced..9135af171a7c 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -39,6 +39,9 @@ struct ioasid_data {
> > > static ioasid_t ioasid_capacity;
> > > static ioasid_t ioasid_capacity_avail;
> > >
> > > +int system_ioasid_sid;
> > > +static DECLARE_IOASID_SET(system_ioasid);
> > > +
> > > /* System capacity can only be set once */
> > > void ioasid_install_capacity(ioasid_t total)
> > > {
> > > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > > }
> > > EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > >
> > > +int ioasid_alloc_system_set(int quota)
> > > +{
> > > + return ioasid_alloc_set(&system_ioasid, quota,
> > > &system_ioasid_sid); +}
> > > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> > > +
> > > /*
> > > * struct ioasid_allocator_data - Internal data structure to hold
> > > information
> > > * about an allocator. There are two types of allocators:
> > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > index 8c82d2625671..097b1cc043a3 100644
> > > --- a/include/linux/ioasid.h
> > > +++ b/include/linux/ioasid.h
> > > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > > void *pdata;
> > > };
> > >
> > > +/* Shared IOASID set reserved for host system use */
> > > +extern int system_ioasid_sid;
> > > +
> > > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > >
> > > #if IS_ENABLED(CONFIG_IOASID)
> > > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct
> > > ioasid_allocator_ops *allocator);
> > > void ioasid_unregister_allocator(struct ioasid_allocator_ops
> > > *allocator); int ioasid_attach_data(ioasid_t ioasid, void *data);
> > > void ioasid_install_capacity(ioasid_t total);
> > > +int ioasid_alloc_system_set(int quota);
> > > int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > > *sid); void ioasid_free_set(int sid, bool destroy_set);
> > > int ioasid_find_sid(ioasid_t ioasid);
> > > @@ -88,5 +92,10 @@ static inline void
> > > ioasid_install_capacity(ioasid_t total) {
> > > }
> > >
> > > +static inline int ioasid_alloc_system_set(int quota)
> > > +{
> > > + return -ENOTSUPP;
> > > +}
> > > +
> > > #endif /* CONFIG_IOASID */
> > > #endif /* __LINUX_IOASID_H */
> > > --
> > > 2.7.4
> >
>
> [Jacob Pan]

2020-03-28 06:34:50

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

> From: Jacob Pan <[email protected]>
> Sent: Saturday, March 28, 2020 12:59 AM
>
> On Fri, 27 Mar 2020 08:38:44 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Jacob Pan <[email protected]>
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > IOASID set defines a group of IDs that share the same token. The
> > > ioasid_set concept helps to do permission checking among users as
> > > in the current code.
> > >
> > > With guest SVA usage, each VM has its own IOASID set. More
> > > functionalities are needed:
> > > 1. Enforce quota, each guest may be assigned limited quota such
> > > that one guest cannot abuse all the system resource.
> > > 2. Stores IOASID mapping between guest and host IOASIDs
> > > 3. Per set operations, e.g. free the entire set
> > >
> > > For each ioasid_set token, a unique set ID is assigned. This makes
> > > reference of the set and data lookup much easier to implement.
> > >
> > > Signed-off-by: Liu Yi L <[email protected]>
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > ---
> > > drivers/iommu/ioasid.c | 147
> > > +++++++++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/ioasid.h | 13 +++++
> > > 2 files changed, 160 insertions(+)
> > >
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 4026e52855b9..27ee57f7079b 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -10,6 +10,25 @@
> > > #include <linux/spinlock.h>
> > > #include <linux/xarray.h>
> > >
> > > +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > > +/**
> > > + * struct ioasid_set_data - Meta data about ioasid_set
> > > + *
> > > + * @token: Unique to identify an IOASID set
> > > + * @xa: XArray to store subset ID and IOASID
> > > mapping
> >
> > what is a subset? is it a different thing from set?
> >
> A subset is a set, but a subset ID is an ID only valid within the set.
> When we have a non-identity guest-host PASID mapping, the subset ID is
> the guest PASID, in more general terms. Or call it "Set Private ID".
>
> This can be confusing, perhaps I rephrase it as:
> "XArray to store ioasid_set private ID to system-wide IOASID mapping"
>
>
> > > + * @size: Max number of IOASIDs can be allocated within the
> > > set
> >
> > 'size' reads more like 'current size' instead of 'max size'. maybe
> > call it 'max_ioasids' to align with 'nr_ioasids'? or simplify both as
> > 'max' and 'nr'?
> >
> Right, how about max_id and nr_id?

sounds good.

>
> > > + * @nr_ioasids Number of IOASIDs allocated in the set
> > > + * @sid ID of the set
> > > + */
> > > +struct ioasid_set_data {
> > > + struct ioasid_set *token;
> > > + struct xarray xa;
> > > + int size;
> > > + int nr_ioasids;
> > > + int sid;
> > > + struct rcu_head rcu;
> > > +};
> > > +
> > > struct ioasid_data {
> > > ioasid_t id;
> > > struct ioasid_set *set;
> > > @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> > > EXPORT_SYMBOL_GPL(ioasid_free);
> > >
> > > /**
> > > + * ioasid_alloc_set - Allocate a set of IOASIDs
> >
> > 'a set of IOASIDs' sounds like 'many IOASIDs'. Just saying 'allocate
> > an IOASID set' is more clear.
> >
> Makes sense.
>
> > > + * @token: Unique token of the IOASID set
> > > + * @quota: Quota allowed in this set
> > > + * @sid: IOASID set ID to be assigned
> > > + *
> > > + * Return 0 upon success. Token will be stored internally for
> > > lookup,
> > > + * IOASID allocation within the set and other per set operations
> > > will use
> > > + * the @sid assigned.
> > > + *
> > > + */
> > > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > > *sid) +{
> > > + struct ioasid_set_data *sdata;
> > > + ioasid_t id;
> > > + int ret = 0;
> > > +
> > > + if (quota > ioasid_capacity_avail) {
> > > + pr_warn("Out of IOASID capacity! ask %d, avail
> > > %d\n",
> > > + quota, ioasid_capacity_avail);
> > > + return -ENOSPC;
> > > + }
> > > +
> > > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > > + if (!sdata)
> > > + return -ENOMEM;
> > > +
> > > + spin_lock(&ioasid_allocator_lock);
> > > +
> > > + ret = xa_alloc(&ioasid_sets, &id, sdata,
> > > + XA_LIMIT(0, ioasid_capacity_avail - quota),
> > > + GFP_KERNEL);
> >
> > Interestingly I didn't find the definition of ioasid_sets. and it is
> > not in existing file.
> >
> It is at the beginning of this file
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);

How did I overlook it after several checks...

>
> > I'm not sure how many sets can be created, but anyway the set
> > namespace is different from the ioasid namespace. Then why do we
> > use the ioasid capacity as the limit for allocating set IDs here?
> >
> I am assuming the worst-case scenario, which is one IOASID per set; that
> is why the number of sets is limited by the number of system IOASIDs.

I feel using a static max is simpler and clearer here. Anyway, the set id
is never used on hardware, so it is not necessary to tie it to the dynamic
IOASID numbers.

>
> > > + if (ret) {
> > > + kfree(sdata);
> > > + goto error;
> > > + }
> > > +
> > > + sdata->token = token;
> >
> > given token must be unique, a check on any conflict is required here?
> >
> Right, I will add a check to reject duplicated tokens.
>
> /* Search existing set tokens, reject duplicates */
> xa_for_each(&ioasid_sets, index, sdata) {
> 	if (sdata->token == token) {
> 		pr_warn("Token already exists in the set %lu\n",
> 			index);
> 		ret = -EEXIST;
> 		goto error;
> 	}
> }
>
>
>
>
> > > + sdata->size = quota;
> > > + sdata->sid = id;
> > > +
> > > + /*
> > > + * Set Xarray is used to store IDs within the set, get
> > > ready for
> > > + * sub-set ID and system-wide IOASID allocation results.
> >
> > looks 'subset' is the same thing as 'set'. let's make it consistent.
> >
> Sounds good, will also rename subset ID to set private ID.
>
> > > + */
> > > + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> > > +
> > > + ioasid_capacity_avail -= quota;
> > > + *sid = id;
> > > +
> > > +error:
> > > + spin_unlock(&ioasid_allocator_lock);
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> > > +
> > > +/**
> > > + * ioasid_free_set - Free all IOASIDs within the set
> > > + *
> > > + * @sid: The IOASID set ID to be freed
> > > + * @destroy_set: Whether to keep the set for further
> > > allocation.
> > > + * If true, the set will be destroyed.
> > > + *
> > > + * All IOASIDs allocated within the set will be freed upon return.
> > > + */
> > > +void ioasid_free_set(int sid, bool destroy_set)
> > > +{
> >
> > what is the actual usage of just freeing ioasid while keeping the
> > set itself?
> >
> I was thinking users that use the mm as the token can retain the
> ioasid_set until the mm is destroyed. This is to support some kind of
> lazy free.
>
> > > + struct ioasid_set_data *sdata;
> > > + struct ioasid_data *entry;
> > > + unsigned long index;
> > > +
> > > + spin_lock(&ioasid_allocator_lock);
> > > + sdata = xa_load(&ioasid_sets, sid);
> > > + if (!sdata) {
> > > + pr_err("No IOASID set found to free %d\n", sid);
> > > + goto done_unlock;
> > > + }
> > > +
> > > + if (xa_empty(&sdata->xa)) {
> > > + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> > > + goto done_destroy;
> > > + }
> >
> > why is it a warning condition? It is possible that a user has done
> > ioasid_free for all allocated ioasids and then calls this function,
> > which is actually the expected normal situation.
> >
> You are right, there is no need to warn. I will put the following
> comment in place.
> /* The set is already empty, we just destroy the set if requested */
> if (xa_empty(&sdata->xa))
> 	goto done_destroy;
>
> > > +
> > > + /* Just a place holder for now */
> > > + xa_for_each(&sdata->xa, index, entry) {
> > > + /* Free from per sub-set pool */
> > > + xa_erase(&sdata->xa, index);
> > > + }
> >
> > but the placeholder would lead to undesired behavior, not good for
> > bisect. If no support now, then should return an error if any in-use
> > ioasid is not freed.
> >
> Good point, I will return -ENOTSUPP in the placeholder and remove it
> during the API conversion.
>
> > > +
> > > +done_destroy:
> > > + if (destroy_set) {
> > > + xa_erase(&ioasid_sets, sid);
> > > +
> > > + /* Return the quota back to system pool */
> > > + ioasid_capacity_avail += sdata->size;
> > > + kfree_rcu(sdata, rcu);
> > > + }
> > > +
> > > +done_unlock:
> > > + spin_unlock(&ioasid_allocator_lock);
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_free_set);
> > > +
> > > +
> > > +/**
> > > * ioasid_find - Find IOASID data
> > > * @set: the IOASID set
> > > * @ioasid: the IOASID to find
> > > @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set,
> > > ioasid_t ioasid,
> > > }
> > > EXPORT_SYMBOL_GPL(ioasid_find);
> > >
> > > +/**
> > > + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> > > + * Caller must hold a reference to the set.
> >
> > please unify capitalization around IOASID or ioasid.
> >
> Will do.
>
> > Thanks
> > Kevin
> >
> > > + *
> > > + * @ioasid: IOASID associated with the set
> > > + *
> > > + * Return IOASID set ID or error
> > > + */
> > > +int ioasid_find_sid(ioasid_t ioasid)
> > > +{
> > > + struct ioasid_data *ioasid_data;
> > > + int ret = 0;
> > > +
> > > + spin_lock(&ioasid_allocator_lock);
> > > + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > > + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> > > +
> > > + spin_unlock(&ioasid_allocator_lock);
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> > > +
> > > MODULE_AUTHOR("Jean-Philippe Brucker <jean-
> > > [email protected]>");
> > > MODULE_AUTHOR("Jacob Pan <[email protected]>");
> > > MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > index 9711fa0dc357..be158e03c034 100644
> > > --- a/include/linux/ioasid.h
> > > +++ b/include/linux/ioasid.h
> > > @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct
> > > ioasid_allocator_ops *allocator);
> > > void ioasid_unregister_allocator(struct ioasid_allocator_ops
> > > *allocator); int ioasid_set_data(ioasid_t ioasid, void *data);
> > > void ioasid_install_capacity(ioasid_t total);
> > > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > > *sid); +void ioasid_free_set(int sid, bool destroy_set);
> > > +int ioasid_find_sid(ioasid_t ioasid);
> > > #else /* !CONFIG_IOASID */
> > > static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > > ioasid_t min, ioasid_t max, void *private)
> > > @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> > > {
> > > }
> > >
> > > +static inline int ioasid_alloc_set(struct ioasid_set *token,
> > > ioasid_t quota, int *sid)
> > > +{
> > > + return -ENOTSUPP;
> > > +}
> > > +
> > > +static inline void ioasid_free_set(int sid, bool destroy_set)
> > > +{
> > > +}
> > > +
> > > static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > > ioasid, bool (*getter)(void *))
> > > {
> > > @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t
> > > ioasid, void *data)
> > > static inline void ioasid_install_capacity(ioasid_t total)
> > > {
> > > }
> > > +
> > > #endif /* CONFIG_IOASID */
> > > #endif /* __LINUX_IOASID_H */
> > > --
> > > 2.7.4
> >
>
> [Jacob Pan]

2020-04-01 13:48:11

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity

On Wed, Mar 25, 2020 at 10:55:22AM -0700, Jacob Pan wrote:
> IOASID is a limited system-wide resource that can be allocated at
> runtime. This limitation can be enumerated during boot. For example, on
> x86 platforms, PCI Process Address Space ID (PASID) allocation uses
> IOASID service. The number of supported PASID bits is enumerated by
> extended capability register as defined in the VT-d spec.
>
> This patch adds a helper to set the system capacity; it is expected to
> be set during boot prior to any allocation request.

This one-time setting is a bit awkward. Since multiple IOMMU drivers may
be loaded, this can't be a module_init() thing. And we generally have
multiple SMMU instances in the system. So we'd need to call
install_capacity() only for the first SMMU loaded with an arbitrary 1<<20,
even though each SMMU can support different numbers of PASID bits.
Furthermore, modules such as iommu-sva will want to initialize their
IOASID set at module_init(), which will happen before the SMMU can set up
the capacity, so ioasid_alloc_set() will return an error.

How about hardcoding ioasid_capacity to 20 bits for now? It's the PCIe
limit and probably won't have to increase for a while.
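
The hardcoded-default idea, combined with the set-once install helper from the patch, amounts to something like the following userspace sketch. The names are hypothetical stand-ins for the kernel code:

```c
#include <assert.h>

#define DEMO_PASID_BITS 20	/* the PCIe PASID width limit */

static unsigned int demo_capacity;	/* 0 means "not installed yet" */

/*
 * Model of ioasid_install_capacity(): only the first call takes
 * effect, so a second IOMMU instance loading later cannot shrink
 * or grow the pool already handed out.
 */
static void demo_install_capacity(unsigned int total)
{
	if (demo_capacity)	/* capacity can only be set once */
		return;
	demo_capacity = total;
}
```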

Thanks,
Jean

>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/ioasid.c | 15 +++++++++++++++
> include/linux/ioasid.h | 5 ++++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 0f8dd377aada..4026e52855b9 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -17,6 +17,21 @@ struct ioasid_data {
> struct rcu_head rcu;
> };
>
> +static ioasid_t ioasid_capacity;
> +static ioasid_t ioasid_capacity_avail;
> +
> +/* System capacity can only be set once */
> +void ioasid_install_capacity(ioasid_t total)
> +{
> + if (ioasid_capacity) {
> + pr_warn("IOASID capacity already set at %d\n", ioasid_capacity);
> + return;
> + }
> +
> + ioasid_capacity = ioasid_capacity_avail = total;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> +
> /*
> * struct ioasid_allocator_data - Internal data structure to hold information
> * about an allocator. There are two types of allocators:
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 6f000d7a0ddc..9711fa0dc357 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_set_data(ioasid_t ioasid, void *data);
> -
> +void ioasid_install_capacity(ioasid_t total);
> #else /* !CONFIG_IOASID */
> static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> ioasid_t max, void *private)
> @@ -72,5 +72,8 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> return -ENOTSUPP;
> }
>
> +static inline void ioasid_install_capacity(ioasid_t total)
> +{
> +}
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4
>

2020-04-01 13:49:58

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

On Wed, Mar 25, 2020 at 10:55:24AM -0700, Jacob Pan wrote:
> IOASID set defines a group of IDs that share the same token. The
> ioasid_set concept helps to do permission checking among users as in the
> current code.
>
> With guest SVA usage, each VM has its own IOASID set. More
> functionalities are needed:
> 1. Enforce quota, each guest may be assigned limited quota such that one
> guest cannot abuse all the system resource.
> 2. Stores IOASID mapping between guest and host IOASIDs
> 3. Per set operations, e.g. free the entire set
>
> For each ioasid_set token, a unique set ID is assigned. This makes
> reference of the set and data lookup much easier to implement.
>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/ioasid.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ioasid.h | 13 +++++
> 2 files changed, 160 insertions(+)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 4026e52855b9..27ee57f7079b 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -10,6 +10,25 @@
> #include <linux/spinlock.h>
> #include <linux/xarray.h>
>
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> +/**
> + * struct ioasid_set_data - Meta data about ioasid_set
> + *
> + * @token: Unique to identify an IOASID set
> + * @xa: XArray to store subset ID and IOASID mapping
> + * @size: Max number of IOASIDs can be allocated within the set
> + * @nr_ioasids Number of IOASIDs allocated in the set
> + * @sid ID of the set
> + */
> +struct ioasid_set_data {
> + struct ioasid_set *token;
> + struct xarray xa;
> + int size;
> + int nr_ioasids;
> + int sid;
> + struct rcu_head rcu;
> +};
> +
> struct ioasid_data {
> ioasid_t id;
> struct ioasid_set *set;
> @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> EXPORT_SYMBOL_GPL(ioasid_free);
>
> /**
> + * ioasid_alloc_set - Allocate a set of IOASIDs
> + * @token: Unique token of the IOASID set
> + * @quota: Quota allowed in this set
> + * @sid: IOASID set ID to be assigned
> + *
> + * Return 0 upon success. Token will be stored internally for lookup,
> + * IOASID allocation within the set and other per set operations will use
> + * the @sid assigned.
> + *
> + */
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> +{
> + struct ioasid_set_data *sdata;
> + ioasid_t id;
> + int ret = 0;
> +
> + if (quota > ioasid_capacity_avail) {
> + pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> + quota, ioasid_capacity_avail);
> + return -ENOSPC;
> + }

This check should be in the same critical section as the quota
subtraction; otherwise two concurrent callers can both pass the check
before either updates ioasid_capacity_avail.

> +
> + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> + if (!sdata)
> + return -ENOMEM;

I don't understand why we need this structure at all, nor why we need the
SID. Users have already allocated an ioasid_set, so why not just stick the
content of ioasid_set_data in there, and pass the ioasid_set pointer to
ioasid_alloc()?

> +
> + spin_lock(&ioasid_allocator_lock);
> +
> + ret = xa_alloc(&ioasid_sets, &id, sdata,
> + XA_LIMIT(0, ioasid_capacity_avail - quota),
> + GFP_KERNEL);

Same as Kevin, I think the limit should be the static ioasid_capacity,
and perhaps a comment explaining the worst case of one PASID per set.
I found it a little confusing to use the same type ioasid_t for both
IOASIDs and IOASID sets; it may be clearer to use an int for IOASID
set IDs.

Thanks,
Jean

> + if (ret) {
> + kfree(sdata);
> + goto error;
> + }
> +
> + sdata->token = token;
> + sdata->size = quota;
> + sdata->sid = id;
> +
> + /*
> + * Set Xarray is used to store IDs within the set, get ready for
> + * sub-set ID and system-wide IOASID allocation results.
> + */
> + xa_init_flags(&sdata->xa, XA_FLAGS_ALLOC);
> +
> + ioasid_capacity_avail -= quota;
> + *sid = id;
> +
> +error:
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> +
> +/**
> + * ioasid_free_set - Free all IOASIDs within the set
> + *
> + * @sid: The IOASID set ID to be freed
> + * @destroy_set: Whether to keep the set for further allocation.
> + * If true, the set will be destroyed.
> + *
> + * All IOASIDs allocated within the set will be freed upon return.
> + */
> +void ioasid_free_set(int sid, bool destroy_set)
> +{
> + struct ioasid_set_data *sdata;
> + struct ioasid_data *entry;
> + unsigned long index;
> +
> + spin_lock(&ioasid_allocator_lock);
> + sdata = xa_load(&ioasid_sets, sid);
> + if (!sdata) {
> + pr_err("No IOASID set found to free %d\n", sid);
> + goto done_unlock;
> + }
> +
> + if (xa_empty(&sdata->xa)) {
> + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> + goto done_destroy;
> + }
> +
> + /* Just a place holder for now */
> + xa_for_each(&sdata->xa, index, entry) {
> + /* Free from per sub-set pool */
> + xa_erase(&sdata->xa, index);
> + }
> +
> +done_destroy:
> + if (destroy_set) {
> + xa_erase(&ioasid_sets, sid);
> +
> + /* Return the quota back to system pool */
> + ioasid_capacity_avail += sdata->size;
> + kfree_rcu(sdata, rcu);
> + }
> +
> +done_unlock:
> + spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free_set);
> +
> +
> +/**
> * ioasid_find - Find IOASID data
> * @set: the IOASID set
> * @ioasid: the IOASID to find
> @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> }
> EXPORT_SYMBOL_GPL(ioasid_find);
>
> +/**
> + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> + * Caller must hold a reference to the set.
> + *
> + * @ioasid: IOASID associated with the set
> + *
> + * Return IOASID set ID or error
> + */
> +int ioasid_find_sid(ioasid_t ioasid)
> +{
> + struct ioasid_data *ioasid_data;
> + int ret = 0;
> +
> + spin_lock(&ioasid_allocator_lock);
> + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> +
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> +
> MODULE_AUTHOR("Jean-Philippe Brucker <[email protected]>");
> MODULE_AUTHOR("Jacob Pan <[email protected]>");
> MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 9711fa0dc357..be158e03c034 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -41,6 +41,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_set_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> +void ioasid_free_set(int sid, bool destroy_set);
> +int ioasid_find_sid(ioasid_t ioasid);
> #else /* !CONFIG_IOASID */
> static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> ioasid_t max, void *private)
> @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> {
> }
>
> +static inline int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> +{
> + return -ENOTSUPP;
> +}
> +
> +static inline void ioasid_free_set(int sid, bool destroy_set)
> +{
> +}
> +
> static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *))
> {
> @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> static inline void ioasid_install_capacity(ioasid_t total)
> {
> }
> +
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4
>

2020-04-01 15:07:03

by Jean-Philippe Brucker

Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

On Wed, Mar 25, 2020 at 10:55:26AM -0700, Jacob Pan wrote:
> Bare metal SVA allocates IOASIDs for native process addresses. This
> should be separated from VM allocated IOASIDs thus under its own set.
>
> This patch creates a system IOASID set with its quota set to PID_MAX.
> This is a reasonable default in that SVM capable devices can only bind
> to limited user processes.

Yes realistically there won't be more than PID_MAX_DEFAULT=0x8000 bound
address spaces. My machine uses a PID_MAX of 4 million though, so in
theory more than 0x8000 processes may want a bond. On Arm the limit of
shared contexts per VM is currently a little less than 0x10000 (which is
the number of CPU ASIDs).

But quotas are only necessary for VMs, when the host shares the PASID
space with them (which isn't a use-case for Arm systems as far as I know,
each VM gets its own PASID space). Could we have quota-free IOASID sets
for the host?

For the SMMU I'd like to allocate two sets, one SVA and one private for
auxiliary domains, and I don't think giving either a quota makes much
sense at the moment. There can be systems using only SVA and systems using
only private PASIDs. I think it should be first-come-first-served until
admins want a knob to define a policy themselves, based on cgroups for
example.

> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 8 +++++++-
> drivers/iommu/ioasid.c | 9 +++++++++
> include/linux/ioasid.h | 9 +++++++++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ec3fc121744a..af7a1ef7b31e 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> goto free_iommu;
>
> /* PASID is needed for scalable mode irrespective to SVM */
> - if (intel_iommu_sm)
> + if (intel_iommu_sm) {
> ioasid_install_capacity(intel_pasid_max_id);
> + /* We should not run out of IOASIDs at boot */
> + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> + pr_err("Failed to enable host PASID allocator\n");
> + intel_iommu_sm = 0;
> + }
> + }
>
> /*
> * for each drhd
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 6265d2dbbced..9135af171a7c 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -39,6 +39,9 @@ struct ioasid_data {
> static ioasid_t ioasid_capacity;
> static ioasid_t ioasid_capacity_avail;
>
> +int system_ioasid_sid;
> +static DECLARE_IOASID_SET(system_ioasid);
> +
> /* System capacity can only be set once */
> void ioasid_install_capacity(ioasid_t total)
> {
> @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> }
> EXPORT_SYMBOL_GPL(ioasid_install_capacity);
>
> +int ioasid_alloc_system_set(int quota)
> +{
> + return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);

I think this helper could stay in the VT-d driver for the moment. If the
SMMU driver ever implements auxiliary domains it will use a private IOASID
set, separate from the shared IOASID set managed by iommu-sva. Both could
qualify as "system set".

Thanks,
Jean

> +
> /*
> * struct ioasid_allocator_data - Internal data structure to hold information
> * about an allocator. There are two types of allocators:
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 8c82d2625671..097b1cc043a3 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> void *pdata;
> };
>
> +/* Shared IOASID set for reserved for host system use */
> +extern int system_ioasid_sid;
> +
> #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
>
> #if IS_ENABLED(CONFIG_IOASID)
> @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> int ioasid_attach_data(ioasid_t ioasid, void *data);
> void ioasid_install_capacity(ioasid_t total);
> +int ioasid_alloc_system_set(int quota);
> int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> void ioasid_free_set(int sid, bool destroy_set);
> int ioasid_find_sid(ioasid_t ioasid);
> @@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
> {
> }
>
> +static inline int ioasid_alloc_system_set(int quota)
> +{
> + return -ENOTSUPP;
> +}
> +
> #endif /* CONFIG_IOASID */
> #endif /* __LINUX_IOASID_H */
> --
> 2.7.4
>

2020-04-01 15:09:42

by Jean-Philippe Brucker

Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA

Hi Jacob,

On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> IOASID was introduced in v5.5 as a generic kernel allocator service for
> both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub Stream
> ID. In addition to basic ID allocation, ioasid_set was introduced as a
> token that is shared by a group of IOASIDs. This set token can be used
> for permission checking but lack of some features needed by guest Shared
> Virtual Address (SVA). In addition, IOASID support for life cycle
> management is needed among multiple users.
>
> This patchset introduces two extensions to the IOASID code,
> 1. IOASID set operations
> 2. Notifications for IOASID state synchronization

My main concern with this series is patch 7 changing the spinlock to a
mutex, which prevents SVA from calling ioasid_free() from the RCU callback
of MMU notifiers. Could we use atomic notifiers, or do the FREE
notification another way?

Most of my other comments are just confusion on my part, I think, as I
haven't yet properly looked through the VFIO and VT-d changes. I'd rather
avoid the change to ioasid_find() if possible, because it adds a seemingly
unnecessary indirection to the fast path, but it's probably insignificant.

Thanks,
Jean

>
> Part #1:
> IOASIDs used by each VM fits naturally into an ioasid_set. The usage
> for per set management requires the following features:
>
> - Quota enforcement - This is to prevent one VM from abusing the
> allocator to take all the system IOASIDs. Though VFIO layer can also
> enforce the quota, but it cannot cover the usage with both guest and
> host SVA on the same system.
>
> - Stores guest IOASID-Host IOASID mapping within the set. To
> support live migration, IOASID namespace should be owned by the
> guest. This requires per IOASID set look up between guest and host
> IOASIDs. This patchset does not introduce non-identity guest-host
> IOASID lookup, we merely introduce the infrastructure in per set data.
>
> - Set level operations, e.g. when a guest terminates, it is likely to
> free the entire set. Having a single place to manage the set where the
> IOASIDs are stored makes iteration much easier.
>
>
> New APIs are:
> - void ioasid_install_capacity(ioasid_t total);
> Set the system capacity prior to any allocations. On x86, VT-d driver
> calls this function to set max number of PASIDs, typically 1 million
> (20 bits).
>
> - int ioasid_alloc_system_set(int quota);
> Host system has a default ioasid_set, during boot it is expected that
> this default set is allocated with a reasonable quota, e.g. PID_MAX.
> This default/system set is used for baremetal SVA.
>
> - int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> *sid);
> Allocate a new set with a token, returned sid (set ID) will be used
> to allocate IOASIDs within the set. Allocation of IOASIDs cannot
> exceed the quota.
>
> - void ioasid_free_set(int sid, bool destroy_set);
> Free the entire set and notify all users with an option to destroy
> the set. Set ID can be used for allocation again if not destroyed.
>
> - int ioasid_find_sid(ioasid_t ioasid);
> Look up the set ID from an ioasid. There is no reference held,
> assuming set has a single owner.
>
> - int ioasid_adjust_set(int sid, int quota);
> Change the quota of the set, new quota cannot be less than the number
> of IOASIDs already allocated within the set. This is useful when
> IOASID resource needs to be balanced among VMs.
>
> Part #2
> Notification service. Since IOASIDs are used by many consumers that
> follow publisher-subscriber pattern, notification is a natural choice
> to keep states synchronized. For example, on x86 system, guest PASID
> allocation and bind call results in VFIO IOCTL that can add and change
> guest-host PASID states. At the same time, IOMMU driver and KVM need to
> maintain its own PASID contexts. In this case, VFIO is the publisher
> within the kernel, IOMMU driver and KVM are the subscribers.
>
> This patchset introduces a global blocking notifier chain and APIs to
> operate on. Not all events nor all IOASIDs are of interests to all
> subscribers. e.g. KVM is only interested in the IOASIDs within its set.
> IOMMU driver is not ioasid_set aware. A further optimization could be
> having both global and per set notifier. But consider the infrequent
> nature of bind/unbind and relatively long process life cycle, this
> optimization may not be needed at this time.
>
> To register/unregister notification blocks, use these two APIs:
> - int ioasid_add_notifier(struct notifier_block *nb);
> - void ioasid_remove_notifier(struct notifier_block *nb)
>
> To send notification on an IOASID with one of the commands (FREE,
> BIND/UNBIND, etc.), use:
> - int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
>
> This work is a result of collaboration with many people:
> Liu, Yi L <[email protected]>
> Wu Hao <[email protected]>
> Ashok Raj <[email protected]>
> Kevin Tian <[email protected]>
>
> Thanks,
>
> Jacob
>
> Jacob Pan (10):
> iommu/ioasid: Introduce system-wide capacity
> iommu/vt-d: Set IOASID capacity when SVM is enabled
> iommu/ioasid: Introduce per set allocation APIs
> iommu/ioasid: Rename ioasid_set_data to avoid confusion with
> ioasid_set
> iommu/ioasid: Create an IOASID set for host SVA use
> iommu/ioasid: Convert to set aware allocations
> iommu/ioasid: Use mutex instead of spinlock
> iommu/ioasid: Introduce notifier APIs
> iommu/ioasid: Support ioasid_set quota adjustment
> iommu/vt-d: Register PASID notifier for status change
>
> drivers/iommu/intel-iommu.c | 20 ++-
> drivers/iommu/intel-svm.c | 89 ++++++++--
> drivers/iommu/ioasid.c | 387 +++++++++++++++++++++++++++++++++++++++-----
> include/linux/intel-iommu.h | 1 +
> include/linux/ioasid.h | 86 +++++++++-
> 5 files changed, 522 insertions(+), 61 deletions(-)
>
> --
> 2.7.4
>

2020-04-01 22:44:58

by Jacob Pan

Subject: Re: [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity

Hi Jean,

On Wed, 1 Apr 2020 15:45:52 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Wed, Mar 25, 2020 at 10:55:22AM -0700, Jacob Pan wrote:
> > IOASID is a limited system-wide resource that can be allocated at
> > runtime. This limitation can be enumerated during boot. For
> > example, on x86 platforms, PCI Process Address Space ID (PASID)
> > allocation uses IOASID service. The number of supported PASID bits
> > are enumerated by extended capability register as defined in the
> > VT-d spec.
> >
> > This patch adds a helper to set the system capacity, it expected to
> > be set during boot prior to any allocation request.
>
> This one-time setting is a bit awkward. Since multiple IOMMU drivers
> may be loaded, this can't be a module_init() thing. And we generally
> have multiple SMMU instances in the system. So we'd need to call
> install_capacity() only for the first SMMU loaded with an arbitrary
> 1<<20, even though each SMMU can support different numbers of PASID
> bits. Furthermore, modules such as iommu-sva will want to initialize
> their IOASID set at module_init(), which will happen before the SMMU
> can set up the capacity, so ioasid_alloc_set() will return an error.
>
> How about hardcoding ioasid_capacity to 20 bits for now? It's the
> PCIe limit and probably won't have to increase for a while.
>
Sounds good. Default to 20 bits, but allow it to be adjusted if needed.


> Thanks,
> Jean
>
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/ioasid.c | 15 +++++++++++++++
> > include/linux/ioasid.h | 5 ++++-
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 0f8dd377aada..4026e52855b9 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -17,6 +17,21 @@ struct ioasid_data {
> > struct rcu_head rcu;
> > };
> >
> > +static ioasid_t ioasid_capacity;
> > +static ioasid_t ioasid_capacity_avail;
> > +
> > +/* System capacity can only be set once */
> > +void ioasid_install_capacity(ioasid_t total)
> > +{
> > + if (ioasid_capacity) {
> > + pr_warn("IOASID capacity already set at %d\n",
> > ioasid_capacity);
> > + return;
> > + }
> > +
> > + ioasid_capacity = ioasid_capacity_avail = total;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > +
> > /*
> > * struct ioasid_allocator_data - Internal data structure to hold
> > information
> > * about an allocator. There are two types of allocators:
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 6f000d7a0ddc..9711fa0dc357 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set,
> > ioasid_t ioasid, int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator); void
> > ioasid_unregister_allocator(struct ioasid_allocator_ops
> > *allocator); int ioasid_set_data(ioasid_t ioasid, void *data); -
> > +void ioasid_install_capacity(ioasid_t total);
> > #else /* !CONFIG_IOASID */
> > static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min, ioasid_t max, void *private)
> > @@ -72,5 +72,8 @@ static inline int ioasid_set_data(ioasid_t
> > ioasid, void *data) return -ENOTSUPP;
> > }
> >
> > +static inline void ioasid_install_capacity(ioasid_t total)
> > +{
> > +}
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
> >

[Jacob Pan]

2020-04-01 23:35:10

by Jacob Pan

Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA

On Wed, 1 Apr 2020 16:03:01 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> Hi Jacob,
>
> On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > IOASID was introduced in v5.5 as a generic kernel allocator service
> > for both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub
> > Stream ID. In addition to basic ID allocation, ioasid_set was
> > introduced as a token that is shared by a group of IOASIDs. This
> > set token can be used for permission checking but lack of some
> > features needed by guest Shared Virtual Address (SVA). In addition,
> > IOASID support for life cycle management is needed among multiple
> > users.
> >
> > This patchset introduces two extensions to the IOASID code,
> > 1. IOASID set operations
> > 2. Notifications for IOASID state synchronization
>
> My main concern with this series is patch 7 changing the spinlock to a
> mutex, which prevents SVA from calling ioasid_free() from the RCU
> callback of MMU notifiers. Could we use atomic notifiers, or do the
> FREE notification another way?
>
Maybe I am looking at the wrong code; I thought
mmu_notifier_ops.free_notifier() is called outside the spinlock via
call_srcu(), which is invoked in thread context. In mmu_notifier.c,
mmu_notifier_put() does:
spin_unlock(&mm->notifier_subscriptions->lock);

call_srcu(&srcu, &subscription->rcu, mmu_notifier_free_rcu);

Anyway, if we have to use atomic notifiers: I tried an atomic notifier
first; there are two subscribers to the free event on x86.
1. IOMMU
2. KVM

For #1, the problem is that in the free operation, VT-d driver
needs to do a lot of clean up in thread context.
- hold a mutex to traverse a list of devices
- clear PASID entry and flush cache

For #2, KVM might be able to deal with spinlocks for updating VMCS
PASID translation table. +Hao

I can think of two possible solutions:
1. Use a cyclic IOASID allocator. The main reason for doing cleanup at
free time is to prevent a race with IOASID alloc. Similar to PIDs, 2M
IOASIDs will take a long time to overflow, so we could use an atomic
notifier and a deferred workqueue to do the IOMMU cleanup. The downside
is a large and growing PASID table, though that may not be a
performance issue since it is backed by a TLB.

2. Let VFIO ensure free always happens after unbind. Then there is no
need to do cleanup. But that requires VFIO to keep track of all the
PASIDs within each VM; when the VM terminates, VFIO is responsible for
the cleanup. That was Yi's original proposal. I also tried to provide
an IOASID set iterator for VFIO to free the IOASIDs within each VM/set,
but the private data belongs to the IOMMU driver.

Thanks,

Jacob

> Most of my other comments are just confusion on my part, I think, as I
> haven't yet properly looked through the VFIO and VT-d changes. I'd
> rather avoid the change to ioasid_find() if possible, because it adds
> a seemingly unnecessary indirection to the fast path, but it's
> probably insignificant.
>
> Thanks,
> Jean
>
> >
> > Part #1:
> > IOASIDs used by each VM fits naturally into an ioasid_set. The usage
> > for per set management requires the following features:
> >
> > - Quota enforcement - This is to prevent one VM from abusing the
> > allocator to take all the system IOASIDs. Though VFIO layer can
> > also enforce the quota, but it cannot cover the usage with both
> > guest and host SVA on the same system.
> >
> > - Stores guest IOASID-Host IOASID mapping within the set. To
> > support live migration, IOASID namespace should be owned by the
> > guest. This requires per IOASID set look up between guest and host
> > IOASIDs. This patchset does not introduce non-identity guest-host
> > IOASID lookup, we merely introduce the infrastructure in per set
> > data.
> >
> > - Set level operations, e.g. when a guest terminates, it is likely
> > to free the entire set. Having a single place to manage the set
> > where the IOASIDs are stored makes iteration much easier.
> >
> >
> > New APIs are:
> > - void ioasid_install_capacity(ioasid_t total);
> > Set the system capacity prior to any allocations. On x86, VT-d
> > driver calls this function to set max number of PASIDs, typically 1
> > million (20 bits).
> >
> > - int ioasid_alloc_system_set(int quota);
> > Host system has a default ioasid_set, during boot it is expected
> > that this default set is allocated with a reasonable quota, e.g.
> > PID_MAX. This default/system set is used for baremetal SVA.
> >
> > - int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid);
> > Allocate a new set with a token, returned sid (set ID) will be
> > used to allocate IOASIDs within the set. Allocation of IOASIDs
> > cannot exceed the quota.
> >
> > - void ioasid_free_set(int sid, bool destroy_set);
> > Free the entire set and notify all users with an option to destroy
> > the set. Set ID can be used for allocation again if not destroyed.
> >
> > - int ioasid_find_sid(ioasid_t ioasid);
> > Look up the set ID from an ioasid. There is no reference held,
> > assuming set has a single owner.
> >
> > - int ioasid_adjust_set(int sid, int quota);
> > Change the quota of the set, new quota cannot be less than the
> > number of IOASIDs already allocated within the set. This is useful
> > when IOASID resource needs to be balanced among VMs.
> >
> > Part #2
> > Notification service. Since IOASIDs are used by many consumers that
> > follow publisher-subscriber pattern, notification is a natural
> > choice to keep states synchronized. For example, on x86 system,
> > guest PASID allocation and bind call results in VFIO IOCTL that can
> > add and change guest-host PASID states. At the same time, IOMMU
> > driver and KVM need to maintain its own PASID contexts. In this
> > case, VFIO is the publisher within the kernel, IOMMU driver and KVM
> > are the subscribers.
> >
> > This patchset introduces a global blocking notifier chain and APIs
> > to operate on. Not all events nor all IOASIDs are of interests to
> > all subscribers. e.g. KVM is only interested in the IOASIDs within
> > its set. IOMMU driver is not ioasid_set aware. A further
> > optimization could be having both global and per set notifier. But
> > consider the infrequent nature of bind/unbind and relatively long
> > process life cycle, this optimization may not be needed at this
> > time.
> >
> > To register/unregister notification blocks, use these two APIs:
> > - int ioasid_add_notifier(struct notifier_block *nb);
> > - void ioasid_remove_notifier(struct notifier_block *nb)
> >
> > To send notification on an IOASID with one of the commands (FREE,
> > BIND/UNBIND, etc.), use:
> > - int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
> >
> > This work is a result of collaboration with many people:
> > Liu, Yi L <[email protected]>
> > Wu Hao <[email protected]>
> > Ashok Raj <[email protected]>
> > Kevin Tian <[email protected]>
> >
> > Thanks,
> >
> > Jacob
> >
> > Jacob Pan (10):
> > iommu/ioasid: Introduce system-wide capacity
> > iommu/vt-d: Set IOASID capacity when SVM is enabled
> > iommu/ioasid: Introduce per set allocation APIs
> > iommu/ioasid: Rename ioasid_set_data to avoid confusion with
> > ioasid_set
> > iommu/ioasid: Create an IOASID set for host SVA use
> > iommu/ioasid: Convert to set aware allocations
> > iommu/ioasid: Use mutex instead of spinlock
> > iommu/ioasid: Introduce notifier APIs
> > iommu/ioasid: Support ioasid_set quota adjustment
> > iommu/vt-d: Register PASID notifier for status change
> >
> > drivers/iommu/intel-iommu.c | 20 ++-
> > drivers/iommu/intel-svm.c | 89 ++++++++--
> > drivers/iommu/ioasid.c | 387
> > +++++++++++++++++++++++++++++++++++++++-----
> > include/linux/intel-iommu.h | 1 + include/linux/ioasid.h |
> > 86 +++++++++- 5 files changed, 522 insertions(+), 61 deletions(-)
> >
> > --
> > 2.7.4
> >

[Jacob Pan]

2020-04-02 12:28:01

by Jean-Philippe Brucker

Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA

On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> On Wed, 1 Apr 2020 16:03:01 +0200
> Jean-Philippe Brucker <[email protected]> wrote:
>
> > Hi Jacob,
> >
> > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > > IOASID was introduced in v5.5 as a generic kernel allocator service
> > > for both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub
> > > Stream ID. In addition to basic ID allocation, ioasid_set was
> > > introduced as a token that is shared by a group of IOASIDs. This
> > > set token can be used for permission checking but lack of some
> > > features needed by guest Shared Virtual Address (SVA). In addition,
> > > IOASID support for life cycle management is needed among multiple
> > > users.
> > >
> > > This patchset introduces two extensions to the IOASID code,
> > > 1. IOASID set operations
> > > 2. Notifications for IOASID state synchronization
> >
> > My main concern with this series is patch 7 changing the spinlock to a
> > mutex, which prevents SVA from calling ioasid_free() from the RCU
> > callback of MMU notifiers. Could we use atomic notifiers, or do the
> > FREE notification another way?
> >
> Maybe I am looking at the wrong code, I thought
> mmu_notifier_ops.free_notifier() is called outside spinlock with
> call_srcu(), which will be invoked in the thread context.
> in mmu_notifier.c mmu_notifier_put()
> spin_unlock(&mm->notifier_subscriptions->lock);
>
> call_srcu(&srcu, &subscription->rcu, mmu_notifier_free_rcu);

free_notifier() is called from RCU callback, and according to
Documentation/RCU/checklist.txt:

5. If call_rcu() or call_srcu() is used, the callback function will
be called from softirq context. In particular, it cannot block.

When applying the patch I get the sleep-in-atomic warning:

[ 87.861793] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[ 87.863293] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
[ 87.863993] 2 locks held by kworker/6:1/74:
[ 87.864493] #0: ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[ 87.865593] #1: ffffff88591efd30 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[ 87.866993] CPU: 6 PID: 74 Comm: kworker/6:1 Not tainted 5.6.0-next-20200331+ #121
[ 87.867393] Hardware name: FVP Base (DT)
[ 87.867893] Workqueue: rcu_gp srcu_invoke_callbacks
[ 87.868393] Call trace:
[ 87.868793] dump_backtrace+0x0/0x310
[ 87.869293] show_stack+0x14/0x20
[ 87.869693] dump_stack+0x124/0x180
[ 87.870193] ___might_sleep+0x2ac/0x428
[ 87.870693] __might_sleep+0x88/0x168
[ 87.871094] __mutex_lock+0xa0/0x1270
[ 87.871593] mutex_lock_nested+0x1c/0x28
[ 87.872093] ioasid_free+0x28/0x48
[ 87.872493] io_mm_free+0x1d0/0x608
[ 87.872993] mmu_notifier_free_rcu+0x74/0xe8
[ 87.873393] srcu_invoke_callbacks+0x1d0/0x2c8
[ 87.873893] process_one_work+0x858/0x1880
[ 87.874393] worker_thread+0x314/0xcd0
[ 87.874793] kthread+0x318/0x400
[ 87.875293] ret_from_fork+0x10/0x18

>
> Anyway, if we have to use atomic. I tried atomic notifier first, there
> are two subscribers to the free event on x86.
> 1. IOMMU
> 2. KVM
>
> For #1, the problem is that in the free operation, VT-d driver
> needs to do a lot of clean up in thread context.
> - hold a mutex to traverse a list of devices
> - clear PASID entry and flush cache
>
> For #2, KVM might be able to deal with spinlocks for updating VMCS
> PASID translation table. +Hao
>
> Perhaps two solutions I can think of:
> 1. Use a cyclic IOASID allocator. The main reason of clean up at free
> is to prevent race with IOASID alloc. Similar to PID, 2M IOASID
> will take long time overflow. Then we can use atomic notifier and a
> deferred workqueue to do IOMMU cleanup. The downside is a large and
> growing PASID table, may not be a performance issue since it has TLB.

That might be a problem for SMMU, which has 1024 * 64kB leaf PASID tables,
for a total of 64MB per endpoint if there is too much fragmentation in
the IOASID space.

> 2. Let VFIO ensure free always happen after unbind. Then there is no
> need to do cleanup. But that requires VFIO to keep track of all the
> PASIDs within each VM. When the VM terminates, VFIO is responsible for
> the clean up. That was Yi's original proposal. I also tried to provide
> an IOASID set iterator for VFIO to free the IOASIDs within each VM/set,
> but the private data belongs to IOMMU driver.

Not really my place to comment on this, but I find it nicer to use the
same gpasid_unbind() path when VFIO frees a PASID as when the guest
explicitly unbinds before freeing.

Thanks,
Jean

2020-04-02 16:57:08

by Jacob Pan

Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA

On Thu, 2 Apr 2020 14:26:33 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> > On Wed, 1 Apr 2020 16:03:01 +0200
> > Jean-Philippe Brucker <[email protected]> wrote:
> >
> > > Hi Jacob,
> > >
> > > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > > > IOASID was introduced in v5.5 as a generic kernel allocator
> > > > service for both PCIe Process Address Space ID (PASID) and ARM
> > > > SMMU's Sub Stream ID. In addition to basic ID allocation,
> > > > ioasid_set was introduced as a token that is shared by a group
> > > > of IOASIDs. This set token can be used for permission checking
> > > > but lacks some features needed by guest Shared Virtual
> > > > Address (SVA). In addition, IOASID life cycle management is
> > > > needed among multiple users.
> > > >
> > > > This patchset introduces two extensions to the IOASID code,
> > > > 1. IOASID set operations
> > > > 2. Notifications for IOASID state synchronization
> > >
> > > My main concern with this series is patch 7 changing the spinlock
> > > to a mutex, which prevents SVA from calling ioasid_free() from
> > > the RCU callback of MMU notifiers. Could we use atomic notifiers,
> > > or do the FREE notification another way?
> > >
> > Maybe I am looking at the wrong code, I thought
> > mmu_notifier_ops.free_notifier() is called outside spinlock with
> > call_srcu(), which will be invoked in the thread context.
> > in mmu_notifier.c mmu_notifier_put()
> > spin_unlock(&mm->notifier_subscriptions->lock);
> >
> > call_srcu(&srcu, &subscription->rcu,
> > mmu_notifier_free_rcu);
>
> free_notifier() is called from RCU callback, and according to
> Documentation/RCU/checklist.txt:
>
> 5. If call_rcu() or call_srcu() is used, the callback function
> will be called from softirq context. In particular, it cannot block.
>
> When applying the patch I get the sleep-in-atomic warning:
>
> [ 87.861793] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
> [ 87.863293] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
> [ 87.863993] 2 locks held by kworker/6:1/74:
> [ 87.864493] #0: ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
> [ 87.865593] #1: ffffff88591efd30 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
> [ 87.866993] CPU: 6 PID: 74 Comm: kworker/6:1 Not tainted 5.6.0-next-20200331+ #121
> [ 87.867393] Hardware name: FVP Base (DT)
> [ 87.867893] Workqueue: rcu_gp srcu_invoke_callbacks
> [ 87.868393] Call trace:
> [ 87.868793] dump_backtrace+0x0/0x310
> [ 87.869293] show_stack+0x14/0x20
> [ 87.869693] dump_stack+0x124/0x180
> [ 87.870193] ___might_sleep+0x2ac/0x428
> [ 87.870693] __might_sleep+0x88/0x168
> [ 87.871094] __mutex_lock+0xa0/0x1270
> [ 87.871593] mutex_lock_nested+0x1c/0x28
> [ 87.872093] ioasid_free+0x28/0x48
> [ 87.872493] io_mm_free+0x1d0/0x608
> [ 87.872993] mmu_notifier_free_rcu+0x74/0xe8
> [ 87.873393] srcu_invoke_callbacks+0x1d0/0x2c8
> [ 87.873893] process_one_work+0x858/0x1880
> [ 87.874393] worker_thread+0x314/0xcd0
> [ 87.874793] kthread+0x318/0x400
> [ 87.875293] ret_from_fork+0x10/0x18
>
You are right, I was reading the call_srcu() comments too fast. I guess
RCU callbacks still run in softirq context, not offloaded to kernel
threads. This is the comment that misled me:

 *
 * The callback will be invoked from process context, but must
 * nevertheless be fast and must not block.
 */

So even an atomic notifier works in principle, but it is not a good
idea since the free handler may take a long time.
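To make the constraint concrete, here is a minimal userspace model of the pattern under discussion (hypothetical names, not kernel code): the free notifier runs in atomic context and only records the ID, while the blocking cleanup is deferred to a worker that may sleep.

```c
/*
 * Userspace sketch of "atomic notifier + deferred workqueue".
 * Names are illustrative; this is a single-threaded model of the
 * split, not the actual IOASID/notifier implementation.
 */
#define PENDING_MAX 8

static int pending[PENDING_MAX];
static int nr_pending;
static int nr_cleaned;

/* Would run in atomic (softirq) context: just queue, never block. */
static void free_notifier_atomic(int ioasid)
{
	if (nr_pending < PENDING_MAX)
		pending[nr_pending++] = ioasid;
}

/*
 * Would run later from a workqueue in process context, where it is
 * safe to take mutexes, walk device lists and flush caches.
 */
static void cleanup_work(void)
{
	while (nr_pending > 0) {
		nr_pending--;	/* e.g. clear PASID entry, flush */
		nr_cleaned++;
	}
}
```

The point of the split is that nothing in free_notifier_atomic() can sleep, so it is legal from an RCU callback, while all the slow work lands in cleanup_work().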

> >
> > Anyway, if we have to use atomic. I tried atomic notifier first,
> > there are two subscribers to the free event on x86.
> > 1. IOMMU
> > 2. KVM
> >
> > For #1, the problem is that in the free operation, VT-d driver
> > needs to do a lot of clean up in thread context.
> > - hold a mutex to traverse a list of devices
> > - clear PASID entry and flush cache
> >
> > For #2, KVM might be able to deal with spinlocks for updating VMCS
> > PASID translation table. +Hao
> >
> > Perhaps two solutions I can think of:
> > 1. Use a cyclic IOASID allocator. The main reason for cleanup at
> > free is to prevent a race with IOASID alloc. Similar to PID, 2M
> > IOASIDs will take a long time to overflow. Then we can use an
> > atomic notifier and a deferred workqueue to do the IOMMU cleanup.
> > The downside is a large and growing PASID table; it may not be a
> > performance issue since it has a TLB.
>
> That might be a problem for SMMU, which has 1024 * 64kB leaf PASID
> tables, for a total of 64MB per endpoint if there is too much
> fragmentation in the IOASID space.
>
OK. Not an option here :(
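(For reference, the cyclic behavior that option 1 relies on can be modeled in plain C. This is a toy ring of 8 IDs with hypothetical names, not the kernel's allocator: the search for a free ID resumes after the most recently allocated one, so a freed ID is not handed out again until the cursor wraps, which is what would let cleanup be deferred.)

```c
#include <stdbool.h>

/* Toy cyclic ID allocator, modeling the wrap-around behavior only. */
#define ID_MAX 8

static bool used[ID_MAX];
static int next;	/* where the next search starts */

static int id_alloc_cyclic(void)
{
	for (int i = 0; i < ID_MAX; i++) {
		int id = (next + i) % ID_MAX;

		if (!used[id]) {
			used[id] = true;
			next = (id + 1) % ID_MAX;
			return id;
		}
	}
	return -1;	/* ID space exhausted */
}

static void id_free(int id)
{
	/*
	 * Cleanup can be deferred: the ID is not reused until 'next'
	 * wraps all the way around.
	 */
	used[id] = false;
}
```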

> > 2. Let VFIO ensure free always happen after unbind. Then there is no
> > need to do cleanup. But that requires VFIO to keep track of all the
> > PASIDs within each VM. When the VM terminates, VFIO is responsible
> > for the clean up. That was Yi's original proposal. I also tried to
> > provide an IOASID set iterator for VFIO to free the IOASIDs within
> > each VM/set, but the private data belongs to IOMMU driver.
>
> Not really my place to comment on this, but I find it nicer to use the
> same gpasid_unbind() path when VFIO frees a PASID as when the guest
> explicitly unbinds before freeing.
>

Might be the only option now.

Thanks,

Jacob

> Thanks,
> Jean
>

[Jacob Pan]

2020-04-06 15:29:56

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

Hi Jean,

On Wed, 1 Apr 2020 15:53:16 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Wed, Mar 25, 2020 at 10:55:26AM -0700, Jacob Pan wrote:
> > Bare metal SVA allocates IOASIDs for native process addresses. This
> > should be separated from VM allocated IOASIDs thus under its own
> > set.
> >
> > This patch creates a system IOASID set with its quota set to
> > PID_MAX. This is a reasonable default in that SVM capable devices
> > can only bind to limited user processes.
>
> Yes realistically there won't be more than PID_MAX_DEFAULT=0x8000
> bound address spaces. My machine uses a PID_MAX of 4 million though,
> so in theory more than 0x8000 processes may want a bond.
Got it, I assume we can adjust the system set quota as necessary.

> On Arm the
> limit of shared contexts per VM is currently a little less than
> 0x10000 (which is the number of CPU ASIDs).
>
I guess shared contexts means shared address spaces? Then it makes
sense that #IOASID < #ASID.

> But quotas are only necessary for VMs, when the host shares the PASID
> space with them (which isn't a use-case for Arm systems as far as I
> know, each VM gets its own PASID space).
Is there a host-guest PASID translation, or is the PASID used by the VM
a physical PASID? When a page request comes in to the SMMU, how does it
know the owner of the PASID if the PASID ranges can overlap between
host and guest?

> Could we have quota-free IOASID sets for the host?
>
Yes, perhaps just add a flag such that the set has its own namespace.
Do you mean having this quota-free IOASID set co-exist with VMs? I
still don't get how PRQ works in that case.

That is not the case for VT-d, in that we have to have system-wide
allocation for host PASIDs. We have ENQCMD, which can take a PASID from
the per-task MSR and deliver it to multiple devices, so even though the
PASID table is per device, the PASID namespace must be global.
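As a sketch of what system-wide allocation with per-set quotas means, here is a minimal userspace model (illustrative names only, not the actual driver code) mirroring the quota accounting that ioasid_alloc_set() performs against the global capacity:

```c
/*
 * Toy model of global PASID capacity shared by all sets: every set
 * reserves its quota from one pool, so no two sets can ever be
 * granted overlapping capacity, keeping the namespace global.
 */
static unsigned int capacity_avail = 1u << 20;	/* e.g. 20-bit PASIDs */

static int set_reserve_quota(unsigned int quota)
{
	if (quota > capacity_avail)
		return -1;	/* -ENOSPC in the real code */
	capacity_avail -= quota;
	return 0;
}

/* On set destruction, the quota returns to the system pool. */
static void set_return_quota(unsigned int quota)
{
	capacity_avail += quota;
}
```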

> For the SMMU I'd like to allocate two sets, one SVA and one private
> for auxiliary domains, and I don't think giving either a quota makes
> much sense at the moment.
I agree we don't need the quota if we don't support guest SVA at the
same time.

So the sva set and aux_domain set PASIDs have their own namespaces?

> There can be systems using only SVA and
> systems using only private PASIDs. I think it should be
> first-come-first-served until admins want a knob to define a policy
> themselves, based on cgroups for example.
>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 8 +++++++-
> > drivers/iommu/ioasid.c | 9 +++++++++
> > include/linux/ioasid.h | 9 +++++++++
> > 3 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index ec3fc121744a..af7a1ef7b31e 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> >  		goto free_iommu;
> >
> >  	/* PASID is needed for scalable mode irrespective to SVM */
> > -	if (intel_iommu_sm)
> > +	if (intel_iommu_sm) {
> >  		ioasid_install_capacity(intel_pasid_max_id);
> > +		/* We should not run out of IOASIDs at boot */
> > +		if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > +			pr_err("Failed to enable host PASID allocator\n");
> > +			intel_iommu_sm = 0;
> > +		}
> > +	}
> >
> > /*
> > * for each drhd
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 6265d2dbbced..9135af171a7c 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -39,6 +39,9 @@ struct ioasid_data {
> > static ioasid_t ioasid_capacity;
> > static ioasid_t ioasid_capacity_avail;
> >
> > +int system_ioasid_sid;
> > +static DECLARE_IOASID_SET(system_ioasid);
> > +
> > /* System capacity can only be set once */
> > void ioasid_install_capacity(ioasid_t total)
> > {
> > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > }
> > EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> >
> > +int ioasid_alloc_system_set(int quota)
> > +{
> > +	return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
>
> I think this helper could stay in the VT-d driver for the moment. If
> the SMMU driver ever implements auxiliary domains it will use a
> private IOASID set, separate from the shared IOASID set managed by
> iommu-sva. Both could qualify as "system set".
>
Sounds good. Perhaps remove the special "system set". SVA code,
VFIO, VT-d, or SMMU driver can all allocate their own sets.
So to meet both SMMU and VT-d requirements, we should do:
1. add an IOASID_PRIVATE flag to ioasid_alloc_set(), indicating this is
a private set
2. All APIs operate on the set_id accordingly, e.g. ioasid_find() will
only search within the private set. A private set is excluded from
global search (VT-d needs the global search for PRQ).

Since VT-d already needs private PASIDs for guest SVM where
GPASID!=HPASID, I feel we can just reuse the per ioasid_set Xarray for
both quota-free private set and guest set.
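A minimal userspace model of that per-set GPASID-to-HPASID mapping might look like the following (hypothetical names; the real code would use a per-ioasid_set XArray). Each VM's set maps its private guest PASIDs to system-wide host PASIDs, and lookups never leave the set, so guest namespaces may overlap freely:

```c
/* Toy per-set guest->host PASID map, standing in for the XArray. */
#define SET_MAX 16

struct ioasid_set_model {
	int map[SET_MAX];	/* index: guest PASID, value: host PASID */
};

static void set_init(struct ioasid_set_model *s)
{
	for (int i = 0; i < SET_MAX; i++)
		s->map[i] = -1;	/* -1: not mapped */
}

static int set_map(struct ioasid_set_model *s, int gpasid, int hpasid)
{
	if (gpasid < 0 || gpasid >= SET_MAX || s->map[gpasid] != -1)
		return -1;	/* out of range or already mapped */
	s->map[gpasid] = hpasid;
	return 0;
}

/* Lookup is scoped to one set, never global. */
static int set_find(struct ioasid_set_model *s, int gpasid)
{
	return (gpasid >= 0 && gpasid < SET_MAX) ? s->map[gpasid] : -1;
}
```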


Thanks for the feedback!

Jacob

> Thanks,
> Jean
>
> > +
> > /*
> > * struct ioasid_allocator_data - Internal data structure to hold
> > information
> > * about an allocator. There are two types of allocators:
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 8c82d2625671..097b1cc043a3 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > void *pdata;
> > };
> >
> > +/* Shared IOASID set reserved for host system use */
> > +extern int system_ioasid_sid;
> > +
> > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> >
> > #if IS_ENABLED(CONFIG_IOASID)
> > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> >  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> >  int ioasid_attach_data(ioasid_t ioasid, void *data);
> >  void ioasid_install_capacity(ioasid_t total);
> > +int ioasid_alloc_system_set(int quota);
> >  int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> >  void ioasid_free_set(int sid, bool destroy_set);
> >  int ioasid_find_sid(ioasid_t ioasid);
> > @@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
> >  {
> >  }
> >
> > +static inline int ioasid_alloc_system_set(int quota)
> > +{
> > +	return -ENOTSUPP;
> > +}
> > +
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
> >

[Jacob Pan]

2020-04-06 19:57:39

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

On Wed, 1 Apr 2020 15:47:45 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Wed, Mar 25, 2020 at 10:55:24AM -0700, Jacob Pan wrote:
> > IOASID set defines a group of IDs that share the same token. The
> > ioasid_set concept helps to do permission checking among users as
> > in the current code.
> >
> > With guest SVA usage, each VM has its own IOASID set. More
> > functionalities are needed:
> > 1. Enforce quota, each guest may be assigned limited quota such
> > that one guest cannot abuse all the system resource.
> > 2. Stores IOASID mapping between guest and host IOASIDs
> > 3. Per set operations, e.g. free the entire set
> >
> > For each ioasid_set token, a unique set ID is assigned. This makes
> > reference of the set and data lookup much easier to implement.
> >
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> >  drivers/iommu/ioasid.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/ioasid.h |  13 +++++
> >  2 files changed, 160 insertions(+)
> >
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 4026e52855b9..27ee57f7079b 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -10,6 +10,25 @@
> > #include <linux/spinlock.h>
> > #include <linux/xarray.h>
> >
> > +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > +/**
> > + * struct ioasid_set_data - Meta data about ioasid_set
> > + *
> > + * @token:	Unique to identify an IOASID set
> > + * @xa:		XArray to store subset ID and IOASID mapping
> > + * @size:	Max number of IOASIDs that can be allocated within the set
> > + * @nr_ioasids:	Number of IOASIDs allocated in the set
> > + * @sid:	ID of the set
> > + */
> > +struct ioasid_set_data {
> > + struct ioasid_set *token;
> > + struct xarray xa;
> > + int size;
> > + int nr_ioasids;
> > + int sid;
> > + struct rcu_head rcu;
> > +};
> > +
> > struct ioasid_data {
> > ioasid_t id;
> > struct ioasid_set *set;
> > @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> > EXPORT_SYMBOL_GPL(ioasid_free);
> >
> > /**
> > + * ioasid_alloc_set - Allocate a set of IOASIDs
> > + * @token:	Unique token of the IOASID set
> > + * @quota:	Quota allowed in this set
> > + * @sid:	IOASID set ID to be assigned
> > + *
> > + * Return 0 upon success. The token will be stored internally for lookup;
> > + * IOASID allocation within the set and other per-set operations will use
> > + * the @sid assigned.
> > + */
> > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> > +{
> > + struct ioasid_set_data *sdata;
> > + ioasid_t id;
> > + int ret = 0;
> > +
> > +	if (quota > ioasid_capacity_avail) {
> > +		pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> > +			quota, ioasid_capacity_avail);
> > +		return -ENOSPC;
> > +	}
>
> This check should be in the same critical section as the quota
> substraction
>
yes, right.

> > +
> > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > + if (!sdata)
> > + return -ENOMEM;
>
> I don't understand why we need this structure at all, nor why we need
> the SID. Users have already allocated an ioasid_set, so why not just
> stick the content of ioasid_set_data in there, and pass the
> ioasid_set pointer to ioasid_alloc()?
>

My thinking was that ioasid_set is an opaque user token, e.g. we use mm
to identify a common set belonging to a VM.

This sdata is an IOASID-internal structure for managing and servicing
per-set data. If we let the user fill in the content, some of the
entries need to be managed by the IOASID code under a lock. IMO, it is
not suitable to let the user allocate and manage it.

Perhaps we should rename struct ioasid_set to ioasid_set_token?

/**
 * struct ioasid_set_data - Meta data about ioasid_set
 *
 * @token:	Unique to identify an IOASID set
 * @xa:		XArray to store ioasid_set private ID to system-wide
 *		IOASID mapping
 * @max_id:	Max number of IOASIDs that can be allocated within the set
 * @nr_id:	Number of IOASIDs allocated in the set
 * @sid:	ID of the set
 */
struct ioasid_set_data {
	struct ioasid_set *token;
	struct xarray xa;
	int size;
	int nr_ioasids;
	int sid;
	struct rcu_head rcu;
};


I agree the SID is optional. Users could just use the token to reference
the set. I use the SID for performance reasons, i.e. a quick lookup from
SID to its data. Otherwise, we may have to search through a list of
sets to find a match.

> > +
> > + spin_lock(&ioasid_allocator_lock);
> > +
> > + ret = xa_alloc(&ioasid_sets, &id, sdata,
> > + XA_LIMIT(0, ioasid_capacity_avail - quota),
> > + GFP_KERNEL);
>
> Same as Kevin, I think the limit should be the static
> ioasid_capacity, and perhaps a comment explaining the worst case of
> one PASID per set. I found it a little confusing to use the same type
> ioasid_t for IOASIDs and IOASID sets; it may be clearer to use an int
> for IOASID set IDs.
>
Sounds good.

> Thanks,
> Jean
>
> > + if (ret) {
> > + kfree(sdata);
> > + goto error;
> > + }
> > +
> > + sdata->token = token;
> > + sdata->size = quota;
> > + sdata->sid = id;
> > +
> > +	/*
> > +	 * Set Xarray is used to store IDs within the set, get ready for
> > +	 * sub-set ID and system-wide IOASID allocation results.
> > +	 */
> > +
> > + ioasid_capacity_avail -= quota;
> > + *sid = id;
> > +
> > +error:
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc_set);
> > +
> > +/**
> > + * ioasid_free_set - Free all IOASIDs within the set
> > + *
> > + * @sid:		The IOASID set ID to be freed
> > + * @destroy_set:	Whether to keep the set for further allocation.
> > + *			If true, the set will be destroyed.
> > + *
> > + * All IOASIDs allocated within the set will be freed upon return.
> > + */
> > +void ioasid_free_set(int sid, bool destroy_set)
> > +{
> > + struct ioasid_set_data *sdata;
> > + struct ioasid_data *entry;
> > + unsigned long index;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + sdata = xa_load(&ioasid_sets, sid);
> > + if (!sdata) {
> > + pr_err("No IOASID set found to free %d\n", sid);
> > + goto done_unlock;
> > + }
> > +
> > + if (xa_empty(&sdata->xa)) {
> > + pr_warn("No IOASIDs in the set %d\n", sdata->sid);
> > + goto done_destroy;
> > + }
> > +
> > + /* Just a place holder for now */
> > + xa_for_each(&sdata->xa, index, entry) {
> > + /* Free from per sub-set pool */
> > + xa_erase(&sdata->xa, index);
> > + }
> > +
> > +done_destroy:
> > + if (destroy_set) {
> > + xa_erase(&ioasid_sets, sid);
> > +
> > + /* Return the quota back to system pool */
> > + ioasid_capacity_avail += sdata->size;
> > + kfree_rcu(sdata, rcu);
> > + }
> > +
> > +done_unlock:
> > + spin_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free_set);
> > +
> > +
> > +/**
> > * ioasid_find - Find IOASID data
> > * @set: the IOASID set
> > * @ioasid: the IOASID to find
> > @@ -431,6 +555,29 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> >  }
> > EXPORT_SYMBOL_GPL(ioasid_find);
> >
> > +/**
> > + * ioasid_find_sid - Retrieve IOASID set ID from an ioasid
> > + * Caller must hold a reference to the set.
> > + *
> > + * @ioasid: IOASID associated with the set
> > + *
> > + * Return IOASID set ID or error
> > + */
> > +int ioasid_find_sid(ioasid_t ioasid)
> > +{
> > + struct ioasid_data *ioasid_data;
> > + int ret = 0;
> > +
> > + spin_lock(&ioasid_allocator_lock);
> > + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > + ret = (ioasid_data) ? ioasid_data->sdata->sid : -ENOENT;
> > +
> > + spin_unlock(&ioasid_allocator_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find_sid);
> > +
> > MODULE_AUTHOR("Jean-Philippe Brucker <[email protected]>");
> > MODULE_AUTHOR("Jacob Pan <[email protected]>");
> > MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > index 9711fa0dc357..be158e03c034 100644
> > --- a/include/linux/ioasid.h
> > +++ b/include/linux/ioasid.h
> > @@ -41,6 +44,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> >  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> >  int ioasid_set_data(ioasid_t ioasid, void *data);
> >  void ioasid_install_capacity(ioasid_t total);
> > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> > +void ioasid_free_set(int sid, bool destroy_set);
> > +int ioasid_find_sid(ioasid_t ioasid);
> >  #else /* !CONFIG_IOASID */
> >  static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> >  				    ioasid_t min, ioasid_t max, void *private)
> > @@ -52,6 +55,15 @@ static inline void ioasid_free(ioasid_t ioasid)
> > {
> > }
> >
> > +static inline int ioasid_alloc_set(struct ioasid_set *token,
> > +				   ioasid_t quota, int *sid)
> > +{
> > +	return -ENOTSUPP;
> > +}
> > +
> > +static inline void ioasid_free_set(int sid, bool destroy_set)
> > +{
> > +}
> > +
> > static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid, bool (*getter)(void *))
> > {
> > @@ -75,5 +87,6 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> >  static inline void ioasid_install_capacity(ioasid_t total)
> >  {
> >  }
> > +
> > #endif /* CONFIG_IOASID */
> > #endif /* __LINUX_IOASID_H */
> > --
> > 2.7.4
> >

[Jacob Pan]

2020-04-07 11:02:15

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

On Mon, Apr 06, 2020 at 01:02:45PM -0700, Jacob Pan wrote:
> > > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > > + if (!sdata)
> > > + return -ENOMEM;
> >
> > I don't understand why we need this structure at all, nor why we need
> > the SID. Users have already allocated an ioasid_set, so why not just
> > stick the content of ioasid_set_data in there, and pass the
> > ioasid_set pointer to ioasid_alloc()?
> >
>
> My thinking was that ioasid_set is an opaque user token, e.g. we use mm
> to identify a common set belong to a VM.
>
> This sdata is an IOASID internal structure for managing & servicing per
> set data. If we let user fill in the content, some of the entries need
> to be managed by the IOASID code under a lock.

We don't have to let users fill the content. A bit like iommu_domain:
device drivers don't modify it, they pass it to iommu_map() rather than
passing a domain ID.

> IMO, not suitable to let user allocate and manage.
>
> Perhaps we should rename struct ioasid_set to ioasid_set_token?

Is the token actually used anywhere? As far as I can tell VFIO does its
own uniqueness check before calling ioasid_alloc_set(), and consumers of
notifications don't read the token.

>
> /**
>  * struct ioasid_set_data - Meta data about ioasid_set
>  *
>  * @token:	Unique to identify an IOASID set
>  * @xa:		XArray to store ioasid_set private ID to system-wide
>  *		IOASID mapping
>  * @max_id:	Max number of IOASIDs that can be allocated within the set
>  * @nr_id:	Number of IOASIDs allocated in the set
>  * @sid:	ID of the set
>  */
> struct ioasid_set_data {
> 	struct ioasid_set *token;
> 	struct xarray xa;
> 	int size;
> 	int nr_ioasids;
> 	int sid;
> 	struct rcu_head rcu;
> };

How about we remove the current ioasid_set, call this structure ioasid_set
instead of ioasid_set_data, and have ioasid_alloc_set() return it, rather
than requiring users to allocate the ioasid_set themselves?

struct ioasid_set *ioasid_alloc_set(ioasid_t quota):

This way ioasid_set is opaque to users (we could have the definition in
ioasid.c), but it can be passed to ioasid_alloc() and avoids the lookup by
SID. Could also add the unique token as a void * argument to
ioasid_alloc_set(), if needed.
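A userspace sketch of that opaque-handle style might look as follows. Names are illustrative, and the structure definition is inlined here for the sketch to be self-contained; in the real split it would live only in ioasid.c so callers hold a pointer they cannot inspect, and no SID lookup is needed:

```c
#include <stdlib.h>

/* Opaque to users in the real layout; definition shown for the sketch. */
struct ioasid_set {
	unsigned int quota;
	unsigned int nr_ioasids;
	unsigned int next_id;
};

/* Allocate a set with a quota; returns the handle, not an integer SID. */
static struct ioasid_set *ioasid_alloc_set_model(unsigned int quota)
{
	struct ioasid_set *set = calloc(1, sizeof(*set));

	if (set)
		set->quota = quota;
	return set;
}

/* IDs are allocated against the handle directly, no SID-to-data lookup. */
static int ioasid_alloc_model(struct ioasid_set *set)
{
	if (set->nr_ioasids >= set->quota)
		return -1;	/* quota exhausted */
	set->nr_ioasids++;
	return set->next_id++;
}
```

The design choice here is the same as with iommu_domain: the caller passes the pointer back into the API, so there is no ID-to-structure translation on any fast path.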

Thanks,
Jean

2020-04-07 11:03:24

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

On Mon, Apr 06, 2020 at 08:33:53AM -0700, Jacob Pan wrote:
> Hi Jean,
>
> On Wed, 1 Apr 2020 15:53:16 +0200
> Jean-Philippe Brucker <[email protected]> wrote:
>
> > On Wed, Mar 25, 2020 at 10:55:26AM -0700, Jacob Pan wrote:
> > > Bare metal SVA allocates IOASIDs for native process addresses. This
> > > should be separated from VM allocated IOASIDs thus under its own
> > > set.
> > >
> > > This patch creates a system IOASID set with its quota set to
> > > PID_MAX. This is a reasonable default in that SVM capable devices
> > > can only bind to limited user processes.
> >
> > Yes realistically there won't be more than PID_MAX_DEFAULT=0x8000
> > bound address spaces. My machine uses a PID_MAX of 4 million though,
> > so in theory more than 0x8000 processes may want a bond.
> Got it, I assume we can adjust the system set quota as necessary.
>
> > On Arm the
> > limit of shared contexts per VM is currently a little less than
> > 0x10000 (which is the number of CPU ASIDs).
> >
> I guess shared contexts means shared address? then it makes sense
> #IOASID < #ASID.

Yes by shared contexts I mean shared address spaces. Theoretically #ASID <
#IOASID for us, because the max ASID size is 16-bit.

>
> > But quotas are only necessary for VMs, when the host shares the PASID
> > space with them (which isn't a use-case for Arm systems as far as I
> > know, each VM gets its own PASID space).
> Is there a host-guest PASID translation? or the PASID used by the VM is
> physical PASID? When a page request comes in to SMMU, how does it know
> the owner of the PASID if PASID range can overlap between host and
> guest?

We assign PCI functions to VMs, so Page Requests are routed with
RID:PASID, not PASID alone. The SMMU finds the struct device associated
with the RID, and submits the fault with iommu_report_device_fault(). If
the VF is assigned to a VM, then the page request gets injected into the
VM; otherwise it uses the host IOPF handler.
> > Could we have quota-free IOASID sets for the host?
> >
> Yes, perhaps just add a flag such that the set has its own namespace.
> You mean have this quota-free IOASID set even co-exist with VMs? I still
> don't get how PRQ works.
>
> That is not the use case for VT-d in that we have to have system-wide
> allocation for host PASIDs. We have enqcmd which can take a PASID from
> the per task MSR and deliver to multiple devices, so even though the
> PASID table is per device the PASID name space must be global.
>
> > For the SMMU I'd like to allocate two sets, one SVA and one private
> > for auxiliary domains, and I don't think giving either a quota makes
> > much sense at the moment.
> I agree we don't need the quota if we don't support guest SVA at the
> same time.
>
> So the sva set and aux_domain set PASIDs have their own namespaces?

They share the same PASID space, but they store different objects
(mm_struct and context descriptor, respectively) so they need different
ioasid_set tokens.

>
> > There can be systems using only SVA and
> > systems using only private PASIDs. I think it should be
> > first-come-first-served until admins want a knob to define a policy
> > themselves, based on cgroups for example.
> >
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > ---
> > > drivers/iommu/intel-iommu.c | 8 +++++++-
> > > drivers/iommu/ioasid.c | 9 +++++++++
> > > include/linux/ioasid.h | 9 +++++++++
> > > 3 files changed, 25 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index ec3fc121744a..af7a1ef7b31e 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > >  		goto free_iommu;
> > >
> > >  	/* PASID is needed for scalable mode irrespective to SVM */
> > > -	if (intel_iommu_sm)
> > > +	if (intel_iommu_sm) {
> > >  		ioasid_install_capacity(intel_pasid_max_id);
> > > +		/* We should not run out of IOASIDs at boot */
> > > +		if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > > +			pr_err("Failed to enable host PASID allocator\n");
> > > +			intel_iommu_sm = 0;
> > > +		}
> > > +	}
> > >
> > > /*
> > > * for each drhd
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 6265d2dbbced..9135af171a7c 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -39,6 +39,9 @@ struct ioasid_data {
> > > static ioasid_t ioasid_capacity;
> > > static ioasid_t ioasid_capacity_avail;
> > >
> > > +int system_ioasid_sid;
> > > +static DECLARE_IOASID_SET(system_ioasid);
> > > +
> > > /* System capacity can only be set once */
> > > void ioasid_install_capacity(ioasid_t total)
> > > {
> > > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > > }
> > > EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > >
> > > +int ioasid_alloc_system_set(int quota)
> > > +{
> > > +	return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> >
> > I think this helper could stay in the VT-d driver for the moment. If
> > the SMMU driver ever implements auxiliary domains it will use a
> > private IOASID set, separate from the shared IOASID set managed by
> > iommu-sva. Both could qualify as "system set".
> >
> Sounds good. Perhaps remove the special "system set". SVA code,
> VFIO, VT-d, or SMMU driver can all allocate their own sets.
> So to meet both SMMU and VT-d requirements, we should do:
> 1. add an IOASID_PRIVATE flag to ioasid_alloc_set(), indicating this is
> a private set
> 2. All APIs operate on the set_id accordingly, e.g. ioasid_find() will
> only search within the private set. A private set is excluded from
> global search (VT-d needs the global search for PRQ).
>
> Since VT-d already needs private PASIDs for guest SVM where
> GPASID!=HPASID, I feel we can just reuse the per ioasid_set Xarray for
> both quota-free private set and guest set.

Ok I think this sounds fine

Thanks,
Jean

>
>
> Thanks for the feedback!
>
> Jacob
>
> > Thanks,
> > Jean
> >
> > > +
> > > /*
> > > * struct ioasid_allocator_data - Internal data structure to hold
> > > information
> > > * about an allocator. There are two types of allocators:
> > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > index 8c82d2625671..097b1cc043a3 100644
> > > --- a/include/linux/ioasid.h
> > > +++ b/include/linux/ioasid.h
> > > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > > void *pdata;
> > > };
> > >
> > > +/* Shared IOASID set reserved for host system use */
> > > +extern int system_ioasid_sid;
> > > +
> > > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > >
> > > #if IS_ENABLED(CONFIG_IOASID)
> > > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> > >  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> > >  int ioasid_attach_data(ioasid_t ioasid, void *data);
> > >  void ioasid_install_capacity(ioasid_t total);
> > > +int ioasid_alloc_system_set(int quota);
> > >  int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> > >  void ioasid_free_set(int sid, bool destroy_set);
> > >  int ioasid_find_sid(ioasid_t ioasid);
> > > @@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
> > >  {
> > >  }
> > >
> > > +static inline int ioasid_alloc_system_set(int quota)
> > > +{
> > > +	return -ENOTSUPP;
> > > +}
> > > +
> > > #endif /* CONFIG_IOASID */
> > > #endif /* __LINUX_IOASID_H */
> > > --
> > > 2.7.4
> > >
>
> [Jacob Pan]

2020-04-15 03:04:16

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

Hi Jean,

Sorry for the delay, I had to do some research based on your feedback.
I also plan to document this in the next version.


On Tue, 7 Apr 2020 13:01:46 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Mon, Apr 06, 2020 at 08:33:53AM -0700, Jacob Pan wrote:
> > Hi Jean,
> >
> > On Wed, 1 Apr 2020 15:53:16 +0200
> > Jean-Philippe Brucker <[email protected]> wrote:
> >
> > > On Wed, Mar 25, 2020 at 10:55:26AM -0700, Jacob Pan wrote:
> > > > Bare metal SVA allocates IOASIDs for native process address
> > > > spaces. These should be kept separate from VM-allocated IOASIDs,
> > > > thus placed under their own set.
> > > >
> > > > This patch creates a system IOASID set with its quota set to
> > > > PID_MAX. This is a reasonable default in that SVM-capable
> > > > devices can only bind to a limited number of user processes.
> > >
> > > Yes realistically there won't be more than PID_MAX_DEFAULT=0x8000
> > > bound address spaces. My machine uses a PID_MAX of 4 million
> > > though, so in theory more than 0x8000 processes may want a bond.
> > Got it, I assume we can adjust the system set quota as necessary.
> >
> > > On Arm the
> > > limit of shared contexts per VM is currently a little less than
> > > 0x10000 (which is the number of CPU ASIDs).
> > >
> > I guess shared contexts means shared address spaces? Then it makes
> > sense that #IOASID < #ASID.
>
> Yes by shared contexts I mean shared address spaces. Theoretically
> #ASID < #IOASID for us, because the max ASID size is 16-bit.
>
Got it.

> >
> > > But quotas are only necessary for VMs, when the host shares the
> > > PASID space with them (which isn't a use-case for Arm systems as
> > > far as I know, each VM gets its own PASID space).
> > Is there a host-guest PASID translation? or the PASID used by the
> > VM is physical PASID? When a page request comes in to SMMU, how
> > does it know the owner of the PASID if PASID range can overlap
> > between host and guest?
>
> We assign PCI functions to VMs, so Page Requests are routed with
> RID:PASID, not PASID alone. The SMMU finds the struct device
> associated with the RID, and submits the fault with
> iommu_report_device_fault(). If the VF is assigned to a VM, then the
> page request gets injected into the VM, otherwise it uses the host
> IOPF handler.
>
Got it, VM private PASID space works then.
For VM, the IOASID search is within the VM ioasid_set.
For SVA, the IOASID search is within host default set.
Should be faster than global search once we have per set xarray.
I guess the PASID table is per VM instead of per RID (device)? Sorry if
you already answered it before.


> > > Could we have quota-free IOASID sets for the host?
> > >
> > Yes, perhaps just add a flag such that the set has its own
> > namespace. Do you mean having this quota-free IOASID set co-exist
> > with VMs? I still don't get how PRQ works.
> >
> > That is not the use case for VT-d in that we have to have
> > system-wide allocation for host PASIDs. We have enqcmd which can
> > take a PASID from the per task MSR and deliver to multiple devices,
> > so even though the PASID table is per device the PASID name space
> > must be global.
> > > For the SMMU I'd like to allocate two sets, one SVA and one
> > > private for auxiliary domains, and I don't think giving either a
> > > quota makes much sense at the moment.
> > I agree we don't need the quota if we don't support guest SVA at the
> > same time.
> >
> > So the sva set and aux_domain set PASIDs have their own
> > namespaces?
>
> They share the same PASID space, but they store different objects
> (mm_struct and context descriptor, respectively) so they need
> different ioasid_set tokens.
>
Got it.

> >
> > > There can be systems using only SVA and
> > > systems using only private PASIDs. I think it should be
> > > first-come-first-served until admins want a knob to define a
> > > policy themselves, based on cgroups for example.
> > >
> > > > Signed-off-by: Jacob Pan <[email protected]>
> > > > ---
> > > > drivers/iommu/intel-iommu.c | 8 +++++++-
> > > > drivers/iommu/ioasid.c | 9 +++++++++
> > > > include/linux/ioasid.h | 9 +++++++++
> > > > 3 files changed, 25 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > > index ec3fc121744a..af7a1ef7b31e 100644
> > > > --- a/drivers/iommu/intel-iommu.c
> > > > +++ b/drivers/iommu/intel-iommu.c
> > > > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > > >  		goto free_iommu;
> > > >  
> > > >  	/* PASID is needed for scalable mode irrespective to SVM */
> > > > -	if (intel_iommu_sm)
> > > > +	if (intel_iommu_sm) {
> > > >  		ioasid_install_capacity(intel_pasid_max_id);
> > > > +		/* We should not run out of IOASIDs at boot */
> > > > +		if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > > > +			pr_err("Failed to enable host PASID allocator\n");
> > > > +			intel_iommu_sm = 0;
> > > > +		}
> > > > +	}
> > > >
> > > > /*
> > > > * for each drhd
> > > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > > index 6265d2dbbced..9135af171a7c 100644
> > > > --- a/drivers/iommu/ioasid.c
> > > > +++ b/drivers/iommu/ioasid.c
> > > > @@ -39,6 +39,9 @@ struct ioasid_data {
> > > > static ioasid_t ioasid_capacity;
> > > > static ioasid_t ioasid_capacity_avail;
> > > >
> > > > +int system_ioasid_sid;
> > > > +static DECLARE_IOASID_SET(system_ioasid);
> > > > +
> > > > /* System capacity can only be set once */
> > > > void ioasid_install_capacity(ioasid_t total)
> > > > {
> > > > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > > > }
> > > > EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > > >
> > > > +int ioasid_alloc_system_set(int quota)
> > > > +{
> > > > +	return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> > >
> > > I think this helper could stay in the VT-d driver for the moment.
> > > If the SMMU driver ever implements auxiliary domains it will use a
> > > private IOASID set, separate from the shared IOASID set managed by
> > > iommu-sva. Both could qualify as "system set".
> > >
> > Sounds good. Perhaps remove the special "system set". SVA code,
> > VFIO, VT-d, or SMMU driver can all allocate their own sets.
> > So to meet both SMMU and VT-d requirements, we should do:
> > 1. add an IOASID_PRIVATE flag to ioasid_alloc_set(), indicating
> > this is a private set
> > 2. All APIs operate on the set_id accordingly, e.g. ioasid_find()
> > will only search within the private set. Private set is excluded
> > from global search (VT-d needs this in PRQ).
> >
> > Since VT-d already needs private PASIDs for guest SVM where
> > GPASID!=HPASID, I feel we can just reuse the per ioasid_set Xarray
> > for both quota-free private set and guest set.
>
> Ok I think this sounds fine
>
> Thanks,
> Jean
>
> >
> >
> > Thanks for the feedback!
> >
> > Jacob
> >
> > > Thanks,
> > > Jean
> > >
> > > > +
> > > > /*
> > > > * struct ioasid_allocator_data - Internal data structure to
> > > > hold information
> > > > * about an allocator. There are two types of allocators:
> > > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > > index 8c82d2625671..097b1cc043a3 100644
> > > > --- a/include/linux/ioasid.h
> > > > +++ b/include/linux/ioasid.h
> > > > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > > > void *pdata;
> > > > };
> > > >
> > > > +/* Shared IOASID set reserved for host system use */
> > > > +extern int system_ioasid_sid;
> > > > +
> > > > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > > >
> > > > #if IS_ENABLED(CONFIG_IOASID)
> > > > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct
> > > > ioasid_allocator_ops *allocator); void
> > > > ioasid_unregister_allocator(struct ioasid_allocator_ops
> > > > *allocator); int ioasid_attach_data(ioasid_t ioasid, void
> > > > *data); void ioasid_install_capacity(ioasid_t total); +int
> > > > ioasid_alloc_system_set(int quota); int ioasid_alloc_set(struct
> > > > ioasid_set *token, ioasid_t quota, int *sid); void
> > > > ioasid_free_set(int sid, bool destroy_set); int
> > > > ioasid_find_sid(ioasid_t ioasid); @@ -88,5 +92,10 @@ static
> > > > inline void ioasid_install_capacity(ioasid_t total) {
> > > > }
> > > >
> > > > +static inline int ioasid_alloc_system_set(int quota)
> > > > +{
> > > > + return -ENOTSUPP;
> > > > +}
> > > > +
> > > > #endif /* CONFIG_IOASID */
> > > > #endif /* __LINUX_IOASID_H */
> > > > --
> > > > 2.7.4
> > > >
> >
> > [Jacob Pan]

[Jacob Pan]
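
The set-scoped lookup discussed above (guest PASID searched only within
the owning VM's ioasid_set, via a per-set XArray) can be illustrated
with a small userspace model. This is a hypothetical sketch with
invented names and a linear table standing in for the XArray.

```c
#include <assert.h>

#define SET_MAX 16

/* One guest->host PASID mapping table per VM's ioasid_set. */
struct model_set {
	unsigned int guest[SET_MAX];  /* guest PASID (key)  */
	unsigned int host[SET_MAX];   /* host PASID (value) */
	int nr;
};

static void set_add(struct model_set *s, unsigned int g, unsigned int h)
{
	s->guest[s->nr] = g;
	s->host[s->nr] = h;
	s->nr++;
}

/* Scoped lookup: only this set is searched, so overlapping guest PASID
 * namespaces in different VMs cannot collide. Returns 0 on a miss. */
static unsigned int set_find(struct model_set *s, unsigned int g)
{
	for (int i = 0; i < s->nr; i++)
		if (s->guest[i] == g)
			return s->host[i];
	return 0;
}
```

The point of the per-set structure is visible in the model: two VMs can
both use guest PASID 1 without ambiguity, because each lookup is scoped
by the set token.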

2020-04-16 00:15:56

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

On Mon, Apr 13, 2020 at 03:06:31PM -0700, Jacob Pan wrote:
> > > > But quotas are only necessary for VMs, when the host shares the
> > > > PASID space with them (which isn't a use-case for Arm systems as
> > > > far as I know, each VM gets its own PASID space).
> > > Is there a host-guest PASID translation? or the PASID used by the
> > > VM is physical PASID? When a page request comes in to SMMU, how
> > > does it know the owner of the PASID if PASID range can overlap
> > > between host and guest?
> >
> > We assign PCI functions to VMs, so Page Requests are routed with
> > RID:PASID, not PASID alone. The SMMU finds the struct device
> > associated with the RID, and submits the fault with
> > iommu_report_device_fault(). If the VF is assigned to a VM, then the
> > page request gets injected into the VM, otherwise it uses the host
> > IOPF handler.
> >
> Got it, VM private PASID space works then.
> For VM, the IOASID search is within the VM ioasid_set.
> For SVA, the IOASID search is within host default set.
> Should be faster than global search once we have per set xarray.
> I guess the PASID table is per VM instead of per RID (device)? Sorry if
> you already answered it before.

The PASID table is per IOMMU domain, so it's closer to per RID than per
VM, unless userspace puts all devices in the same VFIO container (hence in
the same IOMMU domain).

Thanks,
Jean
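
Jean's description of RID:PASID routing can be sketched as a toy model:
the fault carries both RID and PASID, the RID alone selects the device,
and the device's assignment decides whether the fault is injected into
the guest or handled by the host IOPF path. All names below are
invented for illustration; this is not the SMMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_target { TO_HOST, TO_GUEST };

struct model_device {
	unsigned int rid;
	bool assigned_to_vm;
};

static struct model_device devs[] = {
	{ .rid = 0x10, .assigned_to_vm = true  },  /* VF given to a VM */
	{ .rid = 0x20, .assigned_to_vm = false },  /* host-owned device */
};

/* PASIDs may overlap between host and guest; routing never uses the
 * PASID alone -- the struct device found via the RID disambiguates. */
static enum fault_target route_fault(unsigned int rid, unsigned int pasid)
{
	(void)pasid;	/* interpreted only after the device is found */
	for (size_t i = 0; i < sizeof(devs) / sizeof(devs[0]); i++)
		if (devs[i].rid == rid)
			return devs[i].assigned_to_vm ? TO_GUEST : TO_HOST;
	return TO_HOST;	/* unknown RID: left to the host to reject */
}
```

This is why per-VM private PASID spaces work on the SMMU: the same
PASID value on two RIDs resolves to two different owners.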

2020-04-21 21:49:15

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

Hi Jean,

Sorry for the late reply, been trying to redesign the notification part.

On Tue, 7 Apr 2020 13:01:07 +0200
Jean-Philippe Brucker <[email protected]> wrote:

> On Mon, Apr 06, 2020 at 01:02:45PM -0700, Jacob Pan wrote:
> > > > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > > > + if (!sdata)
> > > > + return -ENOMEM;
> > >
> > > I don't understand why we need this structure at all, nor why we
> > > need the SID. Users have already allocated an ioasid_set, so why
> > > not just stick the content of ioasid_set_data in there, and pass
> > > the ioasid_set pointer to ioasid_alloc()?
> > >
> >
> > My thinking was that ioasid_set is an opaque user token, e.g. we
> > use mm to identify a common set belong to a VM.
> >
> > This sdata is an IOASID internal structure for managing & servicing
> > per set data. If we let user fill in the content, some of the
> > entries need to be managed by the IOASID code under a lock.
>
> We don't have to let users fill the content. A bit like iommu_domain:
> device drivers don't modify it, they pass it to iommu_map() rather
> than passing a domain ID.
>
much better.

> > IMO, not suitable to let user allocate and manage.
> >
> > Perhaps we should rename struct ioasid_set to ioasid_set_token?
>
> Is the token actually used anywhere? As far as I can tell VFIO does
> its own uniqueness check before calling ioasid_alloc_set(), and
> consumers of notifications don't read the token.
>
For VT-d, the per-VM token (preferably mm) will be used by KVM to
manage its PASID translation table.
When KVM receives a notification about a new guest-host PASID mapping,
it needs to know which VM it belongs to. So if mm is used as the token,
both VFIO and KVM can identify PASID ownership.

> >
> > /**
> >  * struct ioasid_set_data - Metadata about an ioasid_set
> >  *
> >  * @token:	Unique token to identify an IOASID set
> >  * @xa:	XArray to store ioasid_set private ID to system-wide
> >  *		IOASID mapping
> >  * @size:	Max number of IOASIDs that can be allocated within the set
> >  * @nr_ioasids: Number of IOASIDs allocated in the set
> >  * @sid:	ID of the set
> >  */
> > struct ioasid_set_data {
> > struct ioasid_set *token;
> > struct xarray xa;
> > int size;
> > int nr_ioasids;
> > int sid;
> > struct rcu_head rcu;
> > };
>
> How about we remove the current ioasid_set, call this structure
> ioasid_set instead of ioasid_set_data, and have ioasid_alloc_set()
> return it, rather than requiring users to allocate the ioasid_set
> themselves?
>
> struct ioasid_set *ioasid_alloc_set(ioasid_t quota):
>
> This way ioasid_set is opaque to users (we could have the definition
> in ioasid.c), but it can be passed to ioasid_alloc() and avoids the
> lookup by SID. Could also add the unique token as a void * argument to
> ioasid_alloc_set(), if needed.
>
Sounds good. Still pass a token. Thanks for the idea.

> Thanks,
> Jean

[Jacob Pan]
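
The API shape agreed in this exchange -- an opaque ioasid_set returned
by ioasid_alloc_set(), carrying an optional caller token, with
allocation taking the set pointer instead of a set ID -- can be
sketched as a userspace model. The bodies below are invented; only the
interface shape follows the discussion.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int ioasid_t;

/* Opaque to users in the real design; shown here for the model. */
struct model_ioasid_set {
	void *token;		/* e.g. mm, so KVM/VFIO can identify the VM */
	ioasid_t quota;
	ioasid_t nr_ioasids;
	ioasid_t next;
};

/* ioasid_alloc_set(quota) analogue: returns the set, no SID needed. */
static struct model_ioasid_set *model_alloc_set(void *token, ioasid_t quota)
{
	struct model_ioasid_set *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->token = token;
	s->quota = quota;
	s->next = 1;		/* 0 reserved as the invalid IOASID */
	return s;
}

/* Allocation is charged against the set's quota; the by-SID lookup of
 * the earlier design disappears because the caller holds the pointer. */
static ioasid_t model_alloc(struct model_ioasid_set *s)
{
	if (s->nr_ioasids >= s->quota)
		return 0;	/* stand-in for INVALID_IOASID */
	s->nr_ioasids++;
	return s->next++;
}
```

The design choice mirrors iommu_domain, as Jean notes: callers pass the
object itself rather than an ID, so the allocator avoids a lock-guarded
ID-to-set lookup on every operation.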